Get number of rows and number of columns of dataframe in pyspark

Get Size and Shape of the dataframe: To get the number of rows and the number of columns in pyspark, we will use the count() function and Python's built-in len() function. The dimension of a dataframe in pyspark is obtained by extracting its number of rows and its number of columns. We will also get the count of distinct rows in pyspark. Let’s see how to

  • Get size and shape of the dataframe in pyspark
  • Count the number of rows in pyspark with an example using count()
  • Count the number of distinct rows in pyspark with an example
  • Count the number of columns in pyspark with an example

 

We will be using a dataframe named df_student


 

Get Size and Shape of the dataframe in pyspark:


The size and shape of a dataframe in pyspark is simply its number of rows and its number of columns.

########## Get Size and shape of the dataframe in pyspark

print((df_student.count(), len(df_student.columns)))

Result:

(7, 5)

 

Count the number of rows in pyspark – Get number of rows

Syntax:

 df.count()

df – dataframe

dataframe.count() returns the number of rows in the dataframe.

########## count number of rows

df_student.count()

Result:

7

 

 

 

Count the number of distinct rows in pyspark – Get number of distinct rows:

Syntax:

 df.distinct().count()

df – dataframe

dataframe.distinct().count() counts the number of distinct rows of the dataframe.

########## count number of distinct rows

df_student.distinct().count()

Result:

6

 


Count the number of columns in pyspark – Get number of columns:

Syntax:

 len(df.columns)

df – dataframe

len(df.columns) returns the number of columns of the dataframe.

########## count number of columns

len(df_student.columns)

Result:

5

 



Author

  • Sridhar Venkatachalam

    Close to 10 years of experience in data science and machine learning; has worked extensively with programming languages like R, Python (Pandas), SAS, and Pyspark.