Get number of rows and number of columns of dataframe in pyspark

To get the number of rows and the number of columns of a dataframe in pyspark, we will use the dataframe's count() function and Python's built-in len() function. The dimension of a dataframe in pyspark is obtained by extracting its number of rows and its number of columns. We will also get the count of distinct rows in pyspark. Let’s see how to

  • Count the number of rows in pyspark with an example
  • Count the number of distinct rows in pyspark with an example
  • Count the number of columns in pyspark with an example

We will be using a dataframe named df_student.

Count the number of rows in pyspark – Get number of rows

Syntax:

 df.count()

df – dataframe

The dataframe.count() function returns the number of rows in the dataframe.

########## count number of rows

df_student.count()

Result:

7

 

 

 

Count the number of distinct rows in pyspark – Get number of distinct rows

Syntax:

 df.distinct().count()

df – dataframe

dataframe.distinct().count() counts the number of distinct rows of the dataframe.

########## count number of distinct rows

df_student.distinct().count()

Result:

6

 

 

 

Count the number of columns in pyspark – Get number of columns

Syntax:

 len(df.columns)

df – dataframe

len(df.columns) counts the number of columns of the dataframe.

########## count number of columns

len(df_student.columns)

Result:

5

 
