Count of Missing (NaN, Na) and null values in Pyspark

Count of missing (NaN, Na) and null values in pyspark can be accomplished using the isnan() function and the isNull() function respectively. isnan() flags the missing (NaN, Na) values of a column, and isNull() flags the null values; wrapping either condition in when() and count() gives the corresponding counts. We will see an example for each of the following:

  • Count of Missing values of all columns in dataframe in pyspark using isnan() Function
  • Count of null values of all columns in dataframe in pyspark using isNull() Function
  • Count of both null and missing values of all columns in dataframe in pyspark
  • Count of Missing values of single column in pyspark using isnan() Function
  • Count of null values of single column in pyspark using isNull() Function
  • Count of both null and missing values of single column in pyspark

We will be using the dataframe df_orders, which is shown below.

[Image: the df_orders dataframe]
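
The original post shows df_orders only as a screenshot, so as a convenience the sketch below builds a roughly equivalent dataframe that the examples can run against. The column names and values here are assumptions for illustration and may differ from the data used in the original post.

### Create a hypothetical df_orders dataframe (assumed columns and values)
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("missing_null_counts").getOrCreate()

# order_no is kept as a double so it can hold NaN as well as null values
df_orders = spark.createDataFrame(
    [
        (1001.0,       "Ram",  1200.50),
        (float("nan"), "Sita", float("nan")),   # NaN (missing) values
        (None,         None,   2300.75),        # null values
        (1004.0,       "Ravi", None),
    ],
    ["order_no", "customer_name", "order_amount"],
)
df_orders.show()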

Count of Missing values of dataframe in pyspark using isnan() Function:

Count of missing values of the dataframe in pyspark is obtained using the isnan() function. Each column name is passed to isnan() inside when(), and count() returns the number of missing values of each column.

### Get count of nan or missing values in pyspark

from pyspark.sql.functions import isnan, when, count, col
df_orders.select([count(when(isnan(c), c)).alias(c) for c in df_orders.columns]).show()

So the number of missing values of each column in the dataframe will be:

[Image: per-column count of missing (NaN) values]
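
If the per-column counts are needed inside Python code rather than just displayed, the same expression can be collected into a dictionary. This is a small usage sketch, not something shown in the original post.

### Collect the per-column counts of NaN values into a Python dict
from pyspark.sql.functions import isnan, when, count
nan_counts = df_orders.select([count(when(isnan(c), c)).alias(c) for c in df_orders.columns]).first().asDict()
print(nan_counts)    # maps each column name to its count of NaN values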

Count of null values of dataframe in pyspark using isNull() Function:

Count of null values of the dataframe in pyspark is obtained using the isNull() function. Each column is passed to isNull() inside when(), and count() returns the number of null values of each column.

### Get count of null values in pyspark

from pyspark.sql.functions import isnan, when, count, col
df_orders.select([count(when(col(c).isNull(), c)).alias(c) for c in df_orders.columns]).show()

So the number of null values of each column in the dataframe will be:

[Image: per-column count of null values]
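
Note that pyspark.sql.functions also provides an isnull() function, which is equivalent to the Column.isNull() method used above; either form flags null values. A brief sketch of the functional form:

### Equivalent null count using the isnull() function instead of the isNull() method
from pyspark.sql.functions import isnull, when, count
df_orders.select([count(when(isnull(c), c)).alias(c) for c in df_orders.columns]).show()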

Count of both null and missing values of dataframe in pyspark:

Count of both null and missing values of the dataframe in pyspark is obtained by combining the isnan() and isNull() conditions inside when(), so every column is checked for NaN values as well as nulls in a single pass.

### Get count of both null and missing values in pyspark

from pyspark.sql.functions import isnan, when, count, col
df_orders.select([count(when(isnan(c) | col(c).isNull(), c)).alias(c) for c in df_orders.columns]).show()

So the number of both null and missing values of each column in the dataframe will be:

[Image: per-column count of null and missing values]
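
If this combined check is needed often, it can be wrapped in a small helper. The function name below is hypothetical (not from the original post), and the sketch assumes the columns are numeric or string, since isnan() is only meaningful for float or double values.

### Reusable helper for combined null and NaN counts (hypothetical name)
from pyspark.sql.functions import isnan, when, count, col

def count_missing_and_null(df):
    # one output column per input column, holding the combined NaN/null count
    return df.select([count(when(isnan(c) | col(c).isNull(), c)).alias(c) for c in df.columns])

count_missing_and_null(df_orders).show()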

Count of Missing values of single column in pyspark:

Count of missing values of a single column in pyspark is obtained using the isnan() function. The column name is passed to isnan() inside when(), and count() returns the number of missing values of that particular column.

### Get count of nan or missing values of single column in pyspark

from pyspark.sql.functions import isnan, when, count, col
df_orders.select([count(when(isnan('order_no'),True))]).show()

Count of missing values of the “order_no” column will be:

[Image: count of missing values of the order_no column]
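
If a plain Python integer is preferred over a one-row dataframe, the same count can be obtained by filtering on isnan() and counting the rows. This is an alternative sketch, not the approach used in the original post.

### Count missing (NaN) values of a single column as a Python int
from pyspark.sql.functions import isnan, col
nan_count = df_orders.filter(isnan(col('order_no'))).count()
print(nan_count)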

Count of null values of single column in pyspark:

Count of null values of a single column in pyspark is obtained using the isNull() function. The column name is passed to isNull() inside when(), and count() returns the number of null values of that particular column.

### Get count of null values of single column in pyspark

from pyspark.sql.functions import isnan, when, count, col
df_orders.select([count(when(col('order_no').isNull(),True))]).show()

Count of null values of the “order_no” column will be:

[Image: count of null values of the order_no column]

Count of null and missing values of single column in pyspark:

Count of both null and missing values of a single column in pyspark is obtained by combining the isnan() and isNull() conditions. Passing the column name to both functions inside when() returns the count of null and missing values of that column.

### Get count of missing and null values of single column in pyspark

from pyspark.sql.functions import isnan, when, count, col
df_orders.select([count(when(isnan('order_no') | col('order_no').isNull() , True))]).show()

Count of null and missing values of the “order_no” column will be:

[Image: count of null and missing values of the order_no column]
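
The combined single-column count can likewise be returned as a plain Python integer by filtering on the same NaN-or-null condition; again, this is an alternative sketch rather than the approach used in the original post.

### Count null or missing values of a single column as a Python int
from pyspark.sql.functions import isnan, col
missing_or_null_count = df_orders.filter(isnan(col('order_no')) | col('order_no').isNull()).count()
print(missing_or_null_count)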



Author

  • Sridhar Venkatachalam

    With close to 10 years of experience in data science and machine learning, he has worked extensively with programming languages such as R, Python (Pandas), SAS, and Pyspark.