NameError: name ‘mean’ is not defined in PySpark

The error NameError: name ‘mean’ is not defined occurs in PySpark when you call mean() without first importing or defining it. mean is not a Python built-in function; it lives in the pyspark.sql.functions module, so it has to be imported before it can be used.
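
The cause is ordinary Python name resolution rather than anything specific to Spark. A minimal sketch in plain Python (no Spark session needed), assuming nothing called mean has been imported or defined in the session:

```python
# Calling a name that was never imported or defined raises NameError.
# Assumption: `mean` has not been imported or defined anywhere in this session.
try:
    mean([1, 2, 3])
except NameError as err:
    message = str(err)

print(message)  # prints: name 'mean' is not defined
```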

Error screenshot: NameError: name ‘mean’ is not defined (PySpark)

 

 

Fix for the Error in PySpark: NameError: name ‘mean’ is not defined

Using mean() from pyspark.sql.functions: import mean

To calculate the mean in PySpark, you can use the mean() function from pyspark.sql.functions. To do that, import mean from pyspark.sql.functions as shown below:

 
from pyspark.sql.functions import mean

# Calculate the mean of the science_score and mathematics_score columns
df.select(mean(df.science_score), mean(df.mathematics_score)).show()

With the import in place, the NameError is gone, and the query prints the mean of the science_score and mathematics_score columns.

 

 

Author

  • Sridhar Venkatachalam

    Close to 10 years of experience in data science and machine learning. Has worked extensively with programming languages such as R, Python (Pandas), SAS, and PySpark.
