The error NameError: name 'mean' is not defined occurs in PySpark when you call mean() without first importing or defining it in the current scope. mean is not a Python built-in; it lives in the pyspark.sql.functions module and must be imported from there before you use it.
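The mechanism behind the error is plain Python name lookup, so it can be reproduced without Spark. The sketch below is only an illustration: it uses statistics.mean from the standard library as a stand-in for pyspark.sql.functions.mean, and the sample numbers are made up.

```python
# Calling a name before it has been imported raises NameError.
try:
    mean([70, 80, 90])
except NameError as exc:
    error_message = str(exc)
    print(error_message)  # name 'mean' is not defined

# Stand-in import (assumption: statistics.mean replaces the PySpark function
# here so the example runs without a Spark installation).
from statistics import mean

average = mean([70, 80, 90])
print(average)  # 80
```

The same rule applies in PySpark: the name mean only exists after the import statement has run in the current session or script.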
Error screenshot: NameError: name 'mean' is not defined in PySpark
Fix for the Error in PySpark: NameError: name 'mean' is not defined
Using mean() from pyspark.sql.functions: import mean
To calculate the mean in PySpark, use the mean() function from pyspark.sql.functions. Import it first, as shown below:
from pyspark.sql.functions import mean

# Calculate the mean of the science_score and mathematics_score columns
df.select(mean(df.science_score), mean(df.mathematics_score)).show()
With the import in place, the NameError is gone and show() prints the mean of each column.