Row wise Mean in pyspark

Row-wise mean in pyspark is calculated in a roundabout way: the + operator adds the column values, and dividing that sum by the number of columns gives the mean. The same calculation can also be written as a SQL expression using expr() or selectExpr().


We will use the dataframe named df.


Calculate Row Wise Mean in Pyspark : Method 1

We use the + operator to sum the columns and divide the sum by the number of columns to get the mean. Because select() is used, the result contains only the computed mean column.

### Row wise mean in pyspark
from pyspark.sql.functions import col, lit
df1=df.select(((col("mathematics_score") + col("science_score")) / lit(2)).alias("mean"))
df1.show()

The result is a dataframe with a single column, mean, that holds the row-level mean.

Row wise mean in pyspark and appending it to dataframe: Method 2

In Method 2 we again sum the columns with the + operator and divide by the number of columns, but use withColumn() to append the result to the dataframe.

### Row wise mean in pyspark

from pyspark.sql.functions import col
df1=df.withColumn("mean", (col("mathematics_score")+col("science_score"))/2)
df1.show()

The resulting dataframe keeps all the original columns and adds a mean column.

 

 

Row mean in pyspark and appending it to dataframe: Method 3

The row-wise mean can also be calculated with the pyspark.sql.functions.expr function, which takes a SQL expression string.


### Row wise mean in pyspark

from pyspark.sql.functions import expr
# Assign to df1 so the original df is not overwritten
df1 = df.withColumn("row_mean", expr("(science_score + mathematics_score) / 2"))
df1.show()

Explanation:

withColumn(): Returns a new DataFrame with the given column added (or replaced, if a column with that name already exists).

expr(): Parses a SQL expression string into a column. Here it sums science_score and mathematics_score and divides by the number of columns (2 in this case).

The resulting dataframe has a new row_mean column alongside the original columns.

 

 

Row level mean in pyspark and appending it to dataframe: Method 4

The row-wise mean can also be calculated with the DataFrame method selectExpr(), which accepts SQL expression strings.


### Row wise mean in pyspark

# selectExpr() is a DataFrame method, so nothing needs to be imported from pyspark.sql.functions
df.selectExpr("student_id", "name", "science_score", "mathematics_score", "(science_score + mathematics_score) / 2 as row_mean").show()

Explanation:

selectExpr(): A variant of select() that takes SQL expression strings. Here it selects the existing columns by name and computes the mean, naming the result row_mean.

The result contains the four selected columns plus the new row_mean column.

 

Author

  • Sridhar Venkatachalam

    Has close to 10 years of experience in data science and machine learning, and has worked extensively with programming languages such as R, Python (Pandas), SAS, and Pyspark.
