To calculate the row-wise mean in PySpark, we add the columns with the + operator and divide the sum by the number of columns. The same calculation can also be written as a SQL expression using expr() or selectExpr().
We will use the dataframe named df.
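The examples only need a DataFrame with the columns student_id, name, science_score and mathematics_score. As a minimal sketch, one way to build such a DataFrame is shown here; the rows are made-up sample values, and the helper name make_sample_df is an assumption for illustration only.

```python
# Column names match those used in the examples in this tutorial.
COLUMNS = ["student_id", "name", "science_score", "mathematics_score"]

# Hypothetical sample rows (invented values, only for trying the examples).
ROWS = [
    (1, "Alice", 80, 90),
    (2, "Bob", 65, 75),
    (3, "Carla", 92, 88),
]


def make_sample_df(spark):
    """Build the sample DataFrame from an existing SparkSession."""
    return spark.createDataFrame(ROWS, COLUMNS)


# Typical usage (requires a working Spark installation):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("row_mean").getOrCreate()
#   df = make_sample_df(spark)
#   df.show()
```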
Calculate Row Wise Mean in Pyspark : Method 1
We will use the + operator to add the columns and then divide the sum by the number of columns, which gives the row mean. Because select() is used, the result contains only the new mean column.
### Row wise mean in pyspark
from pyspark.sql.functions import col, lit

df1 = df.select(((col("mathematics_score") + col("science_score")) / lit(2)).alias("mean"))
df1.show()
The result, df1, contains a single column named mean that holds the row-level mean of the two scores.
Row wise mean in pyspark and appending it to dataframe: Method 2
In Method 2 we again use the + operator and divide the sum by the number of columns, but withColumn() appends the row-level mean to the dataframe as a new column.
### Row wise mean in pyspark
from pyspark.sql.functions import col

df1 = df.withColumn("mean", (col("mathematics_score") + col("science_score")) / 2)
df1.show()
The resultant dataframe, df1, keeps all the original columns of df and adds the new mean column.
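Writing out every column by hand becomes tedious when there are more than two. The "sum with +, then divide by the number of columns" idea can be sketched for any list of columns as follows. The helper names mean_of and with_row_mean are assumptions for illustration, not part of PySpark; mean_of relies only on + and /, so it accepts plain numbers as well as PySpark Column objects.

```python
from functools import reduce
from operator import add


def mean_of(terms):
    """Add the terms with + and divide by how many there are."""
    terms = list(terms)
    if not terms:
        raise ValueError("need at least one term")
    return reduce(add, terms) / len(terms)


def with_row_mean(df, column_names, out_name="mean"):
    """Append a row-wise mean of the given columns to df (hypothetical helper)."""
    # Imported here so mean_of can be tried without Spark installed.
    from pyspark.sql.functions import col

    return df.withColumn(out_name, mean_of(col(c) for c in column_names))


# The same arithmetic on plain numbers:
print(mean_of([80, 90]))  # 85.0
```

With the dataframe used above, with_row_mean(df, ["mathematics_score", "science_score"]) should build the same expression as Method 2.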
Row mean in pyspark and appending it to dataframe: Method 3
The row-wise mean can also be calculated with the pyspark.sql.functions.expr function, which takes the calculation as a SQL expression string.
### Row wise mean in pyspark
from pyspark.sql.functions import expr

df = df.withColumn("row_mean", expr("(science_score + mathematics_score) / 2"))
df.show()
Explanation:
withColumn(): Creates a new column in the DataFrame.
expr(): Constructs a SQL expression that computes the mean by summing the values of science_score and mathematics_score and dividing by the number of columns (2 in this case).
The resultant dataframe has a new row_mean column holding the mean of science_score and mathematics_score for each row.
Row level mean in pyspark and appending it to dataframe: Method 4
The row-wise mean can also be calculated with the dataframe's selectExpr() method, which selects columns and evaluates SQL expressions in a single call.
### Row wise mean in pyspark
df.selectExpr(
    "student_id",
    "name",
    "science_score",
    "mathematics_score",
    "(science_score + mathematics_score) / 2 as row_mean",
).show()
Explanation:
selectExpr(): A DataFrame method that accepts SQL expression strings. Here it selects the existing columns and computes the mean, naming the result row_mean with the as keyword.
The resultant dataframe contains student_id, name, science_score, mathematics_score and the computed row_mean column.
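Since expr() and selectExpr() both take the calculation as a string, the expression can be generated from a list of column names instead of being typed by hand. This is a minimal sketch; the helper name mean_expr is an assumption for illustration, and it assumes the column names are simple identifiers that need no backtick quoting.

```python
def mean_expr(column_names):
    """Build a SQL string such as "(a + b) / 2" from column names."""
    names = list(column_names)
    if not names:
        raise ValueError("need at least one column name")
    # Assumes simple identifiers; unusual names would need backtick quoting.
    return "(" + " + ".join(names) + ") / " + str(len(names))


row_mean_sql = mean_expr(["science_score", "mathematics_score"])
print(row_mean_sql)  # (science_score + mathematics_score) / 2

# Possible usage with Spark (df as in the examples above):
#   from pyspark.sql.functions import expr
#   df.withColumn("row_mean", expr(row_mean_sql)).show()
#   df.selectExpr("*", row_mean_sql + " as row_mean").show()
```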