Row wise mean, sum, minimum and maximum in pyspark

In order to calculate the row wise mean, sum, minimum and maximum in pyspark, we will be using different functions. Row wise mean in pyspark is calculated in roundabout way. Row wise sum in pyspark is calculated using sum() function. Row wise minimum (min) in pyspark is calculated using least() function. Row wise maximum (max) in pyspark is calculated using greatest() function.

Row wise mean in pyspark
Row wise sum in pyspark
Row wise minimum in pyspark
Row wise maximum in pyspark

We will be using the dataframe df_student_detail.

Row wise mean, sum, minimum and maximum in pyspark 1

Row wise mean in pyspark : Method 1

row wise sum,mean,min,max in pyspark c2

We will be using simple + operator to calculate row wise mean in pyspark. using + to calculate sum and dividing by number of columns gives the mean

### Row wise mean in pyspark

from pyspark.sql.functions import col, lit

df1=df_student_detail.select(((col("mathematics_score") + col("science_score")) / lit(2)).alias("mean"))
df1.show()

Row wise mean, sum, minimum and maximum in pyspark 2

Row wise mean in pyspark and appending it to dataframe: Method 2

In Method 2 we will be using simple + operator and dividing the result by number of columns to calculate row wise mean in pyspark, and appending the results to the dataframe

### Row wise mean in pyspark

from pyspark.sql.functions import col

df1=df_student_detail.withColumn("mean", (col("mathematics_score")+col("science_score"))/2)
df1.show()

So the resultant dataframe will be

Row wise mean, sum, minimum and maximum in pyspark 2b

Row wise sum in pyspark : Method 1

row wise sum,mean,min,max in pyspark c1

We will be using simple + operator to calculate row wise sum in pyspark. We also use select() function to retrieve the result

### Row wise sum in pyspark

from pyspark.sql.functions import col

df1=df_student_detail.select(((col("mathematics_score") + col("science_score"))).alias("sum"))
df1.show()

Row wise mean, sum, minimum and maximum in pyspark 3

Row wise sum in pyspark and appending to dataframe: Method 2

In Method 2 we will be using simple + operator to calculate row wise sum in pyspark, and appending the results to the dataframe by naming the column as sum

### Row wise sum in pyspark

from pyspark.sql.functions import col

df1=df_student_detail.withColumn("sum", col("mathematics_score")+col("science_score"))
df1.show()

So the resultant dataframe will be

Row wise mean, sum, minimum and maximum in pyspark 3c

Row wise minimum in pyspark : Method 1

least() function takes the column name as arguments and calculates the row wise minimum value.

### Row wise minimum in pyspark

from pyspark.sql.functions import col, least

df1=df_student_detail.select((least(col("mathematics_score"),col("science_score"))).alias("minimum"))
df1.show()

Row wise mean, sum, minimum and maximum in pyspark 4

Row wise minimum in pyspark : Method 2

In method 2 two we will be appending the result to the dataframe by using least function. least() function takes the column name as arguments and calculates the row wise minimum value and the result is appended to the dataframe

### Row wise minimum in pyspark

from pyspark.sql.functions import least

df1=df_student_detail.withColumn('minimum', least('mathematics_score', 'science_score'))
df1.show()

So the resultant dataframe with row wise minimum calculated will be

Row wise mean, sum, minimum and maximum in pyspark 5

Row wise maximum in pyspark : Method 1

greatest() function takes the column name as arguments and calculates the row wise maximum value.

### Row wise maximum in pyspark

from pyspark.sql.functions import col, greatest

df1=df_student_detail.select((greatest(col("mathematics_score"),col("science_score"))).alias("maximum"))
df1.show()

Row wise mean, sum, minimum and maximum in pyspark 6

Row wise maximum in pyspark : Method 2

In method 2 two we will be appending the result to the dataframe by using greatest function. greatest() function takes the column name as arguments and calculates the row wise maximum value and the result is appended to the dataframe.

### Row wise maximum in pyspark

from pyspark.sql.functions import greatest

df1=df_student_detail.withColumn('maximum', greatest('mathematics_score', 'science_score'))
df1.show()

So the resultant dataframe with row wise maximum calculated will be

Row wise mean, sum, minimum and maximum in pyspark 7

Author

Sridhar Venkatachalam

With close to 10 years on Experience in data science and machine learning Have extensively worked on programming languages like R, Python (Pandas), SAS, Pyspark.
View all posts