Sum of two or more columns in pyspark

To calculate the sum of two or more columns in pyspark, we will be using the + operator on the columns. The first method combines the + operator with the select() function and returns only the summed column. The second method uses the + operator with the withColumn() function to append the sum to the dataframe as a new column. Let’s see an example of each.

  • Sum of two or more columns in pyspark using + and select()
  • Sum of multiple columns in pyspark and appending to dataframe

We will be using the dataframe df_student_detail.


Sum of two or more columns in pyspark: Method 1

  • In Method 1 we will be using the + operator to calculate the sum of multiple columns, together with the select() function, so the result contains only the summed column.
### Sum of two or more columns in pyspark

from pyspark.sql.functions import col

# select() returns a new dataframe that contains only the aliased "sum" column
df1 = df_student_detail.select((col("mathematics_score") + col("science_score")).alias("sum"))
df1.show()

This method adds the two columns row by row and returns a dataframe that contains only the resulting column, named sum.


Sum of multiple columns in pyspark and appending to dataframe: Method 2

In Method 2 we will be using the + operator to calculate the sum of two or more columns, and the withColumn() function to append the result to the dataframe as a new column named sum.

### Sum of two or more columns in pyspark

from pyspark.sql.functions import col

# withColumn() keeps every existing column and appends "sum" to the dataframe
df1 = df_student_detail.withColumn("sum", col("mathematics_score") + col("science_score"))
df1.show()

Here the two columns “mathematics_score” and “science_score” are added, and the result is stored in a new column named “sum” alongside the existing columns of the dataframe.


Author

  • Sridhar Venkatachalam

    Close to 10 years of experience in data science and machine learning. Has worked extensively with programming languages such as R, Python (Pandas), SAS and Pyspark.