To calculate the sum of two or more columns in pyspark, we will use the + operator on the columns. The first method computes the sum with the + operator inside a select() call; the second appends the result to the dataframe as a new column using withColumn(). Let’s see an example of each.
- Sum of two or more columns in pyspark using + and select()
- Sum of multiple columns in pyspark and appending to dataframe
We will be using the dataframe df_student_detail.
Sum of two or more columns in pyspark: Method 1
- In Method 1 we use the + operator inside the select() function to calculate the sum of multiple columns.
```python
### Sum of two or more columns in pyspark
from pyspark.sql.functions import col

df1 = df_student_detail.select(
    (col("mathematics_score") + col("science_score")).alias("sum")
)
df1.show()
```
This method adds the two columns and produces a resultant column named “sum”, as shown below.
Sum of multiple columns in pyspark and appending to dataframe: Method 2
In Method 2 we use the + operator to calculate the sum of two or more columns and append the result to the dataframe as a new column named “sum”, using withColumn().
```python
### Sum of two or more columns in pyspark
from pyspark.sql.functions import col

df1 = df_student_detail.withColumn(
    "sum", col("mathematics_score") + col("science_score")
)
df1.show()
```
Here we add the two columns “mathematics_score” and “science_score” and store the result in a new column named “sum”, as shown in the resultant dataframe below.
Other Related Topics:
- Mean of two or more columns in pyspark
- Row wise mean, sum, minimum and maximum in pyspark
- Rename column name in pyspark – Rename single and multiple column
- Typecast Integer to Decimal and Integer to float in Pyspark
- Get number of rows and number of columns of dataframe in pyspark
- Extract Top N rows in pyspark – First N rows
- Absolute value of column in Pyspark – abs() function
- Set Difference in Pyspark – Difference of two dataframe
- Union and union all of two dataframe in pyspark (row bind)
- Intersect of two dataframe in pyspark (two or more)
- Round up, Round down and Round off in pyspark – (Ceil & floor pyspark)
- Sort the dataframe in pyspark – Sort on single column & Multiple column
- Drop rows in pyspark – drop rows with condition
- Distinct value of a column in pyspark
- Distinct value of dataframe in pyspark – drop duplicates
- Count of Missing (NaN,Na) and null values in Pyspark