Sort the dataframe in pyspark – Sort on single column & Multiple column

To sort a dataframe in pyspark we will use the orderBy() function. orderBy() sorts the dataframe by a single column or by multiple columns, in either ascending or descending order. Let's see an example of each.

  • Sort the dataframe in pyspark by single column – ascending order
  • Sort the dataframe in pyspark by single column – descending order
  • Sorting the dataframe in pyspark by multiple columns – ascending order
  • Sorting the dataframe in pyspark by multiple columns – descending order

Syntax:

 df.orderBy('colname1', 'colname2', ascending=False)

df – dataframe
colname1, colname2 – names of the columns to sort by
ascending=False – sort in descending order
ascending=True – sort in ascending order (this is the default)

We will be using dataframe df_student_detail

[Figure: the df_student_detail dataframe]

 

 

Sort the dataframe in pyspark by single column – descending order


orderBy() takes the column name as an argument and sorts the dataframe by that column. Passing ascending=False as an additional argument sorts the dataframe in descending order of the column.

## Sort dataframe in descending - sort by single column

df_student_detail1 = df_student_detail.orderBy('science_score', ascending=False)
df_student_detail1.show()

So the sorted dataframe will be:

[Figure: df_student_detail sorted by science_score in descending order]

 

 

Sort the dataframe in pyspark by single column – ascending order

orderBy() takes the column name as an argument and sorts the dataframe by that column. By default, orderBy() sorts in ascending order of the column.

## Sort dataframe in ascending - sort by single column

df_student_detail1 = df_student_detail.orderBy('science_score')
df_student_detail1.show()

So the sorted dataframe will be:

[Figure: df_student_detail sorted by science_score in ascending order]

 

 

Sort the dataframe in pyspark by multiple columns – descending order

orderBy() takes two column names as arguments and sorts the dataframe by the first column and then by the second column. With ascending=False, both columns are sorted in descending order.

## Sort dataframe in descending - sort by multiple column

df_student_detail1 = df_student_detail.orderBy('grad_score','science_score', ascending=False)
df_student_detail1.show()

So the sorted dataframe will be:

[Figure: df_student_detail sorted by grad_score and then science_score, both in descending order]

 

 

Sort the dataframe in pyspark by multiple columns – ascending order

orderBy() takes two column names as arguments and sorts the dataframe by the first column and then by the second column. With no ascending argument, both columns are sorted in ascending order.

## Sort dataframe in ascending - sort by multiple column

df_student_detail1 = df_student_detail.orderBy('grad_score','science_score')
df_student_detail1.show()

So the sorted dataframe will be:

[Figure: df_student_detail sorted by grad_score and then science_score, both in ascending order]

 



Author

  • Sridhar Venkatachalam

    With close to 10 years of experience in data science and machine learning, has worked extensively with programming languages such as R, Python (Pandas), SAS, and PySpark.