Rearrange or reorder column in pyspark

To rearrange or reorder the columns in PySpark we use the select() function. To reorder the columns in ascending order of name we use Python's built-in sorted() function, and to reorder them in descending order we use sorted() with the argument reverse=True. We can also rearrange the columns by position. Let's get clarity with an example.

  • Rearrange or Reorder the column in pyspark
  • Reorder the column names in pyspark in ascending order
  • Reorder the column names in pyspark in descending order
  • Reorder the column by position in pyspark

We will use the dataframe named df_basket1.


Rearrange the column in pyspark

The select() function returns the columns in the order in which they are listed, so passing the column names in the desired order rearranges the columns, as shown below.

## Rearrange columns by listing them in the desired order
df_basket_reordered = df_basket1.select("price", "Item_group", "Item_name")
df_basket_reordered.show()

So the resultant dataframe has its columns in the order price, Item_group, Item_name.


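Because select() keeps only the columns it is given, a list that leaves a column out drops that column instead of just moving it. The sketch below builds the new order as a plain Python list and checks that it is a true rearrangement before it is handed to select(). The helper name reorder_columns is hypothetical (it is not part of PySpark), and the column order Item_group, Item_name, price is an assumption, since the original listing of df_basket1 is not shown here.

```python
def reorder_columns(current, desired):
    """Return `desired` after checking it is a rearrangement of `current`.

    select() keeps only the columns it is given, so a list that misses a
    column would silently drop it instead of just reordering it.
    """
    if sorted(current) != sorted(desired):
        missing = sorted(set(current) - set(desired))
        extra = sorted(set(desired) - set(current))
        raise ValueError(f"not a reordering: missing={missing}, extra={extra}")
    return list(desired)


# Assumed column order of df_basket1 (hypothetical; the original listing is not shown).
columns = ["Item_group", "Item_name", "price"]
new_order = reorder_columns(columns, ["price", "Item_group", "Item_name"])
print(new_order)  # ['price', 'Item_group', 'Item_name']

# With PySpark (not run here):
# df_basket1.select(new_order).show()
```

Since select() accepts a list of names as well as separate arguments, the checked list can be passed to it directly.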
Reorder the column in pyspark in ascending order

Using the select() function together with Python's built-in sorted() function, we first sort the column names in ascending order. The list of column names, df_basket1.columns, is passed to sorted(), and the sorted list is then passed to select(), as shown below.

## Reorder column by ascending order
df_basket_reordered = df_basket1.select(sorted(df_basket1.columns))
df_basket_reordered.show()

So the resultant dataframe has its columns sorted in ascending order of name.


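Note that sorted() compares names by Unicode code point, so an upper-case ASCII letter sorts before any lower-case one. That makes no difference for Item_group, Item_name and price, but with mixed-case names a case-insensitive order can be obtained with key=str.lower. In the sketch below, the names discount and Zone are hypothetical, added only to show the difference.

```python
# Hypothetical mixed-case column names (only Item_group, Item_name and price come from the article).
columns = ["price", "Item_group", "Item_name", "discount", "Zone"]

# Default: code-point order, so capitalised names come first.
print(sorted(columns))  # ['Item_group', 'Item_name', 'Zone', 'discount', 'price']

# Case-insensitive: compare lower-cased names but keep the original spelling.
print(sorted(columns, key=str.lower))  # ['discount', 'Item_group', 'Item_name', 'price', 'Zone']

# With PySpark (not run here):
# df_basket1.select(sorted(df_basket1.columns, key=str.lower)).show()
```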
Reorder the column in pyspark in descending order

The list of column names is passed to sorted() together with the argument reverse=True, which sorts the names in descending order; the sorted list is then passed to select(), as shown below.

## Reorder column by descending order
df_basket_reordered = df_basket1.select(sorted(df_basket1.columns, reverse=True))
df_basket_reordered.show()


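For distinct names, reverse=True simply gives the ascending order read backwards, and it can be combined with key=str.lower when the names mix upper and lower case. The sketch below assumes the columns are Item_group, Item_name and price; Zone is a hypothetical name used only for the case-insensitive line.

```python
columns = ["Item_group", "Item_name", "price"]  # assumed column names of df_basket1

# Descending order, as sorted(df_basket1.columns, reverse=True) would give.
descending = sorted(columns, reverse=True)
print(descending)  # ['price', 'Item_name', 'Item_group']

# For distinct names this equals the ascending list reversed.
assert descending == sorted(columns)[::-1]

# Case-insensitive descending order ("Zone" is a hypothetical column name).
mixed = ["Zone", "price", "Item_name"]
print(sorted(mixed, key=str.lower, reverse=True))  # ['Zone', 'price', 'Item_name']

# With PySpark (not run here):
# df_basket1.select(sorted(df_basket1.columns, key=str.lower, reverse=True)).show()
```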
Reorder the column by position in pyspark

We can also use select() to reorder the columns by position, picking each name from df_basket1.columns by its index. In the example below, the columns at (zero-based) positions 2, 0 and 1 are moved to positions 0, 1 and 2 respectively.

## Reorder column by position
df_basket1.select(df_basket1.columns[2], df_basket1.columns[0], df_basket1.columns[1]).show()

So the resultant dataframe has its columns reordered by position.


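Writing out df_basket1.columns[i] for every column gets long for wide dataframes. A list comprehension over the wanted positions builds the same list of names, which select() accepts directly. The helper reorder_by_position below is hypothetical, not a PySpark function; it also checks that every position is used exactly once, so no column is dropped or repeated. The column order Item_group, Item_name, price is an assumption.

```python
def reorder_by_position(columns, positions):
    """Return the names in `columns` picked by zero-based index, in the order of `positions`."""
    # Every index from 0 to len(columns) - 1 must appear exactly once,
    # otherwise select() would drop or duplicate a column.
    if sorted(positions) != list(range(len(columns))):
        raise ValueError("positions must list every column index exactly once")
    return [columns[i] for i in positions]


columns = ["Item_group", "Item_name", "price"]  # assumed order of df_basket1.columns
print(reorder_by_position(columns, [2, 0, 1]))  # ['price', 'Item_group', 'Item_name']

# With PySpark (not run here):
# df_basket1.select(reorder_by_position(df_basket1.columns, [2, 0, 1])).show()
```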

Author

  • Sridhar Venkatachalam

    With close to 10 years of experience in data science and machine learning, he has worked extensively with programming languages like R, Python (Pandas), SAS and PySpark.
