Percentile Rank of the column in pyspark

To calculate the percentile rank of a column in PySpark, we use the percent_rank() function. Combined with partitionBy() on another column, percent_rank() calculates the percentile rank of the column by group. Let's look at an example of each.

  • Percentile Rank of the column in pyspark
  • Percentile Rank of the column by group in pyspark

We will be using the dataframe df_basket1

[Image: Percentile Rank of the column in pyspark 1]

Percentile Rank of the column in pyspark:

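The percentile rank of a column is calculated with the percent_rank() function, which returns (rank - 1) / (number of rows in the window - 1) for each row. percent_rank() is a window function, so it needs a window specification: partitionBy() with no arguments places every row in a single partition, and orderBy() on the "Price" column sets the order in which the rows are ranked.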
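The values that percent_rank() assigns can be sketched in plain Python, without Spark. This is only an illustration of the formula (rank - 1) / (n - 1), where tied values share the lowest rank. The function name percent_rank_sketch and the sample prices are hypothetical and are not taken from df_basket1.

```python
def percent_rank_sketch(values):
    """Return (rank - 1) / (n - 1) for each value, in input order.

    rank is the 1-based position in ascending order; ties share the lowest rank.
    """
    n = len(values)
    if n <= 1:
        # A window with a single row gets a percent rank of 0
        return [0.0] * n
    ordered = sorted(values)
    # list.index returns the first match, so tied values get the same rank
    return [ordered.index(v) / (n - 1) for v in values]


# Hypothetical prices, for illustration only
prices = [20, 10, 30, 10, 40]
print(percent_rank_sketch(prices))  # [0.5, 0.0, 0.75, 0.0, 1.0]
```

The lowest price always gets 0.0 and, when there is no tie at the top, the highest price gets 1.0.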

### Percentile Rank of the column in pyspark

from pyspark.sql.window import Window
import pyspark.sql.functions as F

# A single window over all rows (no partition columns), ordered by Price
window_all = Window.partitionBy().orderBy(F.col("Price"))

df_basket1 = df_basket1.select("Item_group", "Item_name", "Price", F.percent_rank().over(window_all).alias("percent_rank"))
df_basket1.show()

The resulting dataframe, with the percentile rank calculated, will be:

[Image: Percentile Rank of the column in pyspark 2]

Percentile Rank of the column by group in pyspark:

The percentile rank of a column by group is also calculated with the percent_rank() function. Here partitionBy() is applied to the "Item_group" column and orderBy() to the "Price" column, so the percentile rank is calculated separately within each item group.
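The effect of partitioning can also be sketched in plain Python: rows are split by group, and (rank - 1) / (n - 1) is computed inside each group on its own. The function name percent_rank_by_group_sketch, the group names, and the prices are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def percent_rank_by_group_sketch(rows):
    """rows is a list of (group, value) pairs.

    Returns (group, value, percent_rank) tuples in input order, with the
    percent rank computed separately inside each group.
    """
    values_by_group = defaultdict(list)
    for group, value in rows:
        values_by_group[group].append(value)
    # Sort each group's values once so ranks can be looked up by position
    ordered = {g: sorted(vals) for g, vals in values_by_group.items()}

    result = []
    for group, value in rows:
        n = len(ordered[group])
        rank_minus_one = ordered[group].index(value)  # ties share the lowest rank
        pct = rank_minus_one / (n - 1) if n > 1 else 0.0
        result.append((group, value, pct))
    return result


# Hypothetical rows, for illustration only
rows = [("Fruit", 20), ("Fruit", 10), ("Fruit", 30),
        ("Vegetable", 5), ("Vegetable", 15)]
for row in percent_rank_by_group_sketch(rows):
    print(row)
```

Each group has its own 0.0 and 1.0, because ranking restarts at every partition boundary.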

### Percentile Rank of the column by group in pyspark

from pyspark.sql.window import Window
import pyspark.sql.functions as F

# One window per Item_group, ordered by Price within each group
window_by_group = Window.partitionBy("Item_group").orderBy(F.col("Price"))

df_basket1 = df_basket1.select("Item_group", "Item_name", "Price", F.percent_rank().over(window_by_group).alias("percent_rank"))
df_basket1.show()

The resulting dataframe, with the percentile rank calculated by group, will be:

[Image: Percentile Rank of the column in pyspark 3]
