Quantile rank, decile rank & n tile rank in pyspark – Rank by Group

To calculate the quantile rank, decile rank and n tile rank in pyspark, we use the ntile() window function. Passing 4 to ntile() gives the quantile rank of a column (strictly speaking its quartile rank, since the ordered rows are split into four buckets). Passing 10 to ntile() gives the decile rank of the column. Let’s see an example of each.

  • Quantile Rank of the column in pyspark
  • Quantile rank of the column by group in pyspark
  • Decile Rank of the column in pyspark using ntile() function
  • Decile rank of the column by group in pyspark
  • N tile rank of the column in pyspark

We will be using the dataframe df_basket1, which includes the columns Item_group, Item_name and Price.

Quantile Rank of the column in pyspark

The quantile rank of the “Price” column is calculated by passing 4 to the ntile() function. We call partitionBy() with no arguments, so the whole dataframe is treated as a single group, and orderBy() on the “Price” column.


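### Quantile Rank in pyspark

from pyspark.sql.window import Window
import pyspark.sql.functions as F

# Empty partitionBy(): one window over the whole dataframe, ordered by Price
w = Window.partitionBy().orderBy("Price")

# ntile(4) places each row in one of 4 buckets (1 = lowest prices)
df_quantile = df_basket1.select(
    "Item_group", "Item_name", "Price",
    F.ntile(4).over(w).alias("quantile_rank"),
)
df_quantile.show()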

The resultant quantile rank is shown below.

(Figure: output table with the quantile_rank column)
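
To make the bucket assignment concrete, here is a minimal plain-Python sketch of how ntile(n) divides rows that are already sorted. When the row count does not divide evenly by n, the leftover rows go one each to the first buckets. The helper name `ntile_ranks` is hypothetical and is not part of pyspark; it only mimics the rule for illustration.

```python
def ntile_ranks(num_rows, n):
    """Bucket number (1..n) for each position of an already-sorted list of rows."""
    base, extra = divmod(num_rows, n)
    ranks = []
    for bucket in range(1, n + 1):
        # The first `extra` buckets each take one additional row
        size = base + (1 if bucket <= extra else 0)
        ranks.extend([bucket] * size)
    return ranks


print(ntile_ranks(10, 4))  # [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
print(ntile_ranks(3, 4))   # [1, 2, 3]
```

Because the assignment depends only on row position, two rows with the same price can land in different buckets, and a dataframe with fewer rows than n never reaches the highest ranks.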

Quantile Rank of the column by group in pyspark

The quantile rank of the column by group is calculated by passing 4 to the ntile() function. We use partitionBy() on the “Item_group” column, so the ranking restarts within each group, and orderBy() on the “Price” column.

### Quantile Rank of the column by group in pyspark

from pyspark.sql.window import Window
import pyspark.sql.functions as F

# Restart the ranking within each Item_group, ordering rows by Price
w = Window.partitionBy("Item_group").orderBy("Price")

df_quantile_by_group = df_basket1.select(
    "Item_group", "Item_name", "Price",
    F.ntile(4).over(w).alias("quantile_rank"),
)
df_quantile_by_group.show()

The resultant quantile rank by group is shown below.

(Figure: output table with the quantile_rank column, ranked within each Item_group)
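
The effect of partitioning can be sketched in plain Python as well: each group is sorted and bucketed on its own, so the ranks start again at 1 for every group. This is only an illustration of the idea, not how Spark implements it; `ntile_by_group` is a hypothetical helper and the rows are made-up sample values, not the article's data.

```python
from collections import defaultdict


def ntile_by_group(rows, n):
    """rows: list of (group, name, price) tuples. Returns {name: bucket}."""
    groups = defaultdict(list)
    for group, name, price in rows:
        groups[group].append((price, name))

    ranks = {}
    for members in groups.values():
        members.sort()  # plays the role of orderBy on price within the group
        base, extra = divmod(len(members), n)
        position = 0
        for bucket in range(1, n + 1):
            # The first `extra` buckets each take one additional row
            size = base + (1 if bucket <= extra else 0)
            for _, name in members[position:position + size]:
                ranks[name] = bucket
            position += size
    return ranks


# Made-up sample rows standing in for df_basket1
rows = [
    ("Fruit", "Apple", 100), ("Fruit", "Banana", 20),
    ("Fruit", "Mango", 150), ("Fruit", "Grape", 60),
    ("Vegetable", "Carrot", 30), ("Vegetable", "Onion", 40),
]
group_ranks = ntile_by_group(rows, 4)
print(group_ranks)
```

In this sample the four Fruit rows receive ranks 1 to 4, while the Vegetable group has only two rows and so receives ranks 1 and 2.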

Decile Rank of the column in pyspark

The decile rank of the “Price” column is calculated by passing 10 to the ntile() function. We call partitionBy() with no arguments, so the whole dataframe is treated as a single group, and orderBy() on the “Price” column.

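### Decile Rank of the column in pyspark

from pyspark.sql.window import Window
import pyspark.sql.functions as F

# Empty partitionBy(): one window over the whole dataframe, ordered by Price
w = Window.partitionBy().orderBy("Price")

# ntile(10) places each row in one of 10 buckets (1 = lowest prices)
df_decile = df_basket1.select(
    "Item_group", "Item_name", "Price",
    F.ntile(10).over(w).alias("decile_rank"),
)
df_decile.show()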

The resultant decile rank is shown below.

(Figure: output table with the decile_rank column)

Decile Rank of the column by group in pyspark

The decile rank of the column by group is calculated by passing 10 to the ntile() function. We use partitionBy() on the “Item_group” column, so the ranking restarts within each group, and orderBy() on the “Price” column.

### Decile Rank of the column by group in pyspark

from pyspark.sql.window import Window
import pyspark.sql.functions as F

# Restart the ranking within each Item_group, ordering rows by Price
w = Window.partitionBy("Item_group").orderBy("Price")

df_decile_by_group = df_basket1.select(
    "Item_group", "Item_name", "Price",
    F.ntile(10).over(w).alias("decile_rank"),
)
df_decile_by_group.show()

The resultant decile rank by group is shown below.

(Figure: output table with the decile_rank column, ranked within each Item_group)

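NOTE: N tile rank of the column in pyspark: the ntile() function takes an integer n as its argument and splits the ordered rows into n buckets, so any n tile rank is calculated the same way as the quantile and decile ranks above, for example ntile(5) for a quintile rank.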

 


Author

  • Sridhar Venkatachalam

    Has close to 10 years of experience in data science and machine learning, and has worked extensively with programming languages such as R, Python (Pandas), SAS and Pyspark.