Frequency table or cross table in pyspark – 2 way cross table

A frequency table in pyspark is calculated with groupBy() followed by count(), since there is no dedicated frequency table function. A two-way cross table is calculated with the crosstab() function, or alternatively with groupBy() on two columns. Let’s get clarity with an example of each.

  • Calculate Frequency table in pyspark with example
  • Compute Cross table in pyspark with example

We will be using the dataframe df_basket1.

[Image: contents of the df_basket1 dataframe]

 

 

Frequency table in pyspark:

A frequency table counts how many rows fall under each distinct value of a column. pyspark has no single function for this, so we group by the column and count the rows in each group.

## Frequency table in pyspark
df_basket1.groupBy("Item_group").count().show()

The column name is passed to groupBy(), and count() is then called on the grouped data. The result has one row per distinct value of Item_group, with its frequency in a column named count.

[Image: frequency table of Item_group]

 

 

 

Cross table in pyspark: Method 1

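A cross table in pyspark can be calculated using the crosstab() function. crosstab() takes two column names as arguments and returns the two-way frequency table: the distinct values of the first column become the rows, the distinct values of the second column become the columns, and each cell holds the number of rows with that pair of values. Pairs that never occur get a count of 0.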

## Cross table in pyspark

df_basket1.crosstab('Item_group', 'price').show()

The cross table of “Item_group” and “price” is shown below.

[Image: cross table of Item_group and price from crosstab()]

 

Cross table in pyspark: Method 2

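The same counts can also be obtained with the groupBy() function. Passing two columns to groupBy() and then calling count() gives one row for each combination of values that occurs in the data, so the result is a cross table in long format rather than the grid that crosstab() returns.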

## Cross table in pyspark
df_basket1.groupBy("Item_group","price").count().show()

The counts for each combination of “Item_group” and “price” are shown below.

[Image: counts of Item_group and price combinations from groupBy()]

 
