The mean, variance and standard deviation of a column in pyspark can be calculated using the agg() function, passing the column name along with the ‘mean’, ‘variance’ or ‘stddev’ keyword as needed. The mean, variance and standard deviation of each group in pyspark can be calculated by combining groupby() with agg(). We will see an example for each:

- Mean of the column in pyspark with example
- Variance of the column in pyspark with example
- Standard deviation of column in pyspark with example
- Mean of each group of dataframe in pyspark with example
- Variance of each group of dataframe in pyspark with example
- Standard deviation of each group of dataframe in pyspark with example

We will be using the dataframe named **df_basket1**.

**Mean of the column in pyspark with example:**

The mean of a column in pyspark is calculated using the aggregate function agg(). The agg() function takes the column name and the ‘mean’ keyword, which returns the mean value of that column.

```python
# Mean value of the column in pyspark
df_basket1.agg({'Price': 'mean'}).show()
```

The mean value of the Price column is calculated.

**Variance of the column in pyspark with example:**

The variance of a column in pyspark is also calculated using agg(). The agg() function takes the column name and the ‘variance’ keyword, which returns the variance of that column.

```python
# Variance of the column in pyspark
df_basket1.agg({'Price': 'variance'}).show()
```

The variance of the Price column is calculated.

**Standard Deviation of the column in pyspark with example:**

The standard deviation of a column in pyspark is likewise calculated using agg(). The agg() function takes the column name and the ‘stddev’ keyword, which returns the standard deviation of that column.

```python
# Standard deviation of the column in pyspark
df_basket1.agg({'Price': 'stddev'}).show()
```

The standard deviation of the Price column is calculated.

**Mean of each group in pyspark with example:**

The mean value of each group in pyspark is calculated using agg() along with groupby(). groupby() takes the grouping column name, and agg() takes the column name and the ‘mean’ keyword, which returns the mean value of that column for each group.

```python
# Mean value of each group
df_basket1.groupby('Item_group').agg({'Price': 'mean'}).show()
```

The mean price of each “Item_group” is calculated.


**Variance of each group in pyspark with example:**

The variance of each group in pyspark is calculated using agg() along with groupby(). groupby() takes the grouping column name, and agg() takes the column name and the ‘variance’ keyword, which returns the variance of that column for each group.

```python
# Variance of each group
df_basket1.groupby('Item_group').agg({'Price': 'variance'}).show()
```

The variance of Price within each “Item_group” is calculated.

**Standard deviation of each group in pyspark with example:**

The standard deviation of each group in pyspark is calculated using agg() along with groupby(). groupby() takes the grouping column name, and agg() takes the column name and the ‘stddev’ keyword, which returns the standard deviation of that column for each group.

```python
# Standard deviation of each group
df_basket1.groupby('Item_group').agg({'Price': 'stddev'}).show()
```

The standard deviation of Price within each “Item_group” is calculated.