aggregate() Function in R

The aggregate() function in R splits the data into subsets, computes a summary statistic for each subset, and returns the result in grouped form, with one row per group. It is similar to GROUP BY in SQL. aggregate() is useful for the common aggregate operations such as sum, count, mean, minimum, and maximum.
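
To make the split, summarise, and combine steps concrete, here is a minimal sketch that performs them by hand for a single column and then does the same with aggregate(). The variable names (groups, manual_sum, agg) are illustrative only; split() and sapply() are base R functions.

```r
# Step 1: split Sepal.Length into one numeric vector per species.
groups <- split(iris$Sepal.Length, iris$Species)

# Step 2: apply sum() to each subset; the result is a named numeric vector.
manual_sum <- sapply(groups, sum)

# The same computation in one call with aggregate(). The result is a
# data frame with the columns Group.1 (the species) and x (the sum).
agg <- aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = sum)

manual_sum
agg
```

Both objects hold the same three sums; aggregate() simply returns them as a data frame instead of a named vector.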

Let's look at examples of the following:

  • Computing the group sum with aggregate()
  • Computing the group maximum and minimum with aggregate()
  • Computing the group mean with aggregate()
  • Getting group counts with aggregate()

A pictorial representation of the aggregate() function, using a group sum as the example, is shown below.

[Figure: aggregate() computing a group sum]

Syntax for aggregate() Function in R:

aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE)
x         an R object, usually a data frame
by        a list of grouping elements by which the subsets are formed
FUN       a function that computes the summary statistic
...       further arguments passed on to FUN (for example na.rm = TRUE)
simplify  a logical indicating whether results should be simplified to a vector or matrix if possible
drop      a logical indicating whether to drop unused combinations of grouping values
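
The examples in this article pass na.rm = TRUE, which is not an argument of aggregate() itself but is handed on to FUN. The sketch below shows the effect on a small made-up data frame; the names scores, with_na and without_na are hypothetical and exist only for this illustration.

```r
# Hypothetical data: group "a" contains one missing score.
scores <- data.frame(
  group = c("a", "a", "b", "b"),
  score = c(10, NA, 3, 5)
)

# Without na.rm, mean() returns NA for any group that contains an NA.
with_na <- aggregate(scores["score"], by = list(group = scores$group),
                     FUN = mean)

# na.rm = TRUE is forwarded to mean(), which then ignores the NA.
without_na <- aggregate(scores["score"], by = list(group = scores$group),
                        FUN = mean, na.rm = TRUE)

with_na
without_na
```

Any other named argument that FUN accepts can be forwarded in the same way.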

Example of aggregate() Function in R:

Let’s use the iris data set to demonstrate a simple example of the aggregate() function. The iris data set ships with R and records four measurements (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) for 150 flowers of three species. To find the mean of each measurement for each species, we can use aggregate():

# aggregate() with mean: group means of the four measurements by species
agg_mean <- aggregate(iris[, 1:4], by = list(iris$Species), FUN = mean, na.rm = TRUE)
agg_mean

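The above code takes the first four columns of the iris data set, groups the rows by Species, and computes the mean of each column within each group. Because the list passed to by is unnamed, the grouping column in the result is labelled Group.1. The output will be:

[Figure: group means of the four iris measurements by species]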
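
The same group means can also be written with the formula interface of aggregate(), which some readers find easier to read. This is an alternative sketch, not the form used in the rest of this article; the name agg_mean_formula is chosen here for illustration.

```r
# Formula interface: the left-hand side lists the columns to summarise,
# the right-hand side the grouping variable. "." means "all other columns".
agg_mean_formula <- aggregate(. ~ Species, data = iris, FUN = mean)
agg_mean_formula
```

With a formula, the grouping column keeps its own name (Species) instead of Group.1. Note that the formula method drops rows containing missing values by default (na.action = na.omit), which makes no difference for iris because it has none.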

Note: When using the aggregate() function with the by argument, the grouping variables must be supplied in a list, even when there is only one.
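
Since by is a list, its elements can be named, and it can hold more than one grouping variable. The following sketch uses the built-in mtcars data set rather than iris, because iris has only one grouping column; the name mpg_by_cyl_gear is illustrative.

```r
# Naming the list elements replaces the default Group.1, Group.2 headers.
# Two grouping variables produce one row per observed combination.
mpg_by_cyl_gear <- aggregate(
  mtcars["mpg"],
  by = list(cyl = mtcars$cyl, gear = mtcars$gear),
  FUN = mean
)
mpg_by_cyl_gear
```

Combinations that never occur in the data are left out because drop = TRUE is the default.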

Example for aggregate() function in R with sum: 

Let’s use the aggregate() function to compute the sum of each metric for each species.

# aggregate() with sum: group sums of the four measurements by species
agg_sum <- aggregate(iris[, 1:4], by = list(iris$Species), FUN = sum, na.rm = TRUE)
agg_sum

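When we execute the above code, we get one row per species with the sum of each of the four columns. The output will be:

[Figure: group sums of the four iris measurements by species]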
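
As a cross-check on the group sums, the same numbers can be obtained with two other base R functions, tapply() and rowsum(). This is only a sketch for comparison; the names sepal_sum and all_sums are made up for this example.

```r
# tapply() gives the group sums of a single column as a named vector.
sepal_sum <- tapply(iris$Sepal.Length, iris$Species, sum)

# rowsum() sums several columns at once; with a matrix as input it
# returns a matrix with one row per group.
all_sums <- rowsum(as.matrix(iris[, 1:4]), group = iris$Species)

sepal_sum
all_sums
```

aggregate() remains the more general tool, since it accepts any summary function, while rowsum() only computes sums.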

Example for aggregate() function in R with count: 

Let’s use the aggregate() function to count the observations of each metric for each species.

# aggregate() with length: number of observations per species
agg_count <- aggregate(iris[, 1:4], by = list(iris$Species), FUN = length)
agg_count

The above code takes the first four columns of the iris data set, groups the rows by Species, and counts the values in each column within each group. The output will be:

[Figure: group counts of the four iris measurements by species]
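
Two points about counting are worth a short sketch. First, length() counts every value, including NA, and it takes no na.rm argument, which is why the call above does not pass one. Second, when only the group sizes are needed, table() returns them directly. The names species_counts and non_missing are illustrative.

```r
# Group sizes without aggregate(): one count per species.
species_counts <- table(iris$Species)

# Counting only the non-missing values in each column, per species,
# using a small anonymous function in place of length.
non_missing <- aggregate(
  iris[, 1:4],
  by = list(Species = iris$Species),
  FUN = function(v) sum(!is.na(v))
)

species_counts
non_missing
```

For iris the two approaches agree, because the data set contains no missing values.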

Example for aggregate() function in R with maximum: 

Let’s use the aggregate() function to find the maximum of each metric for each species.

# aggregate() with max: group maxima of the four measurements by species
agg_max <- aggregate(iris[, 1:4], by = list(iris$Species), FUN = max, na.rm = TRUE)
agg_max

The above code takes the first four columns of the iris data set, groups the rows by Species, and computes the maximum of each column within each group. The output will be:

[Figure: group maxima of the four iris measurements by species]

Example for aggregate() function in R with minimum: 

Let’s use the aggregate() function to find the minimum of each metric for each species.

# aggregate() with min: group minima of the four measurements by species
agg_min <- aggregate(iris[, 1:4], by = list(iris$Species), FUN = min, na.rm = TRUE)
agg_min

The above code takes the first four columns of the iris data set, groups the rows by Species, and computes the minimum of each column within each group. The output will be:

[Figure: group minima of the four iris measurements by species]
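
The minimum and maximum can also be computed in a single call by letting FUN return a named vector. The sketch below does this for one column; it is one possible approach, and the names agg_range and agg_range_flat are made up for this example.

```r
# FUN returns two named values, so the Sepal.Length column of the result
# is a matrix with the columns "min" and "max".
agg_range <- aggregate(
  Sepal.Length ~ Species,
  data = iris,
  FUN = function(v) c(min = min(v), max = max(v))
)

# do.call(data.frame, ...) spreads the matrix column into ordinary columns.
agg_range_flat <- do.call(data.frame, agg_range)

agg_range_flat
```

Flattening is optional, but it makes the result easier to use with functions that expect plain data frame columns.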

Author

  • Sridhar Venkatachalam

    With close to 10 years of experience in data science and machine learning, he has worked extensively with programming languages such as R, Python (Pandas), SAS, and PySpark.