Groupby maximum in R

Groupby maximum in R can be computed with the base R aggregate() function or with the group_by() function of the dplyr package. Both approaches work whether the data is grouped by a single column or by multiple columns. This tutorial covers:

Groupby max of single column in R
Groupby max of multiple columns in R
Groupby maximum using aggregate() function
Groupby maximum using group_by() function.

The figure below illustrates how groupby maximum works:

[Image: generic groupby max]
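
The idea behind the figure can also be sketched in a few lines of base R: split the values into groups, then take the maximum of each group. The vectors sales and state below are hypothetical toy data used only for illustration; they are not part of the tutorial's dataset.

```r
# Hypothetical toy data, used only to illustrate the idea
sales <- c(14, 24, 31, 12)
state <- c("Alaska", "Texas", "Alaska", "Texas")

# Step 1: split the values into one vector per group
groups <- split(sales, state)

# Step 2: apply max() to each group and combine the results
group_max <- sapply(groups, max)
group_max
```

Both aggregate() and group_by() with summarise() follow this same split, apply, combine pattern, but return a dataframe instead of a named vector.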

First, let's create a dataframe:

# Sample sales data: one row per sale, with the salesperson's name and state
df1 <- data.frame(Name=c('James','Paul','Richards','Marico','Samantha','Ravi','Raghu','Richards','George','Ema','Samantha','Catherine'),
    State=c('Alaska','California','Texas','North Carolina','California','Texas','Alaska','Texas','North Carolina','Alaska','California','Texas'),
    Sales=c(14,24,31,12,13,7,9,31,18,16,18,14))
df1

df1 will be:

[Image: groupby max in R 1]

Groupby using aggregate() syntax:

aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE)

x	an R object, usually a dataframe
by	a list of grouping elements by which the rows of x are grouped
FUN	a function to compute the summary statistics
...	further arguments passed to FUN
simplify	a logical indicating whether results should be simplified to a vector or matrix if possible
drop	a logical indicating whether to drop unused combinations of grouping values
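
As a small illustration of how by, FUN and the extra arguments fit together, the sketch below uses a hypothetical toy dataframe (not the df1 of this tutorial) with one missing value. It assumes you want readable column names, so the by list is named and the value column is passed as a one-column dataframe.

```r
# Hypothetical toy data (not the tutorial's df1); one Sales value is missing
toy <- data.frame(
  State = c("Alaska", "Alaska", "Texas", "Texas"),
  Sales = c(14, NA, 31, 7)
)

# Naming the element of `by` names the grouping column in the result,
# and passing toy["Sales"] (a one-column dataframe) keeps the name Sales
with_na <- aggregate(toy["Sales"], by = list(State = toy$State), FUN = max)

# Extra arguments after FUN are forwarded to it, here na.rm = TRUE for max()
without_na <- aggregate(toy["Sales"], by = list(State = toy$State),
                        FUN = max, na.rm = TRUE)

with_na     # Alaska is NA because max() propagates the missing value
without_na  # Alaska is 14 once the NA is ignored
```

Whether to pass na.rm = TRUE depends on the data: it is only needed when the summarised column can contain missing values.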

Groupby maximum of single column in R:

Method 1:

aggregate() takes the column to summarise (Sales), the by argument naming the column to group by (State), and FUN = max, as shown below:

# Groupby max of single column

aggregate(df1$Sales, by=list(df1$State), FUN=max)

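so the grouped dataframe will have one row per state (Alaska 16, California 24, North Carolina 18, Texas 31), with the default column names Group.1 and x:

[Image: groupby max in R 2a]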
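
If the default column names are inconvenient, one alternative is the formula interface of aggregate(), which keeps the original column names. The sketch below rebuilds the same df1 so that it runs on its own; the variable name res is only illustrative.

```r
# Same sample data as df1 above, rebuilt so this block runs on its own
df1 <- data.frame(
  Name  = c('James','Paul','Richards','Marico','Samantha','Ravi','Raghu',
            'Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# Formula interface: read "Sales ~ State" as "Sales grouped by State"
res <- aggregate(Sales ~ State, data = df1, FUN = max)
res
```

The result holds the same maxima as Method 1, but its columns are named State and Sales.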

Method 2: group_by() using dplyr

group_by() takes the State column as its argument, and summarise() then applies max() to find the maximum of Sales within each group.

library(dplyr)
df1 %>% group_by(State) %>% summarise(Max_sales = max(Sales))

so the grouped dataframe, with the columns State and Max_sales, will be

[Image: groupby max in R 2b]

Groupby maximum of multiple columns in R:

Method 1:

aggregate() is grouped by both State and Name by passing both columns in the by list, with FUN = max, as shown below:

# Groupby max of multiple columns

aggregate(df1$Sales, by=list(df1$State,df1$Name), FUN=max)

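so the grouped dataframe will have one row for each State and Name combination present in the data (10 here), with the default column names Group.1, Group.2 and x:

[Image: groupby max in R 3a]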
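
The same grouping by two columns can be sketched with the formula interface, followed by a sort so that the rows of each state sit together. The sorting step and the variable name res are additions for illustration, not part of the method above.

```r
# Same sample data as df1 above, rebuilt so this block runs on its own
df1 <- data.frame(
  Name  = c('James','Paul','Richards','Marico','Samantha','Ravi','Raghu',
            'Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# "Sales ~ State + Name" groups by every State and Name combination in the data
res <- aggregate(Sales ~ State + Name, data = df1, FUN = max)

# Sort by State, then Name, so each state's rows appear together
res <- res[order(res$State, res$Name), ]
res
```

Samantha has two sales in California (13 and 18), so her row shows 18; names with a single sale simply keep that value.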

Method 2: group_by() using dplyr

group_by() takes the State and Name columns as arguments and groups by both of them, and summarise() then applies max() to find the maximum of Sales within each group.

library(dplyr)
df1 %>% group_by(State,Name) %>% summarise(Max_sales = max(Sales))

so the dataframe grouped by the State and Name columns, with the aggregated maximum of Sales, will be

[Image: groupby max in R 3b]
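
With more than one grouping column, summarise() by default removes only the last level of grouping, so the result above is still grouped by State, and recent dplyr versions print a message about this. A minimal sketch of returning an ungrouped result, assuming dplyr 1.0.0 or later (where the .groups argument is available), is shown below. The requireNamespace() guard is an addition so that the block does nothing when dplyr is not installed.

```r
# Same sample data as df1 above, rebuilt so this block runs on its own
df1 <- data.frame(
  Name  = c('James','Paul','Richards','Marico','Samantha','Ravi','Raghu',
            'Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# Run only when dplyr is installed (assumes dplyr >= 1.0.0 for .groups)
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  res <- df1 %>%
    group_by(State, Name) %>%
    # .groups = "drop" returns an ungrouped tibble
    summarise(Max_sales = max(Sales), .groups = "drop")

  print(res)
}
```

Dropping the grouping is a design choice: it avoids surprises when later steps, such as another summarise() or a mutate(), would otherwise still operate within each State.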

For further understanding of the group_by() function in R using dplyr, refer to the dplyr documentation.

Author

Sridhar Venkatachalam

Has close to 10 years of experience in data science and machine learning and has worked extensively with programming languages such as R, Python (Pandas), SAS, and PySpark.