summarise, summarise_at, summarise_if, summarise_all in R - Get the summary of a dataset in R using dplyr

summarise, summarise_at, summarise_if, summarise_all in R – Summary statistics of a dataset in R (for example the mean and median) can be computed with the dplyr package. Its summarise() function collapses a data frame into one row of summary values (one row per group if the data is grouped), and dplyr also provides summarise_at(), summarise_if() and summarise_all(), which apply the same summary functions to several columns at once.

We will use the built-in mtcars dataset to illustrate these functions.

 

Summary of a column in R using dplyr – summarise()

library(dplyr)
mydata <- mtcars

# summarise the columns of dataframe
summarise(mydata, mpg_mean = mean(mpg), mpg_median = median(mpg))

The summarise() call above returns the mean and median of mpg.
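A minimal sketch of the same summary written with the %>% pipe. It also adds a row count with dplyr's n() and passes na.rm = TRUE so that a missing value would not turn the result into NA; mtcars has no missing values, so the mean and median are unchanged. The column names mpg_n, mpg_mean and mpg_median are illustrative choices.

```r
library(dplyr)

# Pipe mtcars into summarise(); each named argument becomes one output column
mpg_summary <- mtcars %>%
  summarise(
    mpg_n      = n(),                      # number of rows summarised
    mpg_mean   = mean(mpg, na.rm = TRUE),  # na.rm = TRUE drops missing values first
    mpg_median = median(mpg, na.rm = TRUE)
  )

print(mpg_summary)  # a one-row data frame
```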


 

Summary of multiple columns in R using dplyr – summarise_at()

library(dplyr)
mydata <- mtcars

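# summarise the selected columns of the dataframe
# (funs() is deprecated in dplyr; pass a named list of functions instead)
summarise_at(mydata, vars(mpg, hp), list(n = length, mean = mean, median = median))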

The summarise_at() call above returns the number of rows, mean and median of both mpg and hp.
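In dplyr 1.0.0 and later, summarise_at(), summarise_if() and summarise_all() are superseded by across() used inside summarise(). A sketch of the summarise_at() example rewritten that way, assuming dplyr >= 1.0.0; length() serves as the row count because it is an ordinary function that returns the number of elements in each column.

```r
library(dplyr)

# across() applies each function in the named list to each selected column;
# output columns are named "{column}_{function}", e.g. mpg_mean
at_summary <- mtcars %>%
  summarise(across(c(mpg, hp), list(n = length, mean = mean, median = median)))

print(at_summary)
# columns: mpg_n, mpg_mean, mpg_median, hp_n, hp_mean, hp_median
```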


 

Summarise all numeric variables with summarise_if():

summarise_if() summarises only the columns for which a predicate function, here is.numeric, returns TRUE.

library(dplyr)
mydata <- mtcars

# summarise every numeric column of the dataframe
summarise_if(mydata, is.numeric, list(n = length, mean = mean, median = median))

The summarise_if() call above returns the number of rows, mean and median of every numeric column.
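Because every column of mtcars is numeric, summarise_if() with is.numeric gives the same result as summarise_all() there; the predicate only matters when a data frame mixes column types. A sketch on the iris dataset (four numeric columns plus the factor Species), assuming dplyr >= 1.0.0 for the across(where(...)) form:

```r
library(dplyr)

# summarise_if() keeps only the columns for which is.numeric() returns TRUE,
# so the factor column Species is skipped
if_summary <- summarise_if(iris, is.numeric, mean)

# The same selection written with across() and the where() helper
across_summary <- summarise(iris, across(where(is.numeric), mean))

print(if_summary)
print(names(across_summary))  # Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
```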


 

Summarise all variables with summarise_all():

summarise_all() applies the summary functions to every column of the data frame.

library(dplyr)
mydata <- mtcars

# summarise every column of the dataframe
summarise_all(mydata, list(n = length, mean = mean, median = median))

The summarise_all() call above returns the number of rows, mean and median of every column.
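summarise_all() passes every column to every function, so it only makes sense when all columns suit those functions; mtcars qualifies because all 11 of its columns are numeric. A sketch of the across() equivalent (dplyr >= 1.0.0), with the shape of the result worked out: 11 columns times 3 functions gives 33 output columns.

```r
library(dplyr)

# everything() selects all columns; each gets n, mean and median
all_summary <- mtcars %>%
  summarise(across(everything(), list(n = length, mean = mean, median = median)))

dim(all_summary)  # 1 row, 33 columns
```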


 

Summarise a categorical (factor) variable:

Next we summarise the number of levels (categories) and the number of missing observations in a categorical (factor) variable, using the iris dataset as an example.

library(dplyr)

mydata2 <- iris
# count the levels and the missing values of the Species factor
summarise_all(mydata2["Species"], list(nlevels = nlevels, nmiss = ~sum(is.na(.))))
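Species in iris has no missing values, so the missing count for it is 0. To see the count change, here is a sketch on a hypothetical copy of the data, iris_na, in which two Species values are set to NA, written with the across() form (dplyr >= 1.0.0). NA is not a factor level, so nlevels() still reports 3.

```r
library(dplyr)

# Hypothetical copy of iris with two Species values blanked out
iris_na <- iris
iris_na$Species[c(1, 51)] <- NA

# nlevels() counts factor levels; ~ sum(is.na(.x)) counts missing observations
species_summary <- iris_na %>%
  summarise(across(Species, list(nlevels = nlevels, nmiss = ~ sum(is.na(.x)))))

print(species_summary)  # Species_nlevels = 3, Species_nmiss = 2
```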


 


Author

  • Sridhar Venkatachalam

    Close to 10 years of experience in data science and machine learning. Has worked extensively with programming languages such as R, Python (Pandas), SAS and PySpark.
