summarise, summarise_at, summarise_if, summarise_all in R – Summary statistics of a dataset, such as the mean and median, can be computed in R with the dplyr package. dplyr provides the summarise() function, which reduces a data frame to a single row of summary values, along with the scoped variants summarise_at(), summarise_if() and summarise_all().
We will use the built-in mtcars dataset to illustrate the summarise functions.
Summary of a column of a dataset in R using dplyr – summarise()
library(dplyr)
mydata <- mtcars
# summarise a column of the data frame
summarise(mydata, mpg_mean = mean(mpg), mpg_median = median(mpg))
Here summarise() returns a one-row data frame holding the mean and median of mpg.
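Base R has no built-in function for the statistical mode, so summarising it alongside the mean and median needs a small helper. The sketch below is one way to do it, not the only one: stat_mode is a hypothetical name that is not part of base R or dplyr, and it assumes ties should go to the value that appears first.

```r
library(dplyr)

# Hypothetical helper (not part of base R or dplyr): returns the most
# frequent value of x; ties go to the value that appears first.
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

mydata <- mtcars

# mean, median and mode of mpg in a single summarise() call
mpg_summary <- summarise(mydata,
                         mpg_mean   = mean(mpg),
                         mpg_median = median(mpg),
                         mpg_mode   = stat_mode(mpg))
print(mpg_summary)
```

Several mpg values occur twice in mtcars, so the reported mode depends on the tie-breaking rule chosen in the helper.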
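Summary of multiple columns of a dataset in R using dplyr – summarise_at()
library(dplyr)
mydata <- mtcars
# summarise a chosen list of columns of the data frame
# funs() is deprecated since dplyr 0.8.0, so a named list of functions is used;
# length() gives the number of rows, since each column has one value per row
summarise_at(mydata, vars(mpg, hp), list(n = length, mean = mean, median = median))
Here summarise_at() returns the number of rows, the mean and the median of mpg and hp, in columns named after the variable and the function, such as mpg_mean.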
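From dplyr 1.0.0 onwards the scoped verbs are superseded by across(), used inside a plain summarise(). The sketch below shows how the same kind of multi-column summary might look in that style; it assumes dplyr 1.0.0 or later is installed.

```r
library(dplyr)

mydata <- mtcars

# across() applies each named function to each selected column;
# n() is called once, outside across(), because the row count
# is the same for every column
at_summary <- summarise(mydata,
                        n = n(),
                        across(c(mpg, hp), list(mean = mean, median = median)))
print(at_summary)
```

The result columns follow the default variable_function naming pattern, for example hp_median.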
Summarise all numeric variables with summarise_if():
The summarise_if() function summarises only the columns that satisfy a condition, given as a predicate function such as is.numeric.
library(dplyr)
mydata <- mtcars
# summarise every numeric column of the data frame
# funs() is deprecated since dplyr 0.8.0, so a named list of functions is used
summarise_if(mydata, is.numeric, list(n = length, mean = mean, median = median))
Here summarise_if() returns the number of rows, the mean and the median of every numeric column.
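Every column of mtcars is numeric, so the condition filters nothing out there. The sketch below uses iris instead, which has a factor column, to show a column being skipped when the predicate returns FALSE.

```r
library(dplyr)

mydata2 <- iris  # four numeric columns plus the factor column Species

# only columns for which is.numeric() returns TRUE are summarised,
# so Species does not appear in the result
numeric_means <- summarise_if(mydata2, is.numeric, mean)
print(numeric_means)
```

With a single unnamed function the result keeps the original column names.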
summarise_all()
The summarise_all() function applies the summary functions to every column of the data frame.
library(dplyr)
mydata <- mtcars
# summarise all the columns of the data frame
# funs() is deprecated since dplyr 0.8.0, so a named list of functions is used
summarise_all(mydata, list(n = length, mean = mean, median = median))
Here summarise_all() returns the number of rows, the mean and the median of every column.
Summarise a categorical (factor) variable:
We will summarise the number of levels (categories) and the count of missing observations in a categorical (factor) variable, using the iris dataset as an example.
library(dplyr)
mydata2 <- iris
# funs() is deprecated since dplyr 0.8.0, so a named list of functions is used
summarise_all(mydata2["Species"], list(nlevels = nlevels, nmiss = ~ sum(is.na(.))))
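iris has no missing values, so the missing count in that example is zero. The sketch below injects two NA values into Species, purely for illustration, to make the count visible. It uses plain summarise() and the hypothetical column names n_levels and n_miss.

```r
library(dplyr)

mydata2 <- iris
mydata2$Species[c(1, 2)] <- NA  # inject two missing values for illustration

# number of factor levels and number of missing observations in Species
species_summary <- summarise(mydata2,
                             n_levels = nlevels(Species),
                             n_miss   = sum(is.na(Species)))
print(species_summary)
```

Setting observations to NA does not remove a level from the factor, so the level count is unaffected by the injected missing values.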