Descriptive statistics of a dataframe in R can be calculated by four different methods. Let’s see how to calculate summary statistics for each column of a dataframe in R, with an example for each method, starting with the summary() function, which returns the summary statistics of each column.

- Descriptive statistics with summary function in R
- Summary statistics in R using stat.desc() function from “pastecs” package
- Descriptive statistics with describe() function from “Hmisc” package
- summarise() function of the dplyr package in R

Let’s first create the dataframe.

    ### Create Data Frame
    df1 = data.frame(Name = c('George','Andrea', 'Micheal','Maggie','Ravi','Xien','Jalpa'),
                     Grade_score=c(4,6,2,9,5,7,8),
                     Mathematics1_score=c(45,78,44,89,66,49,72),
                     Science_score=c(56,52,45,88,33,90,47))
    df1

The resulting dataframe has 7 rows and 4 columns: Name, Grade_score, Mathematics1_score and Science_score.

**Descriptive statistics in R (Method 1):**

Summary statistics are computed using the **summary()** function in R. When called on a dataframe, summary() is automatically applied to each column, and the format of the result depends on the data type of the column.

- If the column is a numeric variable, mean, median, min, max and quartiles are returned.
- If the column is a factor variable, the number of observations in each group is returned.
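As a minimal sketch of how the output format follows the column type, the example below uses a small made-up data frame (the `toy` object and its column names are hypothetical, not part of the original example). It also includes a character column: since R 4.0.0, `data.frame()` keeps strings as character vectors by default, and for those `summary()` reports only length, class and mode.

```r
# Hypothetical toy data: numeric, factor and character columns
toy <- data.frame(
  score = c(45, 78, 44, 89, 66, 49, 72),
  group = factor(c("a", "b", "a", "c", "b", "a", "c")),
  name  = c("George", "Andrea", "Micheal", "Maggie", "Ravi", "Xien", "Jalpa"),
  stringsAsFactors = FALSE  # keep `name` as character on any R version
)

num_summary <- summary(toy$score)  # min, quartiles, median, mean, max
fac_summary <- summary(toy$group)  # number of observations per level
chr_summary <- summary(toy$name)   # length, class and mode only

print(num_summary)
print(fac_summary)
print(chr_summary)
```

Converting a character column with `factor()` before calling `summary()` is one way to get per-group counts for it.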

For numeric columns, the simple summary() function calculates the

- minimum value of each column
- maximum value of each column
- mean value of each column
- median value of each column
- 1st quartile of each column (25th percentile)
- 3rd quartile of each column (75th percentile)

as shown below

    # Summary statistics of dataframe in R
    summary(df1)

The output contains one block of summary statistics per column of df1.

#### summary statistics of a single column in R:

Six values are returned for the specified column in a single call: the minimum, 1st quartile (25th percentile), median, mean, 3rd quartile (75th percentile) and maximum.

    # Summary statistics of a column in R
    summary(df1$Science_score)

For the “Science_score” column this gives a minimum of 33, a 1st quartile of 46, a median of 52, a mean of 58.71, a 3rd quartile of 72 and a maximum of 90.
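To make clear what each of those numbers is, here is a small sketch that computes them one by one with base R functions and compares them with the `summary()` result. The variable names `science`, `by_hand` and `from_summary` are only illustrative.

```r
# Same Science_score values as in df1
science <- c(56, 52, 45, 88, 33, 90, 47)

# Compute each statistic separately with base R functions
by_hand <- c(
  min(science),
  quantile(science, 0.25, names = FALSE),  # 1st quartile
  median(science),
  mean(science),
  quantile(science, 0.75, names = FALSE),  # 3rd quartile
  max(science)
)

# summary() returns the same six numbers in the same order;
# the tolerance allows for rounding applied by older R versions
from_summary <- as.numeric(summary(science))
print(all.equal(by_hand, from_summary, tolerance = 1e-3))
```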

**Summary / Descriptive statistics in R (Method 2):**

Descriptive statistics in R with the pastecs package do a bit more than the simple summary() function. The stat.desc() function also calculates the

- number of null values and missing values of each column
- number of non-missing values of each column
- sum, range, variance, standard deviation etc. for each column

    # descriptive statistics of dataframe in R
    install.packages("pastecs")
    library(pastecs)
    stat.desc(df1)

The result is a table with one column per variable of df1 and one row per statistic.
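For readers who want to see what some of these statistics amount to, the sketch below computes a handful of them in base R, without pastecs. It is not the pastecs implementation: `basic_stats` and its row names are hypothetical, and treating “null” as “equal to zero” is an assumption based on my reading of the package documentation.

```r
# Numeric columns of the example data (same values as df1)
scores <- data.frame(
  Grade_score        = c(4, 6, 2, 9, 5, 7, 8),
  Mathematics1_score = c(45, 78, 44, 89, 66, 49, 72),
  Science_score      = c(56, 52, 45, 88, 33, 90, 47)
)

# Hypothetical helper: a few descriptive statistics for one column
basic_stats <- function(x) {
  c(
    n_values  = sum(!is.na(x)),                # non-missing values
    n_zero    = sum(x == 0, na.rm = TRUE),     # "null" taken to mean zero (assumption)
    n_missing = sum(is.na(x)),                 # missing values
    sum       = sum(x, na.rm = TRUE),
    range     = diff(range(x, na.rm = TRUE)),  # max minus min
    variance  = var(x, na.rm = TRUE),
    std_dev   = sd(x, na.rm = TRUE)
  )
}

# One column per variable, one row per statistic
stats_table <- sapply(scores, basic_stats)
print(round(stats_table, 2))
```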

**Summary statistics in R (Method 3):**

Descriptive statistics in R with the Hmisc package report the number of distinct values in each column, the frequency of each value and the proportion of that value in the column, as shown below.

    # Summary statistics of dataframe in R
    install.packages("Hmisc")
    library(Hmisc)
    describe(df1)

The output describes each column of df1 in turn.
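The three quantities mentioned above (distinct values, frequencies and proportions) can be illustrated with base R alone. This is only a sketch of the idea, not what Hmisc does internally, and the `grade` vector is made up so that some values repeat.

```r
# Hypothetical column with repeated values
grade <- c("A", "B", "A", "C", "B", "A", "C")

distinct_count <- length(unique(grade))  # number of distinct values
frequency      <- table(grade)           # count of each value
proportion     <- prop.table(frequency)  # share of each value in the column

print(distinct_count)
print(frequency)
print(round(proportion, 3))
```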

#### Summarise using the dplyr package in R

We will use the built-in mtcars dataset to demonstrate the summarise() function.

    library(dplyr)
    mydata = mtcars
    # summarise the columns of dataframe
    summarise(mydata, mpg_mean=mean(mpg), mpg_median=median(mpg))

Here the summarise() function returns the mean (20.09) and median (19.2) of mpg.

**summarise_all()**

The summarise_all() function allows you to summarise all the variables.

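    library(dplyr)
    mydata = mtcars
    # summarise all the columns of dataframe
    # (funs() is deprecated since dplyr 0.8.0, so a named list of functions is used;
    #  length gives the number of rows in each column)
    summarise_all(mydata, list(n = length, mean = mean, median = median))

Here the summarise_all() function returns the number of rows, mean and median of all the columns.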
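In current dplyr releases the scoped verbs such as summarise_all() are superseded by across(). A minimal sketch of the same summary written that way, assuming dplyr 1.0.0 or later is installed (the `all_stats` name is only illustrative):

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() is available

mydata <- mtcars

# Apply several summary functions to every column in one call
all_stats <- summarise(
  mydata,
  across(everything(), list(n = length, mean = mean, median = median))
)

# Result columns are named <column>_<function>, e.g. mpg_mean
print(all_stats[, c("mpg_n", "mpg_mean", "mpg_median")])
```

The result is a single row with three columns per original variable.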

**Summarize categorical or factor Variable:**

We will summarize the number of levels/categories and the count of missing observations in a categorical (factor) variable. Let’s use the iris dataset as an example.

    library(dplyr)
    mydata2 = iris
    # a named list of lambdas replaces the deprecated funs()
    summarise_all(mydata2["Species"], list(nlevels = ~nlevels(.), nmiss = ~sum(is.na(.))))

In the iris dataset the “Species” column has three distinct levels and zero missing values.
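The same two figures can be cross-checked without dplyr. The sketch below uses only base R on the built-in iris data and additionally shows how many observations fall into each level; the variable names are illustrative.

```r
species <- iris$Species

level_count   <- nlevels(species)     # number of categories
missing_count <- sum(is.na(species))  # number of NA entries
level_sizes   <- table(species)       # observations per category

print(level_count)
print(missing_count)
print(level_sizes)
```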