# Summary or Descriptive statistics in R

Descriptive statistics of a dataframe in R can be calculated in several different ways. Let’s see how to calculate summary statistics for each column of a dataframe in R, with an example for each method. The simplest option is the summary() function, which returns the summary statistics of a single column or of every column in a dataframe. The methods covered are:

• Descriptive statistics with summary function in R
• Summary statistics in R using stat.desc() function from “pastecs” package
• Descriptive statistics with describe() function from “Hmisc” package
• summarise() function of the dplyr package in R

Let’s first create the dataframe.

```
### Create Data Frame
df1 = data.frame(Name = c('George','Andrea', 'Micheal','Maggie','Ravi','Xien','Jalpa'),
                 Mathematics1_score = c(45,78,44,89,66,49,72),
                 Science_score = c(56,52,45,88,33,90,47))
df1
```

The resulting dataframe has seven rows and three columns: Name, Mathematics1_score and Science_score.

#### Descriptive statistics in R (Method 1):

Summary statistics are computed with the summary() function in R. When it is given a dataframe, summary() is automatically applied to each column, and the format of the result depends on the data type of the column.

• If the column is a numeric variable, mean, median, min, max and quartiles are returned.
• If the column is a factor variable, the number of observations in each group is returned.
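The effect of the column type can be seen in a small sketch. The `toy` dataframe below is hypothetical and used only for illustration. It also shows a third case: since R 4.0, data.frame() keeps text columns as character rather than converting them to factors, and summary() on a character column reports only its length, class and mode.

```r
# Hypothetical toy dataframe, used only for this illustration
toy <- data.frame(
  group = c("a", "b", "a", "c"),
  score = c(10, 20, 30, 40),
  stringsAsFactors = FALSE  # explicit, so the result does not depend on the R version
)

# Character column: summary() reports only Length, Class and Mode
summary(toy$group)

# Factor column: summary() counts the observations in each level
group_counts <- summary(factor(toy$group))
group_counts

# Numeric column: min, quartiles, median, mean and max
score_summary <- summary(toy$score)
score_summary
```

If per-group counts are wanted for a text column, converting it with factor() first, as above, is one way to get them.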

For each numeric column, the simple summary() function calculates the

• minimum value of each column
• maximum value of each column
• mean value of each column
• median value of each column
• 1st quartile of each column (25th percentile)
• 3rd quartile of each column (75th percentile)

as shown below

```
# Summary statistics of dataframe in R

summary(df1)
```

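The output lists these six statistics for each of the two score columns. The Name column holds text, so it gets a type-dependent summary instead.

#### Summary statistics of a single column in R: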

Six values are returned for the specified column in a single call: the minimum, 1st quartile (25th percentile), median, mean, 3rd quartile (75th percentile) and maximum:

```
# Summary statistics of a column in R

summary(df1$Science_score)
```
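As a cross-check, the same six numbers can be rebuilt from base R functions. This is only a sketch: the vector `science` repeats the Science_score values from the dataframe above, and the names in `manual_summary` are chosen here for illustration.

```r
# Same values as df1$Science_score, repeated so the block runs on its own
science <- c(56, 52, 45, 88, 33, 90, 47)

# Rebuild the six numbers with base R functions
manual_summary <- c(
  Min    = min(science),
  Q1     = quantile(science, 0.25, names = FALSE),  # 25th percentile
  Median = median(science),
  Mean   = mean(science),
  Q3     = quantile(science, 0.75, names = FALSE),  # 75th percentile
  Max    = max(science)
)
manual_summary
```

The quartiles here use quantile()'s default interpolation method. summary() prints its results rounded to a few significant digits, so the mean may show fewer decimals there than in this vector.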

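For the “Science_score” column, summary() reports a minimum of 33, a 1st quartile of 46, a median of 52, a mean of 58.71, a 3rd quartile of 72 and a maximum of 90.

#### Summary / Descriptive statistics in R (Method 2):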

Descriptive statistics with the pastecs package do a bit more than the simple summary() function. The stat.desc() function also calculates

• number of missing (NA) values and null values of each column
• number of non-missing values of each column
• sum, range, variance, standard deviation, etc. for each column

```
# Descriptive statistics of dataframe in R

install.packages("pastecs")  # needed only once
library(pastecs)
stat.desc(df1)
```
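To make these quantities concrete, a few of them can be approximated in base R without installing anything. This is a rough sketch, not how pastecs computes them: `describe_numeric()` and its row names are hypothetical, and `scores` repeats the two numeric columns of the dataframe above.

```r
# The two numeric columns of df1, repeated so the block runs on its own
scores <- data.frame(
  Mathematics1_score = c(45, 78, 44, 89, 66, 49, 72),
  Science_score      = c(56, 52, 45, 88, 33, 90, 47)
)

# describe_numeric() is a hypothetical helper, not part of pastecs
describe_numeric <- function(x) {
  c(
    n_valid   = sum(!is.na(x)),                           # non-missing values
    n_missing = sum(is.na(x)),                            # missing values
    sum       = sum(x, na.rm = TRUE),
    range     = max(x, na.rm = TRUE) - min(x, na.rm = TRUE),
    variance  = var(x, na.rm = TRUE),
    std_dev   = sd(x, na.rm = TRUE)
  )
}

# One column per variable, one row per statistic
basic_stats <- sapply(scores, describe_numeric)
basic_stats
```

Restricting the input to numeric columns keeps text columns such as Name out of the calculation.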

stat.desc() returns a table with one row per statistic and one column per variable.

#### Summary statistics in R (Method 3):

Descriptive statistics with the Hmisc package report the number of distinct values in each column, the frequency of each value and the proportion of that value in the column, as shown below

```
# Summary statistics of dataframe in R

install.packages("Hmisc")  # needed only once
library(Hmisc)
describe(df1)
```
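The three quantities mentioned above (distinct values, frequencies and proportions) can also be illustrated with base R alone. The sketch below is not Hmisc's implementation, and the `grades` vector is hypothetical; it simply contains repeated values so the frequencies are not all 1.

```r
# Hypothetical vector with repeated values
grades <- c("A", "B", "A", "C", "B", "A")

distinct_count <- length(unique(grades))  # number of distinct values
frequency      <- table(grades)           # count of each value
proportion     <- prop.table(frequency)   # share of each value

distinct_count
frequency
proportion
```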

describe() prints a separate block of statistics for each column.

#### Summarise using the dplyr package in R

We will use the mtcars data to demonstrate the summarise() function.

```
library(dplyr)
mydata = mtcars

# summarise the columns of dataframe
summarise(mydata, mpg_mean=mean(mpg), mpg_median=median(mpg))
```

Here the summarise() function returns the mean (about 20.09) and the median (19.2) of mpg.

#### summarise_all()

The summarise_all() function allows you to summarise all the variables.

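```
library(dplyr)
mydata = mtcars

# summarise all the columns of dataframe
# funs() was deprecated in dplyr 0.8.0, so a named list of functions is used instead;
# length gives the number of rows in each column
summarise_all(mydata, list(n = length, mean = mean, median = median))
```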
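In dplyr 1.0.0 and later, the scoped verbs such as summarise_all() are superseded by across(), used inside summarise(). A minimal sketch of the same idea, assuming dplyr 1.0.0 or newer is installed (the name `all_stats` is chosen here for illustration):

```r
library(dplyr)

# across() applies every function in the list to every selected column;
# result columns are named <column>_<function>, e.g. mpg_mean
all_stats <- summarise(
  mtcars,
  across(everything(), list(mean = mean, median = median))
)

all_stats[, c("mpg_mean", "mpg_median")]
```

Replacing everything() with a selection such as where(is.numeric) would limit the summary to numeric columns in a dataframe of mixed types.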

The summarise_all() call returns the number of rows, mean and median of every column.

#### Summarize categorical or factor Variable:

We will summarise the number of levels/categories and the count of missing observations in a categorical (factor) variable. Let’s use the iris dataset as an example

```
library(dplyr)

mydata2 = iris
# funs() was deprecated in dplyr 0.8.0, so a named list of functions is used instead
summarise_all(mydata2["Species"], list(nlevels = nlevels, nmiss = ~ sum(is.na(.))))
```
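The same two numbers can be checked in base R, and table() adds the count of observations in each level. This is a small sketch; the variable names are chosen here for illustration.

```r
species <- iris$Species

n_levels     <- nlevels(species)                 # number of categories
n_missing    <- sum(is.na(species))              # number of missing observations
level_counts <- table(species, useNA = "ifany")  # observations per level

n_levels
n_missing
level_counts
```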

In the iris dataset, the “Species” column has three distinct levels and zero missing values.