Filter or subset rows in R using Dplyr

To filter or subset rows in R we will use the dplyr package. Its filter() function keeps the rows of a data frame that satisfy one or more conditions.

We will use the mtcars data set to illustrate filtering and subsetting.

  • Filter or subset the rows in R using dplyr
  • Subset or filter rows in R with multiple conditions
  • Filter rows based on AND, OR and NOT conditions in R
  • Filter rows of a data frame using the slice family of functions in R
  • slice_sample() function in R returns a random sample of n rows of the data frame
  • slice_head() and slice_tail() functions in R return the first n and last n rows
  • Subset rows by group in R
  • Subset using the top_n() function in R
  • Select or subset a sample using the sample_n() and sample_frac() functions in R


 

Filter or subset the rows in R using Dplyr:

Subset using filter() function.

library(dplyr)
mydata <- mtcars

# subset the rows of dataframe with condition
Mydata1 = filter(mydata,cyl==6)
Mydata1

Only the rows with cyl equal to 6 are kept
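The same filter can also be written as a pipeline, and the condition can be any logical expression. The following is a minimal sketch; the names six_cyl and efficient and the mpg > 25 threshold are chosen here only for illustration.

```r
library(dplyr)

# Same filter as above, written as a pipeline
six_cyl <- mtcars %>% filter(cyl == 6)

# Any logical expression works, for example a numeric comparison
efficient <- mtcars %>% filter(mpg > 25)

nrow(six_cyl)
nrow(efficient)
```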


 

Filter or subset the rows in R with multiple conditions using Dplyr:

library(dplyr)
mydata <- mtcars

# subset the rows of dataframe that match any of several values
Mydata1 = filter(mydata, gear %in% c(4,5))
Mydata1

The rows with gear equal to 4 or 5 are kept


 

Filter or subsetting the rows in R with multiple conditions (AND) using Dplyr:

library(dplyr)
mydata <- mtcars

# subset the rows of dataframe with multiple conditions
Mydata1 = filter(mydata, gear %in% c(4,5) & carb==2)
Mydata1

The rows with gear equal to 4 or 5 and carb equal to 2 are kept
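filter() also combines several comma-separated conditions with AND, so the & operator is optional here. A small sketch comparing the two spellings (the variable names are illustrative):

```r
library(dplyr)

# Conditions joined explicitly with &
with_ampersand <- filter(mtcars, gear %in% c(4, 5) & carb == 2)

# Conditions passed as separate arguments are also combined with AND
with_comma <- filter(mtcars, gear %in% c(4, 5), carb == 2)

# Both spellings select the same rows
identical(with_ampersand, with_comma)
```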


 

Filter or subsetting the rows in R with multiple conditions (OR) using Dplyr:

library(dplyr)
mydata <- mtcars

# subset the rows of dataframe with multiple conditions
Mydata1 = filter(mydata, gear %in% c(4,5) | mpg==21.0)
Mydata1

The rows with gear equal to 4 or 5, or with mpg equal to 21, are kept


 

Filter or subsetting the rows in R with multiple conditions (NOT) using Dplyr:

library(dplyr)
mydata <- mtcars

# subset the rows of dataframe by negating a condition
Mydata1 = filter(mydata, !gear %in% c(4,5))
Mydata1

The rows where gear is neither 4 nor 5 are kept
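Because %in% binds more tightly than !, the expression !gear %in% c(4,5) is read as !(gear %in% c(4,5)). A short sketch showing that this matches two != tests joined with AND (variable names are illustrative):

```r
library(dplyr)

# Negated membership test
not_in <- filter(mtcars, !gear %in% c(4, 5))

# Equivalent spelling: both inequalities must hold
not_equal <- filter(mtcars, gear != 4 & gear != 5)

identical(not_in, not_equal)
unique(not_in$gear)
```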


 

Filter or subsetting the rows in R with Contains condition using Dplyr:

library(dplyr)
mydata <- mtcars

# subset the rows whose hp value contains the digit 0
Mydata1 = filter(mydata, grepl("0", hp))
Mydata1

Rows whose hp value contains the digit 0 are kept
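A contains condition is more commonly applied to text. As a sketch, the row names of mtcars can be copied into a column and matched with grepl(); the column name model and the search string "Merc" are assumptions made for this example.

```r
library(dplyr)

# Illustrative helper column: copy the row names into a "model" column
cars <- mtcars %>% mutate(model = rownames(mtcars))

# Keep rows whose model name contains the literal text "Merc"
mercs <- cars %>% filter(grepl("Merc", model, fixed = TRUE))

mercs$model
```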


 

 


Subset using the slice family of functions in dplyr:

slice_head() function in R:

slice_head() function returns the top n rows of the dataframe as shown below.

 
# slice_head() function in R
library(dplyr)
mtcars %>% slice_head(n = 5)

so the top 5 rows are returned


 

slice_tail() function in R:

slice_tail() function returns the bottom n rows of the dataframe as shown below.

 
# slice_tail() function in R
library(dplyr) 
mtcars %>% slice_tail(n = 5)

so the bottom 5 rows are returned


 

slice_max() function in R:

slice_max() function returns the n rows of the dataframe with the largest values of a column as shown below.

 
# slice_max() function in R
library(dplyr) 
mtcars %>% slice_max(mpg, n = 5)

so the 5 rows with the highest mpg values are returned
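By default slice_max() keeps tied values (with_ties = TRUE), so it can return more than n rows. A sketch using the cyl column, which has many repeated values (the names tied and exact are illustrative):

```r
library(dplyr)

# cyl has many repeated values, so n = 1 returns every row tied at the maximum
tied <- mtcars %>% slice_max(cyl, n = 1)

# with_ties = FALSE returns exactly n rows
exact <- mtcars %>% slice_max(cyl, n = 1, with_ties = FALSE)

nrow(tied)
nrow(exact)
```

The same with_ties argument is available in slice_min().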

 

slice_min() function in R:

slice_min() function returns the n rows of the dataframe with the smallest values of a column as shown below.

 
# slice_min() function in R

library(dplyr) 
mtcars %>% slice_min(mpg, n = 5)

so the 5 rows with the lowest mpg values are returned

 

 

slice_sample() function in R:

slice_sample() function returns a random sample of n rows of the dataframe as shown below.

 
# slice_sample() function in R

library(dplyr) 
mtcars %>% slice_sample(n = 5)

so 5 randomly sampled rows are returned
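Since the rows are drawn at random, the result changes between runs unless the random seed is fixed. A minimal sketch, assuming an arbitrary seed of 42, that also shows sampling with replacement:

```r
library(dplyr)

# Fixing the seed (42 is arbitrary) makes the draw reproducible
set.seed(42)
first_draw <- mtcars %>% slice_sample(n = 5)

set.seed(42)
second_draw <- mtcars %>% slice_sample(n = 5)

identical(first_draw, second_draw)

# replace = TRUE samples with replacement, so n may exceed the number of rows
boot <- mtcars %>% slice_sample(n = 40, replace = TRUE)
nrow(boot)
```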


 


Slice by Group in R:


 

 

slice_head() by group in R:

slice_head() combined with group_by() returns the top n rows of each group

 

# slice_head() by group in R 
mtcars %>% group_by(vs) %>% slice_head(n = 2)


 

slice_tail() by group in R:


slice_tail() combined with group_by() returns the bottom n rows of each group

 

# slice_tail() by group in R 
mtcars %>% group_by(vs) %>% slice_tail(n = 2)


 

slice_sample() by group in R:

slice_sample() combined with group_by() returns a random sample of n rows from each group

 

# slice_sample() by group in R 
mtcars %>% group_by(vs) %>% slice_sample(n = 2)
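The other slice functions work by group in the same way. As a sketch, slice_max() after group_by() picks the row with the highest mpg within each cyl group; grouping by cyl and the name best_per_cyl are choices made for this example.

```r
library(dplyr)

best_per_cyl <- mtcars %>%
  group_by(cyl) %>%                            # one group per cylinder count
  slice_max(mpg, n = 1, with_ties = FALSE) %>% # best mpg in each group
  ungroup()                                    # drop the grouping afterwards

best_per_cyl %>% select(cyl, mpg)
```

Calling ungroup() at the end keeps later steps from silently operating per group.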


 

 


Using top_n() function in R:

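The top_n() function returns the top n rows of the dataframe with respect to a column. When no column is given, as in the call below, top_n() orders by the last column of the dataframe (carb in mtcars), and tied values can produce more than n rows.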

 

# top_n() function in R 

mtcars %>% top_n(10)
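In dplyr 1.0.0 and later, top_n() is marked as superseded in favour of slice_max() and slice_min(). A sketch comparing the two with an explicit ordering column; choosing mpg here is an assumption for illustration.

```r
library(dplyr)

# top_n() with an explicit ordering column (the wt argument)
old_way <- mtcars %>% top_n(10, mpg)

# slice_max() selects the same rows and also sorts them in descending order
new_way <- mtcars %>% slice_max(mpg, n = 10)

# Same mpg values in both results, possibly in a different row order
all.equal(sort(old_way$mpg), sort(new_way$mpg))
```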


 


Subset and select a sample in R:

sample_n() Function in Dplyr  

The sample_n() function selects random rows from a data frame (or table). The first parameter is the data frame and the second parameter is the number of rows to select.

library(dplyr)
mydata = mtcars

# select random 4 rows of the dataframe 
sample_n(mydata,4)

In the code above, sample_n() selects 4 random rows of the mtcars data set.


 

sample_frac() Function in Dplyr :

The sample_frac() function selects a random fraction of the rows of a data frame (or table). The first parameter is the data frame and the second parameter is the fraction of rows to select, for example 0.2 for 20 percent.

library(dplyr)

mydata = mtcars

# select random 20 percentage rows of the dataframe 
sample_frac(mydata,0.2)

In the code above, sample_frac() selects a random 20 percent of the rows of the mtcars data set.
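In dplyr 1.0.0 and later, sample_n() and sample_frac() are marked as superseded by slice_sample(), which covers both cases through its n and prop arguments. A minimal sketch of the counterparts (the seed and variable names are arbitrary):

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only for reproducibility

# Counterpart of sample_n(mydata, 4)
four_rows <- mtcars %>% slice_sample(n = 4)

# Counterpart of sample_frac(mydata, 0.2)
twenty_pct <- mtcars %>% slice_sample(prop = 0.2)

nrow(four_rows)
nrow(twenty_pct)
```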


 



Author

  • Sridhar Venkatachalam

    With close to 10 years of experience in data science and machine learning, has worked extensively with programming languages like R, Python (Pandas), SAS, Pyspark.