Remove Duplicate Rows in R using dplyr – distinct() function

The distinct() function from the dplyr package removes duplicate rows from a data frame. Duplicates can be identified across all columns, by a single variable, or by a combination of multiple variables.

We will use the built-in mtcars dataset to demonstrate the function.

distinct() Function in dplyr – Remove duplicate rows of a dataframe:

library(dplyr)
mydata <- mtcars

# Remove duplicate rows of the dataframe
distinct(mydata)

The mtcars dataset contains no duplicate rows, so distinct() returns the same number of rows as mydata (32).
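Because mtcars has no duplicates, the call above does not visibly remove anything. The following is a minimal sketch of distinct() dropping rows, assuming a hypothetical data frame dup_data (not part of the original example) built by appending the first three rows of mtcars to itself:

```r
library(dplyr)

# Hypothetical demo data: append the first three rows of mtcars to itself
# so that the data frame contains exact duplicate rows.
dup_data <- rbind(mtcars, mtcars[1:3, ])
nrow(dup_data)   # 35 rows, 3 of which are duplicates

# distinct() compares column values only and keeps one copy of each row
deduped <- distinct(dup_data)
nrow(deduped)    # 32 rows
```

Row names are not part of the comparison, so the appended rows count as duplicates even though rbind() gives them new row names.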

Remove Duplicate Rows based on a variable

We will remove duplicate rows based on the values of a single variable.

library(dplyr)
mydata <- mtcars

# Remove duplicate rows of the dataframe using carb variable
distinct(mydata, carb, .keep_all = TRUE)

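The .keep_all argument (an argument of distinct(), not a function) controls which columns are returned. With .keep_all = TRUE, all other variables are retained in the output data frame, and the first row found for each distinct value of carb is kept. The output data frame therefore has six rows, one for each distinct value of carb.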
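To see what .keep_all changes, the short sketch below runs the same call with and without it. The names carb_only and carb_all are illustrative and do not come from the original example.

```r
library(dplyr)

# Without .keep_all, only the column used for de-duplication is returned
carb_only <- distinct(mtcars, carb)
dim(carb_only)   # 6 rows, 1 column

# With .keep_all = TRUE, all columns of the first matching row are kept
carb_all <- distinct(mtcars, carb, .keep_all = TRUE)
dim(carb_all)    # 6 rows, 11 columns
```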

 

Remove Duplicate Rows based on multiple variables

In the example below, we remove duplicate rows based on multiple variables.

library(dplyr)
mydata <- mtcars

# Remove duplicate rows of the dataframe using cyl and vs variables
distinct(mydata, cyl, vs, .keep_all = TRUE)

As before, the .keep_all = TRUE argument retains all other variables in the output data frame. The resulting data frame has one row for each distinct combination of cyl and vs that occurs in the data.
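One way to check the result is to compare its row count with the number of cyl/vs combinations reported by dplyr's count(). This is a sketch for verification only, and the names by_cyl_vs and combos are illustrative:

```r
library(dplyr)

# One row per distinct (cyl, vs) pair, keeping all other columns
by_cyl_vs <- distinct(mtcars, cyl, vs, .keep_all = TRUE)

# count() lists each (cyl, vs) combination once, with its frequency in n
combos <- count(mtcars, cyl, vs)

nrow(by_cyl_vs)  # 5
nrow(combos)     # 5
```

The two row counts match because mtcars contains five of the six possible cyl/vs combinations (there are no 8-cylinder cars with vs equal to 1).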

 
