Remove Duplicate Rows in R Using dplyr: distinct() Function

The distinct() function from the dplyr package removes duplicate rows from a data frame. It can deduplicate on all columns, on a single variable, or on a combination of multiple variables.

We will use the built-in mtcars dataset to demonstrate the function.

distinct() Function in dplyr: Remove Duplicate Rows of a Data Frame

library(dplyr)
mydata <- mtcars

# Remove duplicate rows of the dataframe
distinct(mydata)

mtcars contains no fully duplicated rows, so distinct() returns all 32 rows of mydata unchanged.

[Image: output of distinct(mydata)]
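Because mtcars has no repeated rows, the call above does not visibly change anything. The minimal sketch below uses a small hypothetical data frame, toy (not part of the original tutorial), that does contain repeated rows, so the effect of distinct() is easy to see.

```r
library(dplyr)

# Hypothetical toy data: row 3 repeats row 1, and row 4 repeats row 2
toy <- data.frame(
  id    = c(1, 2, 1, 2, 3),
  score = c(10, 20, 10, 20, 30)
)

# Keep only the first occurrence of each fully duplicated row
deduped <- distinct(toy)
print(deduped)

# How many duplicate rows were dropped (5 - 3 = 2)
nrow(toy) - nrow(deduped)
```

The first occurrence of each row is kept, in its original order.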

 

Remove Duplicate Rows based on a variable

Here we remove duplicate rows based on the values of a single variable, carb.

library(dplyr)
mydata <- mtcars

# Remove duplicate rows of the dataframe using carb variable
distinct(mydata, carb, .keep_all = TRUE)

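.keep_all = TRUE is an argument of distinct() (not a separate function) that retains all other variables in the output data frame. When several rows share the same value of carb, only the first of them is kept. The output data frame is: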

[Image: output of distinct(mydata, carb, .keep_all = TRUE)]
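To see what .keep_all actually changes, the sketch below compares the same call with and without it. The variable names carb_only and carb_full are illustrative only.

```r
library(dplyr)
mydata <- mtcars

# Without .keep_all, only the carb column is returned
carb_only <- distinct(mydata, carb)

# With .keep_all = TRUE, every column is kept; for each carb value,
# the first matching row of mydata is the one retained
carb_full <- distinct(mydata, carb, .keep_all = TRUE)

names(carb_only)   # "carb"
ncol(carb_full)    # 11, the same columns as mtcars
carb_full$carb     # 4 1 2 3 6 8, in order of first appearance
```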

 

Remove Duplicate Rows based on multiple variables

In the example below, rows are treated as duplicates when they share the same combination of cyl and vs.

library(dplyr)
mydata <- mtcars

# Remove duplicate rows of the dataframe using cyl and vs variables
distinct(mydata, cyl, vs, .keep_all = TRUE)

As before, .keep_all = TRUE retains all other variables, and the first row for each unique combination of cyl and vs is kept. The resulting data frame is:

[Image: output of distinct(mydata, cyl, vs, .keep_all = TRUE)]
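For comparison, the same multi-variable deduplication can be written in base R with duplicated(). This base R version is a swapped-in alternative, not part of dplyr; the sketch below simply checks that both approaches keep the same rows.

```r
library(dplyr)
mydata <- mtcars

# dplyr: keep the first row for each unique (cyl, vs) combination
by_cyl_vs <- distinct(mydata, cyl, vs, .keep_all = TRUE)

# Base R alternative: flag later repeats of the same (cyl, vs) pair and drop them
base_cyl_vs <- mydata[!duplicated(mydata[, c("cyl", "vs")]), ]

nrow(by_cyl_vs)                            # 5 combinations of cyl and vs
identical(by_cyl_vs$mpg, base_cyl_vs$mpg)  # TRUE: the same rows are kept
```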

 
