Delete or Drop rows in R with conditions

Dropping rows in R with conditions can be done with the subset() function. Let’s see how to delete or drop rows with multiple conditions in R with an example. Dropping rows with missing and empty values is accomplished using the na.omit() and complete.cases() functions. We also cover dropping rows by row index (row number) with slice() and the ‘-’ operator, dropping rows by row name, and dropping duplicate rows.

  • remove or drop rows with condition in R using subset() function
  • remove or drop rows with null values or missing values using na.omit(), complete.cases() in R
  • drop rows with slice() function in R dplyr package
  • drop duplicate rows in R using unique() and the dplyr distinct() function
  • delete or drop rows based on row number i.e. row index in R
  • delete or drop rows based on row name in R


Let’s first create the dataframe.

# create dataframe
df1 = data.frame(Name = c('George','Andrea', 'Micheal','Maggie','Ravi','Xien','Jalpa'), 
                 Grade_score=c(4,6,2,9,5,7,8),
                 Mathematics1_score=c(45,78,44,89,66,49,72),
                 Science_score=c(56,52,45,88,33,90,47))
df1

The resulting dataframe has 7 rows and 4 columns: Name, Grade_score, Mathematics1_score and Science_score.

 

Delete or Drop rows in R with conditions:

Method 1: drop rows using logical indexing

Delete rows where Name is George or Andrea

df2<-df1[!(df1$Name=="George" | df1$Name=="Andrea"),]
df2

The resulting dataframe contains the five remaining rows (Micheal, Maggie, Ravi, Xien and Jalpa).
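When the list of names to drop grows, chaining == comparisons with | becomes unwieldy. The following is a minimal sketch of the same logical-indexing idea using the base R %in% operator. The vector drop_names is a hypothetical helper introduced only for illustration.

```r
# Recreate the example dataframe so the snippet runs on its own
df1 <- data.frame(Name = c('George','Andrea','Micheal','Maggie','Ravi','Xien','Jalpa'),
                  Grade_score = c(4,6,2,9,5,7,8),
                  Mathematics1_score = c(45,78,44,89,66,49,72),
                  Science_score = c(56,52,45,88,33,90,47))

# drop_names is a hypothetical helper vector holding the names to remove
drop_names <- c("George", "Andrea")

# %in% is TRUE for rows whose Name appears in drop_names; ! keeps all other rows
df2 <- df1[!(df1$Name %in% drop_names), ]
df2
```

Adding another name to drop_names is then a one-word change rather than another | clause.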

 

Method 2: drop rows using subset() function

subset() keeps the rows for which the condition is TRUE, so to drop rows we state the condition that the remaining rows must satisfy.

df2<-subset(df1, Name!="George" & Name!="Andrea")
df2

The resulting dataframe again contains the five rows other than George and Andrea.
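subset() is not limited to one column. As a sketch of dropping rows on multiple conditions, the example below removes students whose Mathematics and Science scores are both below 50. The threshold of 50 and the name df3 are assumptions chosen for illustration.

```r
# Recreate the example dataframe so the snippet runs on its own
df1 <- data.frame(Name = c('George','Andrea','Micheal','Maggie','Ravi','Xien','Jalpa'),
                  Grade_score = c(4,6,2,9,5,7,8),
                  Mathematics1_score = c(45,78,44,89,66,49,72),
                  Science_score = c(56,52,45,88,33,90,47))

# Drop rows where BOTH scores are below 50 (illustrative threshold).
# The condition inside !( ) describes the rows to drop; subset() keeps the rest.
df3 <- subset(df1, !(Mathematics1_score < 50 & Science_score < 50))
df3
```

Only Micheal has both scores below 50, so that is the single row removed.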

 

Method 3: drop rows using the slice() function from the dplyr package

slice() selects rows by position, so a negative index drops the rows at those positions rather than testing a condition.

### Drop rows using slice() function in R

library(dplyr)

df2 <- df1 %>% slice(-c(2, 4, 6))
df2

The resulting dataframe has the 2nd, 4th and 6th rows (Andrea, Maggie and Xien) removed.
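Since slice() works on positions, the dplyr counterpart for dropping rows by condition is filter(). The sketch below repeats the George/Andrea example with filter(); it assumes the dplyr package is installed.

```r
library(dplyr)

# Recreate the example dataframe so the snippet runs on its own
df1 <- data.frame(Name = c('George','Andrea','Micheal','Maggie','Ravi','Xien','Jalpa'),
                  Grade_score = c(4,6,2,9,5,7,8),
                  Mathematics1_score = c(45,78,44,89,66,49,72),
                  Science_score = c(56,52,45,88,33,90,47))

# filter() keeps rows where the expression is TRUE,
# so negate the condition that identifies the rows to drop
df2 <- df1 %>% filter(!(Name %in% c("George", "Andrea")))
df2
```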

 


Drop Rows by row name and Row number (Row index) in R:


Drop Row by row number or row index:

Dropping rows by row number (row index) in R can be accomplished either with the slice() function or with the ‘-’ operator.

### Drop rows using slice() function in R

library(dplyr)

df2 <- df1 %>% slice(-c(2, 4, 6))
df2

OR

### Drop rows using "-" operator in R

df2 <- df1[-c(2, 4, 6), ]
df2

Either way, the resulting dataframe has the 2nd, 4th and 6th rows (Andrea, Maggie and Xien) removed.
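One caveat with the ‘-’ operator is worth sketching: if the vector of row numbers is empty (for example, because a which() call matched nothing), negative indexing returns zero rows instead of the full dataframe. The names rows_to_drop, wrong, keep and right below are hypothetical, and the condition is chosen so that nothing matches.

```r
# Recreate the example dataframe so the snippet runs on its own
df1 <- data.frame(Name = c('George','Andrea','Micheal','Maggie','Ravi','Xien','Jalpa'),
                  Grade_score = c(4,6,2,9,5,7,8),
                  Mathematics1_score = c(45,78,44,89,66,49,72),
                  Science_score = c(56,52,45,88,33,90,47))

# No Grade_score exceeds 100, so which() returns integer(0)
rows_to_drop <- which(df1$Grade_score > 100)

# Pitfall: -integer(0) is still integer(0), which selects zero rows
wrong <- df1[-rows_to_drop, ]

# Safer: compute the row numbers to keep; this works even when nothing is dropped
keep <- setdiff(seq_len(nrow(df1)), rows_to_drop)
right <- df1[keep, ]

nrow(wrong)
nrow(right)
```

Computing the rows to keep, rather than negating the rows to drop, sidesteps the empty-index case.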

 

Drop rows by row name:

Rows can be dropped by row name by comparing row.names() against the names to remove with the %in% operator and negating the result. This uses base R only, so no package needs to be loaded.

### Drop rows by row name in R

df1[!(row.names(df1) %in% c('1','2')), ]

Because df1 was created without explicit row names, its row names are simply the row numbers stored as strings, so this drops the first two rows (George and Andrea).
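Dropping by row name is more useful when the row names carry meaning. As a sketch, the example below copies df1 and uses the Name column as row names (an assumption that works here because the names are unique), then drops rows by those names. The name df_named is hypothetical.

```r
# Recreate the example dataframe so the snippet runs on its own
df1 <- data.frame(Name = c('George','Andrea','Micheal','Maggie','Ravi','Xien','Jalpa'),
                  Grade_score = c(4,6,2,9,5,7,8),
                  Mathematics1_score = c(45,78,44,89,66,49,72),
                  Science_score = c(56,52,45,88,33,90,47))

# Use the (unique) student names as row names on a copy of the data
df_named <- df1
rownames(df_named) <- as.character(df_named$Name)

# Drop the rows whose row name is George or Andrea
df_dropped <- df_named[!(rownames(df_named) %in% c("George", "Andrea")), ]
df_dropped
```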

 


Drop rows with missing values in R (Drop NA, Drop NaN) :


Let’s first create the dataframe with NA values as shown below

df1 = data.frame(Name = c('George','Andrea', 'Micheal','Maggie','Ravi','Xien','Jalpa',''), 
                 Mathematics_score=c(45,78,44,89,66,NaN,72,87),
                 Science_score=c(56,52,NA,88,33,90,47,76))
df1

The dataframe has eight rows: Science_score is NA in row 3, Mathematics_score is NaN in row 6, and Name is an empty string in row 8.

Method 1: Remove or Drop rows with NA using na.omit() function:

Using na.omit() to remove rows with (missing) NA and NaN values

df1_complete = na.omit(df1) # Method 1 - Remove NA
df1_complete

After removing the rows with NA and NaN (rows 3 and 6), the resulting dataframe has six rows. The row with the empty Name is kept, because an empty string is not a missing value.

 

Method 2: Remove or Drop rows with NA using complete.cases() function

Using complete.cases() to remove rows with (missing) NA and NaN values

df1[complete.cases(df1),]

The result is the same six rows that na.omit() returns: rows 3 and 6 are dropped.
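complete.cases() can also be applied to a subset of columns, which drops a row only when those particular columns are missing. The following sketch checks Science_score alone; the names df_na and df_sci are hypothetical, and df_na simply recreates the dataframe above.

```r
# Recreate the dataframe with missing values so the snippet runs on its own
df_na <- data.frame(Name = c('George','Andrea','Micheal','Maggie','Ravi','Xien','Jalpa',''),
                    Mathematics_score = c(45,78,44,89,66,NaN,72,87),
                    Science_score = c(56,52,NA,88,33,90,47,76))

# Check only Science_score; drop = FALSE keeps the selection a data frame
sci_ok <- complete.cases(df_na[, "Science_score", drop = FALSE])

# Row 3 (NA in Science_score) is dropped; row 6 (NaN in Mathematics_score) is kept
df_sci <- df_na[sci_ok, ]
df_sci
```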

 

Removing both empty and missing values:

na.omit() and complete.cases() do not treat an empty string ("") as missing. A roundabout way to remove both empty ("null") and missing values is to test each column for NA and for "" as shown below

# Remove empty & NA values
df1[!(is.na(df1$Name) | df1$Name=="" | is.na(df1$Science_score) | df1$Science_score==""|is.na(df1$Mathematics_score) | df1$Mathematics_score==""),]

After removing the empty, NA and NaN values (rows 8, 3 and 6), the resulting dataframe has five rows.
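An alternative that avoids listing every column is to convert empty strings to NA first and then call na.omit(). This is only a sketch under stated assumptions: the dataframe is recreated with stringsAsFactors = FALSE so that Name is a character column, and the names df_na, clean and is_chr are hypothetical.

```r
# Recreate the dataframe; stringsAsFactors = FALSE keeps Name as character
df_na <- data.frame(Name = c('George','Andrea','Micheal','Maggie','Ravi','Xien','Jalpa',''),
                    Mathematics_score = c(45,78,44,89,66,NaN,72,87),
                    Science_score = c(56,52,NA,88,33,90,47,76),
                    stringsAsFactors = FALSE)

clean <- df_na

# Find the character columns and turn "" into NA in each of them
is_chr <- vapply(clean, is.character, logical(1))
clean[is_chr] <- lapply(clean[is_chr], function(x) {
  x[!is.na(x) & x == ""] <- NA
  x
})

# Now a single na.omit() drops rows with NA, NaN or a formerly empty string
clean <- na.omit(clean)
clean
```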

 


Drop duplicate rows in R:


We will use the following dataframe to demonstrate dropping duplicates in R. Let’s first create the dataframe.

# simple Data frame creation

mydata = data.frame (NAME =c ('Alisa','Bobby','jodha','jack','raghu','Cathrine',
                      'Alisa','Bobby','kumar','Alisa','jack','Cathrine'),
                      Age = c (26,24,26,22,23,24,26,24,22,26,22,25),
                      Score =c(85,63,55,74,31,77,85,63,42,85,74,78))

mydata

The resulting data frame has 12 rows and 3 columns, and several of its rows are repeated.

 

distinct() Function in Dplyr  –  Remove duplicate rows of a dataframe in R:

library(dplyr)

# Remove duplicate rows of the dataframe
distinct(mydata)

All duplicate rows are removed, so distinct() returns the 8 unique rows of mydata.
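distinct() can also deduplicate on selected columns. As a sketch, passing a column name together with .keep_all = TRUE keeps the first row for each value of that column while retaining all other columns. The name by_name is hypothetical, and the dplyr package is assumed to be installed.

```r
library(dplyr)

# Recreate the example data so the snippet runs on its own
mydata <- data.frame(NAME = c('Alisa','Bobby','jodha','jack','raghu','Cathrine',
                              'Alisa','Bobby','kumar','Alisa','jack','Cathrine'),
                     Age = c(26,24,26,22,23,24,26,24,22,26,22,25),
                     Score = c(85,63,55,74,31,77,85,63,42,85,74,78))

# Keep the first row for each NAME; .keep_all = TRUE retains Age and Score
by_name <- distinct(mydata, NAME, .keep_all = TRUE)
by_name
```

Without .keep_all = TRUE, distinct(mydata, NAME) would return only the NAME column.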

 

DROP Duplicates in R using unique() function in R:

The base R unique() function gives the same result when applied to the above data frame:

## Apply unique function for data frame in R
unique(mydata)

Duplicate entries in the data frame are eliminated, leaving the 8 unique rows.

 

 

Remove Duplicates based on a column using duplicated() function:

The duplicated() function takes a column as its argument and flags every repeated value after its first occurrence. Negating it with ! inside [ ] keeps only the first row for each value of that column, as shown below

 
## unique value of the column in R dataframe 
mydata[!duplicated(mydata$NAME), ] 

The resulting dataframe has 7 rows, one for each unique value of the NAME column.
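Two variations on duplicated() may be useful, sketched here with hypothetical result names: fromLast = TRUE keeps the last occurrence of each value instead of the first, and passing several columns treats a row as a duplicate only when all of them match.

```r
# Recreate the example data so the snippet runs on its own
mydata <- data.frame(NAME = c('Alisa','Bobby','jodha','jack','raghu','Cathrine',
                              'Alisa','Bobby','kumar','Alisa','jack','Cathrine'),
                     Age = c(26,24,26,22,23,24,26,24,22,26,22,25),
                     Score = c(85,63,55,74,31,77,85,63,42,85,74,78))

# Keep the LAST row for each NAME instead of the first
last_by_name <- mydata[!duplicated(mydata$NAME, fromLast = TRUE), ]

# Treat rows as duplicates only when both NAME and Age match
by_name_age <- mydata[!duplicated(mydata[, c("NAME", "Age")]), ]

last_by_name
by_name_age
```

With fromLast = TRUE the Cathrine row that survives is the one aged 25; with NAME and Age together, both Cathrine rows are kept because their ages differ.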

 



Author

  • Sridhar Venkatachalam

    With close to 10 years of experience in data science and machine learning, has worked extensively with programming languages such as R, Python (Pandas), SAS and PySpark.