Drop column in R using Dplyr – drop variables

Drop column in R using dplyr: a column can be dropped by placing a minus sign before its name inside the select() function. The dplyr package provides select(), which selects or drops columns by name or position, or with helpers that test whether a column name starts with, ends with, contains, or matches a pattern. Dropping columns with a regular expression and dropping columns that have too many missing values are also covered, with an example for each.

  • Drop column by column name in dplyr
  • Drop column by column position in dplyr
  • Drop column whose name contains a value or matches a pattern
  • Drop column whose name starts with or ends with certain characters
  • Drop column by name with a regular expression using the grepl() function
  • Drop column with missing values

We will use the mtcars data to demonstrate dropping variables.

Drop by column name in R dplyr:

The select() function, combined with a minus sign, drops columns by name.

library(dplyr)
mydata <- mtcars

# Drop the mpg, cyl and wt columns by name
select(mydata, -c(mpg, cyl, wt))

The above code drops the mpg, cyl and wt columns, so the columns are removed by name.
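When the names to drop are stored in a character vector rather than typed inline, all_of() can be combined with the minus sign. The following is a minimal sketch; the vector name drop_cols is an assumption introduced here for illustration.

```r
library(dplyr)

mydata <- mtcars

# Hypothetical vector holding the names of the columns to drop
drop_cols <- c("mpg", "cyl", "wt")

# all_of() turns the character vector into a selection; the minus negates it
result <- select(mydata, -all_of(drop_cols))

names(result)
```

all_of() raises an error if a listed name is not present in the data, whereas any_of() silently skips names that are missing.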


Drop column by position in R Dplyr:

Drop 3rd, 4th and 5th columns of the dataframe:

To drop columns by position, pass the column positions as a vector to the select() function with a negative sign, as shown below.

library(dplyr)
mydata <- mtcars

# Drop the 3rd, 4th and 5th columns of the dataframe
select(mydata, -c(3, 4, 5))

The above code drops the 3rd, 4th and 5th columns (disp, hp and drat in mtcars), so the columns are removed by position.
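Position-based code changes meaning if the column order changes, so it can help to check which names sit at those positions before dropping them. The sketch below also stores the result, since select() returns a new dataframe and leaves the original untouched. The names dropped_names and trimmed are assumptions used only for illustration.

```r
library(dplyr)

mydata <- mtcars

# Look up the names at positions 3, 4 and 5 before dropping them
dropped_names <- names(mydata)[c(3, 4, 5)]
dropped_names  # "disp" "hp" "drat"

# select() does not modify mydata; assign the result to keep it
trimmed <- select(mydata, -c(3, 4, 5))

ncol(mydata)   # 11
ncol(trimmed)  # 8
```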



Dropping by Matching with patterns 

starts_with() function in R:

To drop columns whose names start with a certain prefix, use select() together with starts_with(), passing the prefix to starts_with() as shown below.

library(dplyr)
mydata <- mtcars

# Drop the columns whose names start with "mpg"
select(mydata, -starts_with("mpg"))

The column whose name starts with “mpg” is dropped using the starts_with() function inside select().
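Since only one mtcars column starts with "mpg", a shorter prefix makes the prefix matching easier to see. This is a small sketch using the prefix "d", chosen here purely for illustration; setdiff() lists the columns that were removed.

```r
library(dplyr)

mydata <- mtcars

# Drop every column whose name starts with "d"
no_d <- select(mydata, -starts_with("d"))

# Columns present in mydata but absent from no_d
setdiff(names(mydata), names(no_d))  # "disp" "drat"
```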


ends_with() function in R:

To drop columns whose names end with a certain suffix, use select() together with ends_with(), passing the suffix to ends_with() as shown below.

library(dplyr)
mydata <- mtcars

# Drop the columns whose names end with "cyl"
select(mydata, -ends_with("cyl"))

The column whose name ends with “cyl” is dropped using the ends_with() function inside select().
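Several negative selections can be passed to select() in one call. The sketch below combines a prefix helper and a suffix helper; the prefix "d" and suffix "t" are assumptions picked only to show the effect on mtcars.

```r
library(dplyr)

mydata <- mtcars

# Drop columns that start with "d" and columns that end with "t"
result <- select(mydata, -starts_with("d"), -ends_with("t"))

# disp, drat and wt are removed
names(result)
```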


contains() function in R:

To drop columns whose names contain a certain piece of text, use select() together with contains(), passing the text to contains() as shown below.

library(dplyr)
mydata <- mtcars

# Drop the columns whose names contain "s"
select(mydata, -contains("s"))

The columns whose names contain “s” (disp, qsec and vs in mtcars) are dropped using the contains() function inside select().
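contains() ignores case by default, which is easy to miss with mtcars because all of its column names are lowercase. The sketch below uses a small made-up dataframe (sales and its column names are assumptions for illustration) to show the effect of the ignore.case argument.

```r
library(dplyr)

# Hypothetical dataframe with mixed-case column names
sales <- data.frame(Sales = 1:3, season = c("a", "b", "c"), cost = 4:6, id = 7:9)

# Default: case-insensitive, so Sales, season and cost all match "S"
loose <- select(sales, -contains("S"))
names(loose)   # "id"

# Case-sensitive: only Sales contains an uppercase "S"
strict <- select(sales, -contains("S", ignore.case = FALSE))
names(strict)  # "season" "cost" "id"
```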


matches() function:

Drop the column whose name matches “di”. Unlike contains(), which looks for literal text, matches() treats its argument as a regular expression. To drop columns whose names match a pattern, use select() together with matches(), passing the pattern to matches() as shown below.

library(dplyr)
mydata <- mtcars

# Drop the columns whose names match the pattern "di"
select(mydata, -matches("di"))

The column whose name matches “di” (disp in mtcars) is dropped using the matches() function inside select().
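Because matches() takes a regular expression, anchors and character classes can be used. The following sketch drops every mtcars column whose name begins with "c" or "d"; the pattern "^[cd]" is an assumption chosen for illustration.

```r
library(dplyr)

mydata <- mtcars

# "^[cd]" matches names that begin with either "c" or "d"
result <- select(mydata, -matches("^[cd]"))

# cyl, disp, drat and carb are removed
names(result)
```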


Drop Column names using Regular Expression in R Regex:

Columns whose names match a regular expression can also be dropped in base R with the grepl() function. grepl() takes the regular expression and the vector of column names as arguments and returns a logical vector that is TRUE for each name that matches. Negating that vector with ! keeps only the columns that do not match, as shown below.

mydata <- mtcars

# Drop the columns whose names start with "c", using a regular expression
mydata1 <- mydata[, !grepl("^c", names(mydata))]
mydata1

The columns whose names start with “c” (cyl and carb in mtcars) are dropped using the grepl() function with a regular expression.
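To make the intermediate step visible, the sketch below stores the logical vector returned by grepl() and compares the base R result with the dplyr equivalent using matches(). The names starts_with_c, base_result and dplyr_result are assumptions for illustration, and drop = FALSE is added so the result stays a dataframe even if only one column remains.

```r
library(dplyr)

mydata <- mtcars

# TRUE for each column name that starts with "c"
starts_with_c <- grepl("^c", names(mydata))
names(mydata)[starts_with_c]  # "cyl" "carb"

# Base R: keep the columns that do NOT match
base_result <- mydata[, !starts_with_c, drop = FALSE]

# dplyr: the same pattern passed to matches()
dplyr_result <- select(mydata, -matches("^c"))

identical(names(base_result), names(dplyr_result))  # TRUE
```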



Drop columns with missing values in R:


To demonstrate dropping a column with missing values, first let's create a dataframe as shown below.

my_basket = data.frame(ITEM_GROUP = c("Fruit","Fruit","Fruit","Fruit","Fruit","Vegetable","Vegetable","Vegetable","Vegetable","Dairy","Dairy","Dairy","Dairy","Dairy"), 
                       ITEM_NAME = c("Apple","Banana","Orange","Mango","Papaya","Carrot","Potato","Brinjal","Raddish","Milk","Curd","Cheese","Milk","Paneer"),
                       Price = c(100,80,80,90,65,70,60,70,25,60,40,35,50,60),
                       Tax = c(2,4,5,NA,2,3,NA,1,NA,4,5,NA,4,NA))
my_basket

Printing my_basket shows a dataframe with 14 rows, in which 5 of the Tax values are NA.

The sapply() function is an alternative to a for loop: it applies a built-in or user-defined function to each column of a dataframe. sapply(df, function(x) mean(is.na(x))) returns the proportion of missing values in each column, as a number between 0 and 1.

###### Drop columns in which more than 30% of the values are missing

my_basket = my_basket[, !(sapply(my_basket, function(x) mean(is.na(x))) > 0.3)]
my_basket

The above code removes the “Tax” column because 5 of its 14 values (about 36%) are missing, which exceeds the 30% threshold we chose. The final dataframe therefore has no Tax column.
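The same threshold rule can be sketched with dplyr itself by passing a predicate to where() inside select(). This assumes dplyr 1.0.0 or later, and it uses a small made-up dataframe (basket and its columns are assumptions for illustration) so that the example runs on its own.

```r
library(dplyr)

# Hypothetical dataframe: 2 of the 5 tax values (40%) are missing
basket <- data.frame(
  item  = c("Apple", "Banana", "Carrot", "Milk", "Curd"),
  price = c(100, 80, 70, 60, 40),
  tax   = c(2, NA, NA, 4, 5)
)

# Proportion of missing values per column
na_share <- colMeans(is.na(basket))
na_share

# Keep only the columns where at most 30% of the values are missing
kept <- select(basket, where(function(x) mean(is.na(x)) <= 0.3))
names(kept)  # "item" "price"
```

Writing the condition as "keep columns at or below the threshold" avoids the negation used in the base R version, which some readers may find easier to follow.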


For more on dropping columns with the dplyr package, refer to the dplyr documentation.



Author

  • Sridhar Venkatachalam

    With close to 10 years of experience in data science and machine learning. Has worked extensively with programming languages such as R, Python (Pandas), SAS and PySpark.