Case when in R using case_when() Dplyr – case_when in R

A CASE WHEN statement in R can be written with the case_when() function from the dplyr package, which is similar to the CASE WHEN statement in SQL. This tutorial covers case_when() with multiple conditions as well as the switch statement, through the following examples:

  • Create a new variable using case_when() along with the mutate() function
  • Handle NA values using case_when()
  • Apply case_when() to a vector in R
  • Switch case in R using the switch statement

Dplyr case_when in R

Let's use the my_basket data to demonstrate the case_when() function. First, let's create the data frame.

my_basket = data.frame(ITEM_GROUP = c("Fruit","Fruit","Fruit","Fruit","Fruit","Vegetable","Vegetable","Vegetable","Vegetable","Dairy","Dairy","Dairy","Dairy","Dairy"), 
                       ITEM_NAME = c("Apple","Banana","Orange","Mango","Papaya","Carrot","Potato","Brinjal","Raddish","Milk","Curd","Cheese","Milk","Paneer"),
                       Price = c(100,80,80,90,65,70,60,70,25,60,40,35,50,NA),
                       Tax = c(2,4,5,6,2,3,5,1,3,4,5,6,4,NA))
my_basket

The my_basket data frame has 14 rows and four columns: ITEM_GROUP, ITEM_NAME, Price and Tax. The last row (Paneer) has NA in both Price and Tax.

Create a new variable using case_when() in R: case_when() with multiple conditions

We will create an additional variable, Price_band, using the mutate() function and case_when(). Price_band takes the values “Medium”, “High” and “Low” based on the Price value, so the new variable is created from multiple conditions inside case_when().

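### case_when() to create a new variable
library(dplyr)   # provides %>%, mutate() and case_when()

my_basket %>%
  mutate(Price_band = case_when(
    Price >= 50 & Price <= 70 ~ "Medium",
    Price > 70 ~ "High",
    TRUE ~ "Low"
  ))

  • Column names can be used directly inside case_when() when it is called within mutate().
  • Conditions are evaluated in order, and the first one that is TRUE decides the result.
  • TRUE in the last clause is equivalent to the ELSE branch in SQL.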

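The resulting data frame has a new Price_band column: prices above 70 are “High”, prices from 50 to 70 are “Medium”, and everything else is “Low”. Note that the Paneer row, whose Price is NA, also falls through to “Low”.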
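Because case_when() uses the first condition that is TRUE, the order of the clauses matters. The following is a minimal sketch of that behaviour, using a small made-up price vector (an assumption for illustration, not part of my_basket). It shows how a broad first condition hides a later one, and how an NA value ends up in the catch-all clause.

```r
library(dplyr)

# Hypothetical prices, used only for illustration
price <- c(100, 65, 25, NA)

# Specific condition first, catch-all last
ordered <- case_when(
  price > 70  ~ "High",
  price >= 50 ~ "Medium",
  TRUE        ~ "Low"
)

# Broad condition first: it also captures the values above 70
misordered <- case_when(
  price >= 50 ~ "Medium",
  price > 70  ~ "High",
  TRUE        ~ "Low"
)

ordered     # "High" "Medium" "Low" "Low"
misordered  # "Medium" "Medium" "Low" "Low"
```

In both versions the NA price is labelled "Low", because neither comparison is TRUE for NA and only the catch-all matches.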

 

Handling NA using Case when statement:

We will again create the Price_band variable using mutate() and case_when(), this time handling NA values with is.na() inside case_when().

### case_when() to create a new variable with NA handling
my_basket %>%
  mutate(Price_band = case_when(
    is.na(Price) ~ "unknown",
    Price >= 50 & Price <= 70 ~ "Medium",
    Price > 70 ~ "High",
    TRUE ~ "Low"
  ))

This is similar to the previous example, but NA is now handled with the is.na() function: whenever Price is NA, Price_band is set to “unknown”. In the resulting data frame, the Paneer row has Price_band “unknown” instead of “Low”.

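NOTE: Place the is.na() condition before the catch-all TRUE clause (putting it first is the simplest choice). case_when() uses the first condition that is TRUE, so an is.na() clause placed after TRUE would never be reached.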
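The effect of clause placement on missing values can be checked on a small vector. This is a sketch under assumptions: the price vector is made up for illustration, and the second variant (spelling out every band instead of using TRUE) is an alternative to, not a replacement for, the approach above.

```r
library(dplyr)

# Hypothetical prices, used only for illustration
price <- c(80, 60, 40, NA)

# is.na() placed after the catch-all: it is never reached
na_last <- case_when(
  price >= 50 & price <= 70 ~ "Medium",
  price > 70                ~ "High",
  TRUE                      ~ "Low",
  is.na(price)              ~ "unknown"
)

# No catch-all: every band is spelled out, so NA stays NA
no_default <- case_when(
  price > 70  ~ "High",
  price >= 50 ~ "Medium",
  price < 50  ~ "Low"
)

na_last     # "High" "Medium" "Low" "Low"
no_default  # "High" "Medium" "Low" NA
```

When no condition matches, case_when() returns NA, which can be preferable to a label such as "Low" if missing prices should remain visibly missing.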

 

Using Switch Statement in R:

sapply() along with the switch statement is used to create a new variable “Type”, which categorizes each ITEM_GROUP.

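#### Using the switch statement in R
# as.character() guards against ITEM_GROUP being a factor
# (the data.frame() default before R 4.0), which switch would treat as an integer
my_basket$Type <- sapply(as.character(my_basket$ITEM_GROUP), switch,
                  Fruit = 'Groceries',
                  Vegetable = 'Groceries',
                  Dairy = 'MilkProducts')
my_basket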

The resulting data frame has a new Type column: “Groceries” for the Fruit and Vegetable rows and “MilkProducts” for the Dairy rows.
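The switch call above has no default, so a group outside the three listed names would not return a value. One possible variation, sketched below, adds a default and uses switch's fall-through syntax for groups that share a result. The helper name classify_group and the extra "Bakery" group are assumptions made for illustration only.

```r
# Hypothetical helper, not part of the original example
classify_group <- function(group) {
  switch(as.character(group),
    Fruit = ,                  # empty value: falls through to the next one
    Vegetable = "Groceries",
    Dairy = "MilkProducts",
    "Other"                    # unnamed last argument acts as the default
  )
}

# "Bakery" is a made-up group that matches none of the named cases
groups <- c("Fruit", "Vegetable", "Dairy", "Bakery")

# vapply() checks that every result is a single character string
types <- vapply(groups, classify_group, character(1), USE.NAMES = FALSE)
types  # "Groceries" "Groceries" "MilkProducts" "Other"
```

vapply() is used here instead of sapply() so that an unexpected return value raises an error rather than silently changing the shape of the result.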

 


Using case_when on a Vector:

case_when() can also be applied directly to a vector, as shown below. First, let's create a vector.

#### Create vector in R

x <- 1:20
x

The resulting vector is

[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

We will use case_when() to replace multiples of 5 with “5X”, multiples of 7 with “7X” and multiples of 9 with “9X”, as shown below.

#### Case_when in R

case_when(
  x %% 5 == 0 ~ "5X",
  x %% 7 == 0 ~ "7X",
  x %% 9 == 0 ~ "9X",
  TRUE ~ as.character(x)
)

In the resulting vector, 5, 10, 15 and 20 become “5X”, 7 and 14 become “7X”, 9 and 18 become “9X”, and every other value is kept as a character string.
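The fallback clause uses as.character(x) rather than x because all results of a case_when() call must share one type. The sketch below illustrates this on a shorter vector; the tryCatch() wrapper and the "error" marker string are assumptions added only so the failing call can be shown without stopping the script.

```r
library(dplyr)

x <- 1:10

# Mixing character ("5X") and integer (x) results raises an error;
# tryCatch() turns it into a marker string for illustration
mixed <- tryCatch(
  case_when(x %% 5 == 0 ~ "5X", TRUE ~ x),
  error = function(e) "error"
)

# Converting the fallback to character gives one consistent type
consistent <- case_when(x %% 5 == 0 ~ "5X", TRUE ~ as.character(x))

mixed       # "error"
consistent  # "1" "2" "3" "4" "5X" "6" "7" "8" "9" "5X"
```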

For more details on case_when(), refer to the dplyr documentation.



Author

  • Sridhar Venkatachalam

    Close to 10 years of experience in data science and machine learning. Has worked extensively with programming languages such as R, Python (Pandas), SAS and PySpark.