The case_when() function in the dplyr package is R's counterpart to the CASE WHEN statement in SQL: it checks a series of conditions in order and returns the value paired with the first condition that is TRUE. We will look at the following examples of case_when():
- Create a new variable using case_when() together with mutate()
- Handle NA values using case_when()
- Use case_when() on a vector in R
Let's use the my_basket data to illustrate case_when(). First, let's create the data frame.
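```r
# Create the my_basket data frame.
# stringsAsFactors = FALSE keeps the text columns as character vectors
# (already the default from R 4.0.0 on); switch() relies on this later.
my_basket = data.frame(
  ITEM_GROUP = c("Fruit","Fruit","Fruit","Fruit","Fruit","Vegetable","Vegetable","Vegetable","Vegetable","Dairy","Dairy","Dairy","Dairy","Dairy"),
  ITEM_NAME = c("Apple","Banana","Orange","Mango","Papaya","Carrot","Potato","Brinjal","Raddish","Milk","Curd","Cheese","Milk","Paneer"),
  Price = c(100,80,80,90,65,70,60,70,25,60,40,35,50,NA),
  Tax = c(2,4,5,6,2,3,5,1,3,4,5,6,4,NA),
  stringsAsFactors = FALSE
)
my_basket
```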
The my_basket data frame will be:
Create a new variable using case_when() in R:
We will create an additional variable, Price_band, using mutate() and case_when(). Price_band takes the value "Medium", "High" or "Low" depending on the Price value.
```r
library(dplyr)

### case_when() to create a new variable
my_basket %>%
  mutate(Price_band = case_when(
    Price >= 50 & Price <= 70 ~ "Medium",
    Price > 70 ~ "High",
    TRUE ~ "Low"
  ))
```
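- Column names such as Price can be referenced directly inside case_when() when it is used within mutate().
- TRUE ~ "Low" plays the role of the ELSE clause: any row that matches none of the earlier conditions gets "Low".
In the resulting data frame, prices from 50 to 70 are labeled "Medium", prices above 70 "High", and everything else "Low". This includes the missing price in the last row, because a comparison with NA never counts as a match.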
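Newer versions of dplyr offer an alternative to the TRUE ~ catch-all. The sketch below assumes dplyr 1.1.0 or later, where case_when() accepts a .default argument; it applies the same banding to a small standalone vector (prices and bands are illustrative names, not part of my_basket).

```r
library(dplyr)

# Illustrative prices; NA stands in for a missing value
prices <- c(100, 80, 65, 60, 25, NA)

# Assumes dplyr >= 1.1.0: .default replaces the TRUE ~ "Low" catch-all
bands <- case_when(
  prices >= 50 & prices <= 70 ~ "Medium",
  prices > 70 ~ "High",
  .default = "Low"
)
bands
# "High" "High" "Medium" "Medium" "Low" "Low"
```

Like TRUE ~ "Low", .default is used whenever every condition is FALSE or NA, so the missing price still ends up as "Low".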
Handling NA using case_when():
We will again create Price_band with mutate() and case_when(), this time handling NA values with is.na() inside case_when().
```r
### case_when() to create a new variable, handling NA
my_basket %>%
  mutate(Price_band = case_when(
    is.na(Price) ~ "unknown",
    Price >= 50 & Price <= 70 ~ "Medium",
    Price > 70 ~ "High",
    TRUE ~ "Low"
  ))
```
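This is similar to the previous example, but here NA is handled with is.na(): whenever Price is NA, Price_band is set to "unknown", so the last row (Paneer) is now labeled "unknown" instead of "Low".
NOTE: case_when() evaluates the conditions in order and returns the value for the first one that is TRUE. A comparison such as Price > 70 returns NA when Price is missing, and case_when() treats NA as no match, so without the is.na() clause those rows fall through to TRUE ~ "Low". Put the is.na() condition at the beginning of case_when() (in any case before the TRUE catch-all) so that missing values are labeled correctly.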
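To see why the is.na() clause matters, the sketch below runs both versions on a small illustrative price vector (prices, without_na and with_na are hypothetical names, not part of the original example). In the first version the missing price silently falls through to the catch-all.

```r
library(dplyr)

# Illustrative prices; the last one is missing
prices <- c(100, 60, 25, NA)

# No is.na() clause: for NA every comparison yields NA, which case_when()
# treats as "no match", so the NA price falls through to TRUE ~ "Low"
without_na <- case_when(
  prices >= 50 & prices <= 70 ~ "Medium",
  prices > 70 ~ "High",
  TRUE ~ "Low"
)

# is.na() checked first: the missing price gets its own label
with_na <- case_when(
  is.na(prices) ~ "unknown",
  prices >= 50 & prices <= 70 ~ "Medium",
  prices > 70 ~ "High",
  TRUE ~ "Low"
)

without_na  # "High" "Medium" "Low" "Low"
with_na     # "High" "Medium" "Low" "unknown"
```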
Using switch() in R:
sapply() together with switch() is used to create a new variable, Type, which categorizes each ITEM_GROUP value:
```r
#### Using switch() in R
my_basket$Type <- sapply(my_basket$ITEM_GROUP, switch,
                         Fruit = 'Groceries',
                         Vegetable = 'Groceries',
                         Dairy = 'MilkProducts')
my_basket
```
The resulting data frame has a new Type column: "Groceries" for the Fruit and Vegetable rows and "MilkProducts" for the Dairy rows.
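switch() also supports fall-through and a default value, which shortens the mapping above and guards against groups that are not listed. A minimal sketch, assuming a hypothetical extra group "Bakery" that does not occur in my_basket; categorize() is likewise an illustrative helper, not part of the original example.

```r
# Hypothetical input: "Bakery" is not one of the my_basket groups
groups <- c("Fruit", "Vegetable", "Dairy", "Bakery")

# Illustrative helper: maps one group name to a category
categorize <- function(g) {
  switch(g,
         Fruit = ,                  # empty value: falls through to the next one
         Vegetable = "Groceries",
         Dairy = "MilkProducts",
         "Other")                   # unnamed last argument: default for unmatched names
}

# USE.NAMES = FALSE drops the names sapply() would otherwise copy from groups
types <- sapply(groups, categorize, USE.NAMES = FALSE)
types
# "Groceries" "Groceries" "MilkProducts" "Other"
```

Without the "Other" default, switch() returns NULL for "Bakery"; sapply() can then no longer simplify the results to a character vector and returns a list instead.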
Using case_when() on a vector:
case_when() can also be applied directly to a vector, outside of mutate(). First, let's create a vector:
```r
#### Create a vector in R
x <- 1:20
x
```
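The resulting vector contains the integers 1 to 20.
We will use case_when() to replace multiples of 5 with "5X", multiples of 7 with "7X" and multiples of 9 with "9X". The final TRUE ~ as.character(x) keeps every other number, converted to character so that all branches return the same type: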
```r
#### case_when() on a vector
case_when(
  x %% 5 == 0 ~ "5X",
  x %% 7 == 0 ~ "7X",
  x %% 9 == 0 ~ "9X",
  TRUE ~ as.character(x)
)
```
The resulting character vector is: "1" "2" "3" "4" "5X" "6" "7X" "8" "9X" "5X" "11" "12" "13" "7X" "5X" "16" "17" "9X" "19" "5X".
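Because case_when() stops at the first TRUE condition, a number that is a multiple of more than one divisor gets the label of whichever condition is listed first. The range 1:20 contains no such overlaps, so the sketch below uses a small hypothetical vector y (and an illustrative result name, y_labels) to show the effect.

```r
library(dplyr)

# Hypothetical values that are multiples of more than one divisor
y <- c(35, 45, 63)   # 35 = 5*7, 45 = 5*9, 63 = 7*9

y_labels <- case_when(
  y %% 5 == 0 ~ "5X",
  y %% 7 == 0 ~ "7X",
  y %% 9 == 0 ~ "9X",
  TRUE ~ as.character(y)  # all branches must return character, hence as.character()
)
y_labels
# "5X" "5X" "7X"
```

Swapping the order of the conditions would change the labels of these overlapping values, so list the condition that should win first.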
For further details on case_when(), refer to the dplyr documentation.