Cumulative sum of a column in R

Cumulative sum of the column in R can be accomplished by using cumsum function. We can also calculate the cumulative sum of the column with the help of dplyr package in R. Cumulative sum of the column by group (within group) can also computed with group_by() function along with cumsum() function along with conditional cumulative sum which handles NA.  In this section we will see how to

  • Get cumulative sum of column in R
  • Get cumulative sum of the column by group (within group) using group_by() and cumsum() function
  • Get Cumulative sum of the column with dplyr package.
  • Reverse cumulative sum of column
  • Cumulative sum based on a condition i.e. conditional cumulative sum in R – cumsum() handling NA

Cumulative sum of the column in R cumsum() dplyr11

Let’s first create the dataframe

my_basket = data.frame(ITEM_GROUP = c("Fruit","Fruit","Fruit","Fruit","Fruit","Vegetable","Vegetable","Vegetable","Vegetable","Dairy","Dairy","Dairy","Dairy","Dairy"), 
                       ITEM_NAME = c("Apple","Banana","Orange","Mango","Papaya","Carrot","Potato","Brinjal","Raddish","Milk","Curd","Cheese","Milk","Paneer"),
                       Price = c(100,80,80,90,65,70,60,70,25,60,40,35,50,120))
my_basket

dataframe my_basket will be

Cumulative sum of the column in R 11

 

Get cumulative sum of column in R:

Cumulative sum of a column is calculated using cumsum() function. cumsum() function takes column name as argument and calculates the cumulative sum of that argument as shown below

# cumulative sum

my_basket$cumulative_price = cumsum(my_basket$Price) 
my_basket

resultant dataframe will be

Cumulative sum of the column in R 12

 

 

Cumulative sum of the column in dplyr package :

Cumulative sum of a column is calculated using cumsum() function. cumsum() function along with dplyr pipe operator (%>%) takes column name as argument and calculates the cumulative sum of that argument as shown below

### Cumulative sum of the column by dplyr package

library(dplyr)

my_basket %>%
  mutate(cumulative_price = cumsum(Price))

so the cumulative sum of the price column is calculated as shown below

Cumulative sum of the column in R 12

 

Cumulative sum of the column by group (within group) in dplyr package:

Cumulative sum of the column in R cumsum() dplyr 12

Cumulative sum of a column by group is calculated using cumsum() function and group_by() function of the dplyr package. cumsum() function takes column name as argument and calculates the cumulative sum  of that argument. group_by() function takes the column to be grouped as shown below

 
### Cumulative sum of the column by group using dplyr package 

library(dplyr)

my_basket %>%
  group_by(ITEM_GROUP) %>%
  mutate(cum_price_by_item_grp = cumsum(Price))

 

so the cumulative sum of the “Price” column by “ITEM_GROUP”  is calculated as shown below

Cumulative sum of the column in R 13

 

 

Get Reverse cumulative sum of column:

Reverse cumulative sum of a column is calculated using rev() and cumsum() function. cumsum() function takes up column name as argument which computes the cumulative sum of the column and it is passed to rev() function which reverses the cumulative sum  as shown below.

# reverse cumulative sum

my_basket$rev_cumulative_price = rev(cumsum(my_basket$Price))
my_basket

resultant dataframe with reverse cumulative sum  calculated will be

Cumulative sum of the column in R 15

 


Conditional Cumulative sum – cumsum() with NA handling in R:

Let’s first create the dataframe

my_basket = data.frame(ITEM_GROUP = c("Fruit","Fruit","Fruit","Fruit","Fruit","Vegetable","Vegetable","Vegetable","Vegetable","Dairy","Dairy","Dairy","Dairy","Dairy"), 
                       ITEM_NAME = c("Apple","Banana","Orange","Mango","Papaya","Carrot","Potato","Brinjal","Raddish","Milk","Curd","Cheese","Milk","Paneer"),
                       Price = c(100,80,80,90,65,70,60,70,25,60,40,35,50,60),
                       Tax = c(2,4,5,NA,2,3,NA,1,NA,4,5,NA,4,NA))
my_basket

dataframe my_basket will be

Cumulative sum of the column in R cumsum() dplyr 13

 

Handling NA on cumulative sum:

In our example we have used mutate function to create new variable and NA’s are replaced with 0 and cumulative sum  of the Tax column is calculated.

### cumulative sum and handling NA values
my_basket %>% mutate( NEW_TAX = ifelse( is.na(Tax), 0, Tax ), #remove NA
        Cum_Sum_TAX = cumsum(NEW_TAX) )

so the resultant dataframe with cumulative sum of the Tax column calculated by replacing NA’s with 0

Cumulative sum of the column in R cumsum() dplyr 14

for further understanding of cumulative sum of the column using dplyr in R one can refer documentation.


Other Related Topics:

 

                                                                                                           

Author

  • Sridhar Venkatachalam

    With close to 10 years on Experience in data science and machine learning Have extensively worked on programming languages like R, Python (Pandas), SAS, Pyspark.