Set difference of dataframes in R

The set difference of two dataframes in R, i.e. the rows of the first dataframe that do not appear in the second, can be computed with dplyr functions such as setdiff() and anti_join(). In this tutorial we will look at how to compute the set difference of two dataframes with an example.

Let’s first create two dataframes.

# create dataframe 1
df1 <- data.frame(State=c('Arizona','Georgia','Newyork','Indiana','seattle','washington','Texas'),
                  Score=c(62,47,55,74,31,77,85))
df1

Printing df1 shows its seven rows of State and Score values.

# create dataframe 2
df2 <- data.frame(State=c('Arizona','Georgia','California','Florida'),
                  Score=c(62,47,85,12))
df2

df2 has four rows; its first two rows (Arizona 62 and Georgia 47) also appear in df1.
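Before taking a difference, it can help to check which rows the two dataframes share, since those are exactly the rows a set difference removes. The sketch below is a minimal base-R check, assuming stringsAsFactors = FALSE so that State stays a character column: merge() with no by argument joins on every column the two dataframes have in common (here State and Score), so it returns only the rows present in both.

```r
# Example data from the tutorial (State kept as character)
df1 <- data.frame(State = c('Arizona', 'Georgia', 'Newyork', 'Indiana',
                            'seattle', 'washington', 'Texas'),
                  Score = c(62, 47, 55, 74, 31, 77, 85),
                  stringsAsFactors = FALSE)
df2 <- data.frame(State = c('Arizona', 'Georgia', 'California', 'Florida'),
                  Score = c(62, 47, 85, 12),
                  stringsAsFactors = FALSE)

# With no 'by' argument, merge() joins on all shared columns (State and Score),
# so only rows that occur in both dataframes survive
common <- merge(df1, df2)
print(common)  # two rows: Arizona 62 and Georgia 47
```

These shared rows are the ones that will disappear from df1 when the set difference is taken.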

 

Set difference of two dataframes – (Method 1)

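Set difference of two dataframes using the setdiff() function of the dplyr package. Loading dplyr matters here: dplyr's setdiff() compares whole rows of the two dataframes, while base R's setdiff() is designed for vectors.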

# method 1
library(dplyr)
setdiff(df1, df2)

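So the set difference keeps the five rows of df1 that do not appear in df2: Newyork, Indiana, seattle, washington and Texas. The Arizona and Georgia rows are dropped because identical rows exist in df2.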
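Note that a set difference is directional: the rows of df1 missing from df2 are not the same as the rows of df2 missing from df1. The sketch below shows both directions without dplyr, using a hypothetical helper row_setdiff() (not a base R or dplyr function) built on the common rbind() plus duplicated() idiom. It assumes both dataframes have the same column names and, like a set operation, it also drops repeated rows within the first dataframe.

```r
# Example data from the tutorial (State kept as character)
df1 <- data.frame(State = c('Arizona', 'Georgia', 'Newyork', 'Indiana',
                            'seattle', 'washington', 'Texas'),
                  Score = c(62, 47, 55, 74, 31, 77, 85),
                  stringsAsFactors = FALSE)
df2 <- data.frame(State = c('Arizona', 'Georgia', 'California', 'Florida'),
                  Score = c(62, 47, 85, 12),
                  stringsAsFactors = FALSE)

# Hypothetical helper: rows of x that do not occur in y, comparing all columns.
# Assumes x and y have the same column names.
row_setdiff <- function(x, y) {
  stacked <- rbind(y, x)      # rows of y first, then rows of x
  dup <- duplicated(stacked)  # TRUE if an identical row appeared earlier
  # Keep only the flags for x's rows; TRUE means the row occurs in y
  # (or earlier in x, so repeated rows of x are dropped too)
  seen_before <- dup[nrow(y) + seq_len(nrow(x))]
  x[!seen_before, , drop = FALSE]
}

only_in_df1 <- row_setdiff(df1, df2)  # Newyork, Indiana, seattle, washington, Texas
only_in_df2 <- row_setdiff(df2, df1)  # California, Florida
print(only_in_df1)
print(only_in_df2)
```

With dplyr loaded, the same two directions correspond to setdiff(df1, df2) and setdiff(df2, df1).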

 

Set difference of two dataframes – (Method 2)

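Set difference of two dataframes using the anti_join() function of dplyr. anti_join(x, y) returns the rows of x that have no match in y. When by is not given, it matches on all columns the two dataframes share (here State and Score) and prints a message naming them; passing by explicitly avoids that message.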

# method 2
anti_join(df1, df2, by = c("State", "Score"))

So the set difference is the same five rows as in method 1: Newyork, Indiana, seattle, washington and Texas.
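Unlike setdiff(), anti_join() can match on a subset of columns through its by argument: anti_join(df1, df2, by = "State") drops every df1 row whose State appears in df2, whatever the Score. The sketch below is a base-R version of that key-based anti-join using %in%, with a hypothetical helper anti_by_state(). To show the difference from a full-row comparison, it also builds a hypothetical df2_changed in which Arizona's score is 99. The key-based result still drops Arizona, whereas setdiff(df1, df2_changed) would keep it because the Arizona rows no longer match exactly.

```r
# Example data from the tutorial (State kept as character)
df1 <- data.frame(State = c('Arizona', 'Georgia', 'Newyork', 'Indiana',
                            'seattle', 'washington', 'Texas'),
                  Score = c(62, 47, 55, 74, 31, 77, 85),
                  stringsAsFactors = FALSE)
df2 <- data.frame(State = c('Arizona', 'Georgia', 'California', 'Florida'),
                  Score = c(62, 47, 85, 12),
                  stringsAsFactors = FALSE)

# Hypothetical helper: keep rows of x whose State is absent from y,
# ignoring Score (analogous to anti_join(x, y, by = "State"))
anti_by_state <- function(x, y) x[!(x$State %in% y$State), , drop = FALSE]

# Hypothetical variant of df2 where Arizona's score no longer matches df1
df2_changed <- transform(df2, Score = ifelse(State == "Arizona", 99, Score))

res_original <- anti_by_state(df1, df2)
res_changed  <- anti_by_state(df1, df2_changed)
print(res_original)
print(res_changed)  # same rows, because only the State column is compared
```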

                                                                                                     

Author

  • Sridhar Venkatachalam

    Close to 10 years of experience in data science and machine learning; has worked extensively with programming languages like R, Python (Pandas), SAS and PySpark.