Join in R: How to join (merge) data frames (inner, outer, left, right) in R

We can merge two data frames in R by using the merge() function or by using family of  join() function in dplyr package. The data frames must have same column names on which the merging happens. Merge() Function in R is similar to database join operation in SQL. The different arguments to merge() allow you to perform natural joins i.e. inner join, left join, right join,cross join, semi join, anti join and full outer join. We can perform Join in R using merge() Function or by using family of join() functions in dplyr package.

We will have look at an example of

  • Inner join using merge() function in R or inner_join() function of dplyr with example
  • Outer join using merge() function or full_join() function of dplyr with example
  • Left join using left_join() function of dplyr or merge() function
  • Right join using right_join() function of dplyr or merge() function.
  • Cross join with merge() function
  • semi join and anti join in R using semi_join() function and anti_join() function.

 

Syntax of merge() function in R

merge(x, y, by.x, by.y,all.x,all.y, sort = TRUE)
  • x:data frame1.
  • y:data frame2.
  • by,x, by.y: The names of the columns that are common to both x and y. The default is to use the columns with common names between the two data frames.
  • all, all.x, all.y:Logical values that specify the type of merge. The default value is all=FALSE (meaning that only the matching rows are returned).

 

UNDERSTANDING THE DIFFERENT TYPES OF MERGE IN R:

  • Natural join or Inner Join: To keep only rows that match from the data frames, specify the argument all=FALSE.
  • Full outer join or Outer Join:To keep all rows from both data frames, specify all=TRUE.
  • Left outer join or Left Join:To include all the rows of your data frame x and only those from y that match, specify x=TRUE.
  • Right outer join or Right Join:To include all the rows of your data frame y and only those from x that match, specify y=TRUE.

Merge Function R Join in R

Lets look at with some examples

# data frame 1
df1 = data.frame(CustomerId = c(1:6), Product = c("Oven","Television","Mobile","WashingMachine","Lightings","Ipad"))
df1 

# data frame 2
df2 = data.frame(CustomerId = c(2, 4, 6, 7, 8), State = c("California","Newyork","Santiago","Texas","Indiana")) 
df2 

so we will get following two data frames

df1 will be
Join in R Merge data frames inner outer right left join in R8a

df2 will be
Join in R Merge data frames inner outer right left join in R9a

 

 

INNER JOIN Explained

Inner Join in R is the simplest and most common type of join. It is also known as simple join or Natural Join. Inner join returns the rows when matching condition is met.

 

Join in R Merge data frames inner outer right left join in R3a

Inner join in R using merge() function:  merge() function takes df1 and df2 as argument. merge() function by default performs inner join there by return only the rows in which the left table have matching keys in the right table.

#### Left Join using merge function
df = merge(x=df1,y=df2,by="CustomerId")
df

the resultant inner joined dataframe df will be

Join in R Merge data frames inner outer right left join in R10a

Inner join in R using inner_join() function of dplyr: 

dplyr() package has inner_join() function which performs inner join of two dataframes by “CustomerId” as shown below.

#### Left Join using inner_join function 
library(dplyr)
df= df1 %>% inner_join(df2,by="CustomerId")
df

the resultant inner joined dataframe df will be

Join in R Merge data frames inner outer right left join in R10a

 

 

OUTER JOIN Explained

Outer Join in  R combines the results of both left and right outer joins. The joined table will contain all records from both the tables

Join in R Merge data frames inner outer right left join in R4a

Outer join in R using merge() functionmerge() function takes df1 and df2 as argument along with all=TRUE there by returns all rows from both tables, join records from the left which have matching keys in the right table.

###### outer join in R using merge() function
df = merge(x=df1,y=df2,by="CustomerId",all=TRUE)
df

the resultant data frame df will be

Join in R Merge data frames inner outer right left join in 14

outer join in R using full_join() function of dplyr: 

dplyr() package has full_join() function which performs outer join of two dataframes by “CustomerId” as shown below.

###### outer join in R using outer_join() function 
library(dplyr)
df= df1 %>% full_join(df2,by="CustomerId")
df

the resultant outer joined dataframe df will be

Join in R Merge data frames inner outer right left join in 14

 

 

LEFT JOIN Explained:

The LEFT JOIN in R returns all records from the left dataframe (A), and the matched records from the right dataframe (B)

Join in R Merge data frames inner outer right left join in R5a

Left  join in R:  merge() function takes df1 and df2 as argument along with all.x=TRUE  there by returns all rows from the left table, and any rows with matching keys from the right table.

###### left join in R using merge() function 
df = merge(x=df1,y=df2,by="CustomerId",all.x=TRUE)
df

the resultant data frame df will be
Join in R Merge data frames inner outer right left join in R15

Left join in R using left_join() function of dplyr:

dplyr() package has left_join() function which performs left join of two dataframes by “CustomerId” as shown below.

###### left join in R using left_join() function 
library(dplyr)
df= df1 %>% left_join(df2,by="CustomerId")
df

the resultant Left joined dataframe df will be

Join in R Merge data frames inner outer right left join in R15

 

 

RIGHT JOIN Explained:

The RIGHT JOIN in R returns all records from the right dataframe (B), and the matched records from the left dataframe (A)

Join in R Merge data frames inner outer right left join in R6a

Right join in R: merge() function takes df1 and df2 as argument along with all.y=TRUE  and thereby returns all rows from the right table, and any rows with matching keys from the left table.

###### right join in R using merge() function 
df = merge(x=df1,y=df2,by="CustomerId",all.y=TRUE)
df

the resultant data frame df will be

Join in R Merge data frames inner outer right left join in R16

Right join in R using right_join() function of dplyr: 

dplyr() package has right_join() function which performs outer join of two dataframes by “CustomerId” as shown below.

###### right join in R using merge() function 

library(dplyr)

df= df1 %>% right_join(df2,by="CustomerId")
df

the resultant right joined dataframe df will be
Join in R Merge data frames inner outer right left join in R16

 

 

 

Cross join in R: A Cross Join (also sometimes known as a Cartesian Join) results in every row of one table being joined to every row of another table

##### cross join in R

df = merge(x = df1, y = df2, by = NULL)
df

the resultant data frame df will be

Join in R Merge data frames inner outer right left join in R17

 

 

SEMI JOIN in R using dplyr:

This is like inner join, with only the left dataframe columns and values are selected

Join in R Merge data frames inner outer right left join in R7a

#### Semi join in R

library(dplyr)

df= df1 %>% semi_join(df2,by="CustomerId")
df

the resultant data frame df will be

Join in R Merge data frames inner outer right left join in R18

 

ANTI JOIN in R using dplyr:

This join is like df1-df2, as it selects all rows from df1 that are not present in df2.

#### anti join in R

library(dplyr)
df= df1 %>% anti_join(df2,by="CustomerId")

df

the resultant data frame df will be

Join in R Merge data frames inner outer right left join in R19

 

For further understanding of join() function in R using dplyr one can refer the dplyr documentation


Related Topics:

 

previous small r merge function join in r                                                                                                          next small r merge function join in r

Author

  • Sridhar Venkatachalam

    With close to 10 years on Experience in data science and machine learning Have extensively worked on programming languages like R, Python (Pandas), SAS, Pyspark.