Join in pyspark (Merge) inner, outer, right, left join

We can merge or join two data frames in pyspark by using the join() function. The different arguments to join() allows you to perform left join, right join, full outer join and natural join or inner join in pyspark.  Join in pyspark (Merge) inner, outer, right, left join in pyspark is explained below

  • Inner join in pyspark with example with join() function
  • Outer join in pyspark with example
  • Left join in pyspark with example
  • Right join in pyspark with example
  • Left semi join in pyspark with example
  • Left anti join in pyspark with example
  • Full join in pyspark with example
  • Anti join in pyspark with example

Syntax :

df1.join(df2, on=[‘Roll_No’], how=’left’)
  • df1− Dataframe1.
  • df2– Dataframe2.
  • on− Columns (names) to join on. Must be found in both df1 and df2.
  • how– type of join needs to be performed – ‘left’, ‘right’, ‘outer’, ‘inner’, Default is inner join

We will be using dataframes df1 and df2:

df1:

join in pyspark (Merge) inner , outer, right , left join in pyspark 6

df2:

join in pyspark (Merge) inner , outer, right , left join in pyspark 7

 

 

Inner join in pyspark with example

Inner Join in pyspark is the simplest and most common type of join. It is also known as simple join or Natural Join. Inner join returns the rows when matching condition is met.

join in pyspark (Merge) inner , outer, right , left join in pyspark 3

### Inner join in pyspark

df_inner = df1.join(df2, on=['Roll_No'], how='inner')
df_inner.show()

inner join will be
join in pyspark (Merge) inner , outer, right , left join in pyspark 8

 

 

Outer join in pyspark with example

outer Join in pyspark combines the results of both left and right outer joins. The joined table will contain all records from both the tables

join in pyspark (Merge) inner , outer, right , left join in pyspark 4

### Outer join in pyspark

df_outer = df1.join(df2, on=['Roll_No'], how='outer')
df_outer.show()

outer join will be
join in pyspark (Merge) inner , outer, right , left join in pyspark 9

 

 

Left join in pyspark with example

The LEFT JOIN in pyspark returns all records from the left dataframe (A), and the matched records from the right dataframe (B)

join in pyspark (Merge) inner , outer, right , left join in pyspark 1

### Left join in pyspark

df_left = df1.join(df2, on=['Roll_No'], how='left')
df_left.show()

left join will be
join in pyspark (Merge) inner , outer, right , left join in pyspark 10

 

 

Right join in pyspark with example

The RIGHT JOIN in pyspark returns all records from the right dataframe (B), and the matched records from the left dataframe (A)

join in pyspark (Merge) inner , outer, right , left join in pyspark 2

### Right join in pyspark

df_right = df1.join(df2, on=['Roll_No'], how='right')
df_right.show()

Right join will be
join in pyspark (Merge) inner , outer, right , left join in pyspark 11

 

 

Left Anti join in pyspark with example

This join is like df1-df2, as it selects all rows from df1 that are not present in df2.

join in pyspark (Merge) inner , outer, right , left join in pyspark 5

### Left Anti join in pyspark

df_left_anti = df1.join(df2, on=['Roll_No'], how='left_anti')
df_left_anti.show()

Left Anti join will be
join in pyspark (Merge) inner , outer, right , left join in pyspark 12

 

 

Left Semi join in pyspark with example

This is like inner join, with only the left dataframe columns and values are selected

join in pyspark (Merge) inner , outer, right , left join in pyspark 5

### Left Semi join in pyspark

df_left_semi = df1.join(df2, on=['Roll_No'], how='left_semi')
df_left_semi.show()

Left Semi Join will be
join in pyspark (Merge) inner , outer, right , left join in pyspark 13

 

 

Full join in pyspark:

Full Join in pyspark combines the results of both left and right outer joins. The joined table will contain all records from both the tables

join in pyspark (Merge) inner , outer, right , left join in pyspark 4

, how='full')
df_full.show()

full join will be
join in pyspark (Merge) inner , outer, right , left join in pyspark 14

 

 

Anti join in pyspark:

Anti join in pyspark returns rows from the first table where no matches are found in the second table

join in pyspark (Merge) inner , outer, right , left join in pyspark 16

### Anti join in pyspark

df_anti = df1.join(df2, on=['Roll_No'], how='anti')
df_anti.show()

Anti join will be
join in pyspark (Merge) inner , outer, right , left join in pyspark 15

 


Other Related Topics:

 

Join in pyspark (Merge) inner , outer, right , left join in pyspark                                                                                       Join in pyspark (Merge) inner , outer, right , left join in pyspark

Author

  • Sridhar Venkatachalam

    With close to 10 years on Experience in data science and machine learning Have extensively worked on programming languages like R, Python (Pandas), SAS, Pyspark.