Left and Right Pad of a Column in PySpark – lpad() & rpad()

To pad a column on the left in PySpark, use the lpad() function; to pad it on the right, use rpad(). Let’s see how to:

  • Left pad of the column in pyspark – lpad()
  • Right pad of the column in pyspark – rpad()
  • Add both left and right padding in pyspark

We will be using the dataframe df_states, which has a state_name column.

Add left pad of the column in pyspark

Left padding is done with the lpad() function, which takes a column, a target length, and a padding string as arguments. Here we use the state_name column and “#” as the padding string, so each value is padded on the left until it is 14 characters long. Values already longer than 14 characters are shortened to 14.

### Add left pad of the column in pyspark
from pyspark.sql.functions import lpad

df_states = df_states.withColumn('states_Name_new', lpad(df_states.state_name, 14, '#'))
df_states.show(truncate=False)

The resulting dataframe has a new column, states_Name_new, holding state_name left-padded with “#” to 14 characters.
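
To make the padding rule concrete, here is a minimal plain-Python sketch of what lpad() computes for a single value. It is only a model of the documented behaviour, not Spark's implementation; the helper name lpad_model and the sample state names are hypothetical. Note that a value longer than the target length is cut down to that length rather than left unchanged.

```python
def lpad_model(value, length, pad):
    """Plain-Python model of lpad() for one string (hypothetical helper)."""
    length = max(length, 0)
    if len(value) >= length or not pad:
        # Values that are already long enough are shortened to `length`.
        return value[:length]
    missing = length - len(value)
    # Repeat the pad string and cut it to exactly the missing width.
    filler = (pad * missing)[:missing]
    return filler + value


# Sample values are made up for illustration.
for name in ["Alaska", "North Carolina", "District of Columbia"]:
    print(lpad_model(name, 14, "#"))
```

The first name gains eight “#” characters, the second is already 14 characters long and comes back unchanged, and the third is cut to its first 14 characters.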

 

 

Add Right pad of the column in pyspark

Right padding is done with the rpad() function, which takes a column, a target length, and a padding string as arguments. Here we use the state_name column and “#” as the padding string, so each value is padded on the right until it is 14 characters long. Values already longer than 14 characters are shortened to 14.

### Add right pad of the column in pyspark
from pyspark.sql.functions import rpad

df_states = df_states.withColumn('states_Name_new', rpad(df_states.state_name, 14, '#'))
df_states.show(truncate=False)

The resulting dataframe has a new column, states_Name_new, holding state_name right-padded with “#” to 14 characters.
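
The padding string does not have to be a single character. The sketch below is a plain-Python model of rpad() for one value, showing how a two-character pad is repeated and cut to fit, and that an over-long value keeps its leading characters. As before, this is an illustration under stated assumptions rather than Spark's own code; rpad_model and the sample inputs are hypothetical.

```python
def rpad_model(value, length, pad):
    """Plain-Python model of rpad() for one string (hypothetical helper)."""
    length = max(length, 0)
    if len(value) >= length or not pad:
        # Over-long values keep only their first `length` characters.
        return value[:length]
    missing = length - len(value)
    # Repeat the pad string and cut it to exactly the missing width.
    filler = (pad * missing)[:missing]
    return value + filler


print(rpad_model("Alaska", 14, "#"))                # single-character pad
print(rpad_model("Alaska", 14, "-*"))               # multi-character pad
print(rpad_model("District of Columbia", 14, "#"))  # longer than 14 characters
```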

 

Add Both Left and Right pad of the column in pyspark

Padding on both sides is done by applying lpad() first and then rpad() to the result. Both functions take a column, a target length, and a padding string as arguments. Here we left-pad the state_name column with “#” until it is 20 characters long, then right-pad that result with “#” until it is 24 characters long.

### Add both left and right padding
from pyspark.sql.functions import lpad, rpad

# Left-pad state_name to 20 characters, then right-pad that result to 24 characters
df_states = df_states.withColumn('states_Name_new', lpad(df_states.state_name, 20, '#'))
df_states = df_states.withColumn('states_Name_new', rpad(df_states.states_Name_new, 24, '#'))

df_states.show(truncate=False)

The resulting states_Name_new column is padded with “#” on both sides, to a total length of 24 characters.
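
Because the two calls use fixed target lengths, the right padding always adds exactly 24 - 20 = 4 characters, while the amount of left padding depends on the length of each name. A quick plain-Python check of that arithmetic is sketched below. It uses str.rjust and str.ljust as stand-ins for lpad() and rpad(), which only match for a single-character pad and values no longer than the target length. The helper pad_both and the sample names are hypothetical.

```python
def pad_both(name, left_to=20, right_to=24, pad="#"):
    """Left-pad to `left_to` characters, then right-pad to `right_to` (hypothetical helper)."""
    left_padded = name.rjust(left_to, pad)  # stand-in for lpad(..., 20, '#')
    return left_padded.ljust(right_to, pad)  # stand-in for rpad(..., 24, '#')


# Sample values are made up for illustration.
for name in ["Alaska", "North Carolina"]:
    padded = pad_both(name)
    left_count = 20 - len(name)  # varies with the length of the name
    right_count = 24 - 20        # always 4
    print(f"{padded}  left={left_count} right={right_count}")
```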

 


Author

  • Sridhar Venkatachalam

    Close to 10 years of experience in data science and machine learning. Has worked extensively with programming languages like R, Python (Pandas), SAS, and PySpark.