Add leading zeros to the column in pyspark

To add leading zeros to a column in pyspark we can use the concat() function. Other ways to add preceding zeros to a column are the format_string() function and the lpad() function. Let’s see an example of each method.

  • Add leading or preceding zeros to the column in pyspark using concat() function
  • Add leading zeros to the column in pyspark using format_string() function
  • Add preceding zeros to the column in pyspark using lpad() function


We will be using dataframe df_student_detail

Add leading zeros to the column in pyspark using concat() function  – Method 1

We will use the lit() and concat() functions to add leading zeros to the column in pyspark. lit() wraps the literal string ‘00’, and concat() joins it to the front of the ‘grad_score’ column, so every value gains exactly two leading zeros regardless of its original length.

### Add leading zeros to the column in pyspark -1

from pyspark.sql import functions as sf
df_student_detail.withColumn('joined_column',sf.concat(sf.lit('00'), sf.col('grad_score'))).show()

The resulting joined_column contains each grad_score value with ‘00’ prepended.

Add preceding zeros to the column in pyspark using format_string() function – Method 2

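format_string() takes the format “%03d” and the column name “grad_score” as arguments. It adds leading zeros to the “grad_score” column until the string length becomes 3; values that already have 3 or more digits are left unchanged. The %d specifier expects an integer column.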

### Add leading zeros to the column in pyspark - 2

from pyspark.sql import functions as sf
df_student_detail.withColumn("joined_column", sf.format_string("%03d","grad_score")).show()

The resulting joined_column contains grad_score zero-padded to a width of 3.

Add preceding zeros to the column in pyspark using lpad() function – Method 3

lpad() takes the “grad_score” column as its first argument, followed by 3, the total string length, and then “0”, the character used to pad on the left. It adds leading zeros to the “grad_score” column until the string length becomes 3. Values longer than 3 characters are shortened to 3.

### Add leading zeros to the column in pyspark - 3

from pyspark.sql.functions import lpad
df_student_detail = df_student_detail.withColumn('grad_score_new', lpad(df_student_detail.grad_score,3, '0'))
df_student_detail.show()

The resulting grad_score_new column contains grad_score left-padded with zeros to a width of 3.
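
The three methods treat values of different lengths differently. The sketch below models each rule in plain Python, without Spark, so the differences can be checked quickly. The helper names and sample scores are hypothetical; the lpad model reflects Spark's documented behaviour of shortening values that are longer than the target length.

```python
# Plain-Python models of the three padding rules (no Spark needed).
# Helper names are hypothetical and exist only for this illustration.

def concat_prefix(score: int) -> str:
    """Method 1: always prepend the literal '00'."""
    return "00" + str(score)

def format_width3(score: int) -> str:
    """Method 2: '%03d' zero-pads to at least 3 characters, never truncates."""
    return "%03d" % score

def lpad_width3(score: int) -> str:
    """Method 3: left-pad with '0' to exactly 3 characters, truncating longer values."""
    text = str(score)
    return text.rjust(3, "0")[:3]

# Print each sample score alongside the result of the three rules.
for score in (7, 45, 123, 1234):
    print(score, concat_prefix(score), format_width3(score), lpad_width3(score))
```

For a score of 7 all three rules give 007, but for 1234 they give 001234, 1234 and 123 respectively, so the choice of method matters when value lengths vary.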

Author

  • Sridhar Venkatachalam

    With close to 10 years of experience in data science and machine learning, has worked extensively with programming languages like R, Python (Pandas), SAS, and Pyspark.