Remove Leading, Trailing and all space of column in pyspark – strip & trim space

In order to remove leading, trailing and all space of a column in pyspark, we use the ltrim(), rtrim(), trim() and regexp_replace() functions. Leading and trailing space are stripped with ltrim() and rtrim() respectively. To trim both the leading and trailing space in one step we use the trim() function, and to remove every space we use regexp_replace(). Let’s see how to

  • Remove Leading space of column in pyspark with ltrim() function – strip or trim leading space
  • Remove Trailing space of column in pyspark with rtrim() function – strip or trim trailing space
  • Remove both leading and trailing space of column in pyspark with trim() function – strip or trim both leading and trailing space
  • Remove all the space of column in pyspark with regexp_replace() function

We will be using the df_states table.

Remove Leading space of column in pyspark with ltrim() function – strip or trim leading space

To remove the leading space of a column in pyspark we use the ltrim() function. ltrim() takes a column (or column name) and trims the white space on the left of each value in that column.

### Remove leading space of the column in pyspark
from pyspark.sql.functions import ltrim

# Store the left-trimmed state_name in a new column, states_Name
df_states = df_states.withColumn('states_Name', ltrim(df_states.state_name))
df_states.show(truncate=False)

So, in the resultant table, the states_Name column holds each state name with its leading space removed.
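To make the rule concrete, here is a minimal plain-Python sketch of what ltrim() does to a single value. It is not PySpark code: the helper name ltrim_value is hypothetical, and it assumes the padding consists of ordinary space characters (' '), as in the df_states data.

```python
def ltrim_value(s):
    """Hypothetical helper: drop leading space characters (' ') only."""
    if s is None:
        # Mirrors Spark, where a null input gives a null result
        return None
    i = 0
    # Advance past every leading space
    while i < len(s) and s[i] == " ":
        i += 1
    return s[i:]

print(repr(ltrim_value("   New York  ")))  # → 'New York  '
```

Only the left side changes; the trailing spaces stay until rtrim() or trim() is applied.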


Remove Trailing space of column in pyspark with rtrim() function – strip or trim trailing space

To remove the trailing space of a column in pyspark we use the rtrim() function. rtrim() takes a column (or column name) and trims the white space on the right of each value in that column.

### Remove trailing space of the column in pyspark
from pyspark.sql.functions import rtrim

# Store the right-trimmed state_name in a new column, states_Name
df_states = df_states.withColumn('states_Name', rtrim(df_states.state_name))
df_states.show(truncate=False)

So, in the resultant table, the states_Name column holds each state name with its trailing space removed.
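The same idea works from the other end for rtrim(). This is again a plain-Python sketch rather than PySpark code; rtrim_value is a hypothetical name, and only the space character (' ') is treated as padding.

```python
def rtrim_value(s):
    """Hypothetical helper: drop trailing space characters (' ') only."""
    if s is None:
        # Null in, null out, as with Spark's string functions
        return None
    end = len(s)
    # Walk backwards past every trailing space
    while end > 0 and s[end - 1] == " ":
        end -= 1
    return s[:end]

print(repr(rtrim_value("  New York   ")))  # → '  New York'
```

Here the leading spaces survive, which is the mirror image of ltrim().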

Remove both leading and trailing space of column in pyspark with trim() function – strip or trim space

To remove both the leading and trailing space of a column in pyspark we use the trim() function. trim() takes a column (or column name) and trims the white space on both the left and right of each value in that column.

### Remove leading and trailing space of the column in pyspark
from pyspark.sql.functions import trim

# Store state_name trimmed on both sides in a new column, states_Name
df_states = df_states.withColumn('states_Name', trim(df_states.state_name))
df_states.show(truncate=False)

So, in the resultant table, the states_Name column holds each state name with both its leading and trailing spaces removed.
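A plain-Python sketch of trim() shows two things: it gives the same result as trimming the left side and then the right side, and it leaves spaces between words untouched. trim_value is a hypothetical helper, and like the sketches above it assumes only the space character (' ') counts as padding; whether Spark also strips tabs or other whitespace is worth checking in the documentation for your version.

```python
def trim_value(s):
    """Hypothetical helper: drop leading and trailing space characters (' ')."""
    if s is None:
        return None
    # Passing " " explicitly limits stripping to the space character;
    # a bare s.strip() would also remove tabs and newlines.
    return s.strip(" ")

value = "  New  Mexico  "
# Trimming both sides equals trimming the left, then the right
assert trim_value(value) == value.lstrip(" ").rstrip(" ")
print(repr(trim_value(value)))  # → 'New  Mexico'
```

Note that the double space inside "New  Mexico" is kept: trim() only touches the ends of the value.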

Remove all the space of column in pyspark with regexp_replace() function – strip all space

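To remove all the space of a column in pyspark we use the regexp_replace() function. It takes a column, a regular expression pattern and a replacement string, and replaces every match of the pattern in that column. With the pattern " " and an empty replacement, it removes every space character, including those between words.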

### Remove all the space of the column in pyspark
from pyspark.sql.functions import regexp_replace, col

# Replace every " " in state_name with an empty string
df_states = df_states.withColumn('states_Name', regexp_replace(col("state_name"), " ", ""))
df_states.show(truncate=False)

So, in the resultant table, the states_Name column holds each state name with all of its spaces removed, including those between words.
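Because regexp_replace() works with a regular expression, the pattern decides what counts as a space. The sketch below uses Python's re module as a stand-in for Spark's Java-based regex engine (the two agree on these simple patterns, though not in every case) to compare the pattern " " used above with the whitespace class \s+. The sample value is made up for illustration.

```python
import re

value = " New\tHampshire "

# Pattern " " mirrors regexp_replace(col("state_name"), " ", ""):
# only the space character is removed, so the tab survives.
only_spaces = re.sub(" ", "", value)

# Pattern r"\s+" removes every run of whitespace (spaces, tabs, newlines).
all_whitespace = re.sub(r"\s+", "", value)

print(repr(only_spaces))     # → 'New\tHampshire'
print(repr(all_whitespace))  # → 'NewHampshire'
```

If tabs or newlines should go as well, the PySpark call would become regexp_replace(col("state_name"), r"\s+", "").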

Author

  • Sridhar Venkatachalam

    With close to 10 years of experience in data science and machine learning, Sridhar has worked extensively with programming languages like R, Python (Pandas), SAS and Pyspark.