Get Substring of the column in Pyspark

To get a substring of a column in PySpark, we use the substr() function on the column. Below we look at examples of how to extract a substring from the start and from the end of a column.

  • Get a substring of the column in pyspark using the substr() function.
  • Get a substring from the end of the column in pyspark.

 

Syntax:

 df.colname.substr(start,length)

df - DataFrame
colname - column name
start - starting position (1-based, so 1 is the first character)
length - number of characters to take from the starting position

We will be using a DataFrame named df_states, which has a string column state_name.

Substring of the column in pyspark

df.colname.substr() returns a substring of each value in the column.

### Get Substring of the column in pyspark

df = df_states.withColumn("substring_statename", df_states.state_name.substr(1,6))
df.show()

substr(1,6) returns the first 6 characters of each value in the column “state_name”.

Get Substring from end of the column in pyspark

df.colname.substr() can also take a substring counted from the end of the string. To do so, pass a negative value as the first parameter (start).

### Get Substring from end of the column in pyspark

df = df_states.withColumn("substring_from_end", df_states.state_name.substr(-2,2))
df.show()

In this example we extract the last two characters of each value in the column. The first parameter, -2, is negative, so the start position is counted from the end of the string; the second parameter, 2, is the length.