String split of the column in pyspark

In order to split the strings of a column in pyspark we will be using the split() function. split() takes the column name and a delimiter as arguments. Let’s see with an example how to split a string column in pyspark.

  • String split of the column in pyspark with an example.

We will be using the dataframe df_states, which has a single State_Name column.

String Split of the column in pyspark : Method 1

  • split() function in pyspark takes the column name as its first argument, followed by the delimiter (“-”) as its second argument, and splits the column on that delimiter.
  • getItem(0) gets the first part of the split; getItem(1) gets the second part.
### String Split of the column in pyspark
from pyspark.sql.functions import split, col

df_states.withColumn("col1", split(col("State_Name"), "-").getItem(0)) \
         .withColumn("col2", split(col("State_Name"), "-").getItem(1)) \
         .show()

The resultant dataframe will have two new columns, col1 and col2, holding the parts of State_Name before and after the delimiter.

String Split of the column in pyspark : Method 2

split() function in pyspark takes the column name as its first argument, followed by the delimiter (“-”) as its second argument. With rdd flatMap(), the set of values before the delimiter becomes col1 and the set after the delimiter becomes col2.

### String Split of the column in pyspark
import pyspark.sql.functions as f

df_split = df_states.select(f.split(df_states.State_Name, "-")).rdd.flatMap(lambda x: x).toDF(schema=["col1", "col2"])
df_split.show()

so the resulting df_split dataframe holds the two parts of State_Name in col1 and col2.



Author

  • Sridhar Venkatachalam

    With close to 10 years of experience in data science and machine learning, has extensively worked with programming languages and tools such as R, Python (Pandas), SAS, and Pyspark.