Typecast string to date and date to string in Pyspark

To typecast a string to a date in pyspark, we use the to_date() function with the column name and date format as arguments. To typecast a date to a string, we use the cast() function with StringType() as the argument. Let’s see an example of casting a string column to a date column and a date column to a string column in pyspark.

  • Type cast string column to date column in pyspark using to_date() function
  • Type cast date column to string column in pyspark

We will be using the dataframe named df_student

Typecast string column to date column in pyspark:

First let’s get the datatype of “birthday” column as shown below

### Get datatype of birthday column

df_student.select("birthday").dtypes

So the resultant data type of the birthday column is string.

Now let’s convert the birthday column to a date using the to_date() function, passing the column and the date format as arguments. This converts the string column to a date column, and the result is stored in a dataframe named output_df.

########## Type cast string column to date column in pyspark

from pyspark.sql.functions import to_date
output_df = df_student.withColumn('birthday', to_date(df_student.birthday, 'dd-MM-yyyy'))

Now let’s get the datatype of the birthday column as shown below

### Get datatype of birthday

output_df.select("birthday").dtypes

So the resultant data type of the birthday column is date.

Type cast date column to string column in pyspark:

First let’s get the datatype of birthday column from output_df as shown below

### Get datatype of birthday column

output_df.select("birthday").dtypes

So the resultant data type of the birthday column is date.

Now let’s convert the birthday column to a string using the cast() function with StringType() passed as the argument. This converts the date column to a string column, and the result is stored in a dataframe named output_df.

########## Type cast date column to string column in pyspark

from pyspark.sql.types import StringType
output_df = output_df.withColumn("birthday", output_df["birthday"].cast(StringType()))

Now let’s get the datatype of birthday column as shown below

### Get datatype of birthday column

output_df.select("birthday").dtypes

So the resultant data type of birthday column is string

Author

  • Sridhar Venkatachalam

    Has close to 10 years of experience in data science and machine learning, and has worked extensively with programming languages such as R, Python (Pandas), SAS, and Pyspark.