Get String length of column in Pyspark

To get the string length of a column in PySpark, we use the length() function. We will look at an example of getting the string length of a specific column, and at an example of filtering rows by the length of a column.

  • Get string length of the column in pyspark using length() function.
  • Filter the dataframe using length of the column in pyspark

Syntax:

 length("colname")

colname –  column name

We will be using the dataframe named df_books

[Image: the df_books dataframe]

 

 

Get String length of column in Pyspark:

To get the string length of a column we use the length() function, which takes the column name as its argument and returns the number of characters in each value.


### Get String length of the column in pyspark

import pyspark.sql.functions as F

df = df_books.withColumn("length_of_book_name", F.length("book_name"))
df.show(truncate=False)

The resulting dataframe, with the length of the column appended as length_of_book_name, will be:

[Image: df_books with the length_of_book_name column appended]

 

 

Filter the dataframe using length of the column in pyspark:

Filtering the dataframe by the length of a column is also done with the length() function. Here we keep only the rows where the column "book_name" has 20 or more characters.


### Filter using length of the column in pyspark

from pyspark.sql.functions import col, length

# Keep only rows whose book_name has at least 20 characters
df_books.where(length(col("book_name")) >= 20).show()

The resulting dataframe, filtered by the length of the column, will be:

[Image: df_books filtered to rows where book_name has 20 or more characters]

 



Author

  • Sridhar Venkatachalam

    Close to 10 years of experience in data science and machine learning. Has worked extensively with programming languages such as R, Python (Pandas), SAS, and PySpark.