Repeat the column in Pyspark

To repeat the string values of a column in pyspark, we use the repeat() function. Below is an example of how to repeat the string of a column in pyspark.

  • Repeat the string of the column in pyspark using repeat() function.

Syntax:

 repeat(colname,n)

colname – Name of the string column to repeat.
n – Number of times to repeat the string.

We will be using the dataframe named df

[Figure 1: the input dataframe df]

 

repeat() function in pyspark:

repeat(str, n) – Returns a string made of the given string value repeated n times.

Examples:

> SELECT repeat('123', 3);

Output :

123123123

 

Repeat the column in Pyspark

The repeat() function takes a column name and a repeat count as arguments. In our example the “name” column is the input and 2 is passed as the count, so each value is repeated twice, and the result is stored in a column named “new_column”.

### Repeat the column in pyspark

from pyspark.sql.functions import expr

# Add "new_column" holding each value of "name" repeated twice
df.withColumn("new_column", expr("repeat(name, 2)")).show()

The resulting dataframe has the values of the “name” column repeated twice in “new_column”.

[Figure 2: df with “new_column” holding the “name” values repeated twice]

 



Author

  • Sridhar Venkatachalam

    With close to 10 years of experience in data science and machine learning, Sridhar has worked extensively with programming languages and tools such as R, Python (Pandas), SAS, and Pyspark.