To repeat the string values of a column in PySpark, we use the repeat() function. We look at an example of how to repeat the string of a column in PySpark.
- Repeat the string of a column in PySpark using the repeat() function.
colname – Column name.
n – Number of times to repeat the string.
We will be using a dataframe named df, which has a string column called “name”.
repeat() function in PySpark:
repeat(str, n) – Returns a string that repeats the given string value n times.
> SELECT repeat('123', 3);
123123123
Repeat the column in PySpark
The repeat() function takes a column name and a repeat count as arguments. In our example the “name” column is the input and 2 is passed as the count, so each value is repeated twice and the result is stored in a new column named “new_column”.
### Repeat the column in pyspark
from pyspark.sql.functions import repeat, expr
df.withColumn("new_column", expr("repeat(name, 2)")).show()
The resulting dataframe contains a “new_column” column that holds each value of “name” repeated twice.