To rename a column in PySpark, we use functions such as withColumnRenamed(), selectExpr(), and alias(). We will see how to rename a single column and how to rename multiple columns in PySpark:
- Rename single column in pyspark
- Rename multiple columns in pyspark using selectExpr
- Rename multiple columns in pyspark using the alias() function
- Rename multiple columns in pyspark using withColumnRenamed()
We will be using the dataframe named df.
Rename a single column in pyspark
Syntax: `withColumnRenamed(old_name, new_name)`
- old_name – existing column name
- new_name – new column name to replace it
```python
### Rename a single column in pyspark
df1 = df.withColumnRenamed('name', 'Student_name')
df1.show()
```
withColumnRenamed() takes two arguments: the first is the old column name and the second is the new column name. In our example the column “name” is renamed to “Student_name”.
Rename multiple columns in pyspark using selectExpr
selectExpr() in pyspark uses the SQL “as” keyword to rename a column, as in "old_name as new_name".
```python
### Rename multiple columns in pyspark
df1 = df.selectExpr("name as Student_name",
                    "birthdaytime as birthday_and_time",
                    "grad_Score as grade")
df1.show()
```
In our example “name” is renamed as “Student_name”. “birthdaytime” is renamed as “birthday_and_time”. “grad_Score” is renamed as “grade”.
Rename multiple columns in pyspark using alias
With alias() in pyspark, col("old_name").alias("new_name") renames each selected column, so multiple columns can be renamed in a single select().
```python
### Rename multiple columns in pyspark
from pyspark.sql.functions import col

df1 = df.select(col("name").alias("Student_name"),
                col("birthdaytime").alias("birthday_and_time"),
                col("grad_Score").alias("grade"))
df1.show()
```
In our example “name” is renamed as “Student_name”. “birthdaytime” is renamed as “birthday_and_time”. “grad_Score” is renamed as “grade”.
Rename multiple columns in pyspark using withColumnRenamed()
withColumnRenamed() takes two arguments: the old column name and the new column name. We use functools.reduce to apply it once per index, pairing the lists oldColumns and newColumns.
```python
### Rename multiple columns in pyspark
from functools import reduce

oldColumns = df.schema.names
newColumns = ["Student_name", "birthday_and_time", "grade"]
df1 = reduce(lambda df, idx: df.withColumnRenamed(oldColumns[idx], newColumns[idx]),
             range(len(oldColumns)), df)
df1.printSchema()
df1.show()
```
In our example “name” is renamed as “Student_name”. “birthdaytime” is renamed as “birthday_and_time”. “grad_Score” is renamed as “grade”.
Other Related Topics:
- Typecast Integer to Decimal and Integer to float in Pyspark
- Get number of rows and number of columns of dataframe in pyspark
- Extract Top N rows in pyspark – First N rows
- Absolute value of column in Pyspark – abs() function
- Set Difference in Pyspark – Difference of two dataframe
- Union and union all of two dataframe in pyspark (row bind)
- Intersect of two dataframe in pyspark (two or more)
- Round up, Round down and Round off in pyspark – (Ceil & floor pyspark)
- Sort the dataframe in pyspark – Sort on single column & Multiple column
- Drop rows in pyspark – drop rows with condition
- Distinct value of a column in pyspark
- Distinct value of dataframe in pyspark – drop duplicates
- Count of Missing (NaN,Na) and null values in Pyspark
- Mean, Variance and standard deviation of column in Pyspark
- Maximum or Minimum value of column in Pyspark
- Raised to power of column in pyspark – square, cube , square root and cube root in pyspark
- Drop column in pyspark – drop single & multiple columns
- Subset or Filter data with multiple conditions in pyspark
- Frequency table or cross table in pyspark – 2 way cross table
- Groupby functions in pyspark (Aggregate functions) – Groupby count, Groupby sum, Groupby mean, Groupby min and Groupby max
- Descriptive statistics or Summary Statistics of dataframe in pyspark
- Rearrange or reorder column in pyspark
- cumulative sum of column and group in pyspark
- Calculate Percentage and cumulative percentage of column in pyspark
- Select column in Pyspark (Select single & Multiple columns)