To rename a column in PySpark, we use functions such as withColumnRenamed(), selectExpr(), and alias(). We will see how to rename a single column and how to rename multiple columns in PySpark:
- Rename single column in pyspark
- Rename multiple columns in pyspark using selectExpr
- Rename multiple columns in pyspark using the alias() function
- Rename multiple columns in pyspark using withColumnRenamed()
We will be using the dataframe named df.
Rename a single column in pyspark
Syntax: `withColumnRenamed(old_name, new_name)`
- old_name – existing column name
- new_name – new column name to replace it
```python
### Rename a single column in pyspark
df1 = df.withColumnRenamed('name', 'Student_name')
df1.show()
```
withColumnRenamed() takes two arguments: the first is the old column name and the second is the new column name. In our example the column “name” is renamed to “Student_name”.
Rename multiple columns in pyspark using selectExpr
selectExpr() in pyspark uses the SQL “as” keyword to rename a column, as in "old_name as new_name".
```python
### Rename multiple columns in pyspark
df1 = df.selectExpr("name as Student_name",
                    "birthdaytime as birthday_and_time",
                    "grad_Score as grade")
df1.show()
```
In our example “name” is renamed as “Student_name”. “birthdaytime” is renamed as “birthday_and_time”. “grad_Score” is renamed as “grade”.
Rename multiple columns in pyspark using alias
With alias() in pyspark, col("old_name").alias("new_name") renames each selected column, so multiple columns can be renamed in a single select().
```python
### Rename multiple columns in pyspark
from pyspark.sql.functions import col

df1 = df.select(col("name").alias("Student_name"),
                col("birthdaytime").alias("birthday_and_time"),
                col("grad_Score").alias("grade"))
df1.show()
```
In our example “name” is renamed as “Student_name”. “birthdaytime” is renamed as “birthday_and_time”. “grad_Score” is renamed as “grade”.
Rename multiple columns in pyspark using withColumnRenamed()
withColumnRenamed() takes two arguments: the old column name and the new column name. We use functools.reduce to apply it once per index, pairing the lists oldColumns and newColumns.
```python
### Rename multiple columns in pyspark
from functools import reduce

oldColumns = df.schema.names
newColumns = ["Student_name", "birthday_and_time", "grade"]
df1 = reduce(lambda df, idx: df.withColumnRenamed(oldColumns[idx], newColumns[idx]),
             range(len(oldColumns)), df)
df1.printSchema()
df1.show()
```
In our example “name” is renamed as “Student_name”. “birthdaytime” is renamed as “birthday_and_time”. “grad_Score” is renamed as “grade”.
Other Related Topics:
- Typecast Integer to Decimal and Integer to float in Pyspark
- Get number of rows and number of columns of dataframe in pyspark
- Extract Top N rows in pyspark – First N rows
- Absolute value of column in Pyspark – abs() function
- Set Difference in Pyspark – Difference of two dataframe
- Union and union all of two dataframe in pyspark (row bind)
- Intersect of two dataframe in pyspark (two or more)
- Round up, Round down and Round off in pyspark – (Ceil & floor pyspark)
- Sort the dataframe in pyspark – Sort on single column & Multiple column
- Drop rows in pyspark – drop rows with condition
- Distinct value of a column in pyspark
- Distinct value of dataframe in pyspark – drop duplicates
- Count of Missing (NaN,Na) and null values in Pyspark
- Mean, Variance and standard deviation of column in Pyspark
- Maximum or Minimum value of column in Pyspark
- Raised to power of column in pyspark – square, cube , square root and cube root in pyspark
- Drop column in pyspark – drop single & multiple columns
- Subset or Filter data with multiple conditions in pyspark
- Frequency table or cross table in pyspark – 2 way cross table
- Groupby functions in pyspark (Aggregate functions) – Groupby count, Groupby sum, Groupby mean, Groupby min and Groupby max
- Descriptive statistics or Summary Statistics of dataframe in pyspark
- Rearrange or reorder column in pyspark
- cumulative sum of column and group in pyspark
- Calculate Percentage and cumulative percentage of column in pyspark
- Select column in Pyspark (Select single & Multiple columns)