Replace substring/pattern of column in pandas python

Replacing a substring of a column in pandas can be done with the replace() function. Let’s see how to

  • Replace a substring/string in an entire pandas dataframe
  • Replace multiple strings/substrings in multiple columns at once in an entire pandas dataframe
  • Replace a pattern in an entire pandas dataframe
  • Replace spaces with underscores in a column of a pandas dataframe
  • Replace a substring with another substring in a pandas column
  • Replace a pattern of a substring with another substring using a regular expression in a pandas column
  • Replace multiple strings/substrings at once in a column of a pandas dataframe

With examples

First let’s create a dataframe


import pandas as pd
import numpy as np
 
#Create a DataFrame
df1 = {
    'State':['zona AZ','Georgia GG','Newyork NY','Indiana IN','Florida FL','Maharashtra','Delhi'],
    'Country':['US','US','US','US','US','India','India']}
 
df1 = pd.DataFrame(df1,columns=['State','Country'])
df1

df1 will be

[Figure: the df1 dataframe]

 

Replace a substring with another substring in pandas

df1.replace(regex=['zona'], value='Arizona')

The substring zona is replaced with Arizona wherever it occurs, because the regex argument matches inside each string rather than against the whole cell value. So the resultant dataframe will be

[Figure: resulting dataframe]
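
The role of the regex argument can be seen in a minimal sketch like the one below. It uses a shortened copy of the dataframe (the name df and the variables whole_cell and substring are illustrative only) and compares replace() with and without regex=True.

```python
import pandas as pd

# Shortened, illustrative copy of the dataframe used in this article
df = pd.DataFrame({
    'State': ['zona AZ', 'Georgia GG', 'Delhi'],
    'Country': ['US', 'US', 'India']})

# Without regex, replace() only matches a cell whose entire value is 'zona',
# so 'zona AZ' is left untouched.
whole_cell = df.replace('zona', 'Arizona')

# With regex=True the pattern is searched inside each string,
# so the substring is replaced.
substring = df.replace('zona', 'Arizona', regex=True)

print(whole_cell['State'].tolist())
print(substring['State'].tolist())
```

Only the second call changes the first row, which is why the examples in this article pass a regex when a substring, and not a whole value, has to be replaced.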

 

 

Replace a word/Text/substring in Entire pandas Dataframe:

The substring India is replaced with Bharat in every column of the dataframe by calling replace() with the regex=True argument, as shown below.

## Replace a word/Text in Entire Dataframe using Regex
# Method 1
df2 = df1.replace('India','Bharat', regex=True)
df2

 

Result:
[Figure: resulting dataframe]

 

Replace a pattern of substring using regular expression:

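Using a regular expression, we will replace the first character of every string value in the dataframe with the substring ‘HE’. The pattern ^. matches any single character at the start of a string.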

df1.replace(regex=['^.'],value='HE')

So the resultant dataframe will be

[Figure: resulting dataframe]
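
To restrict a regular expression to a single column, the same call can be made on that column instead of the whole dataframe. The sketch below is only an illustration: the column names State_HE and State_clean and the second pattern (a trailing two-letter uppercase code) are assumptions and do not come from the example above.

```python
import pandas as pd

# Shortened, illustrative copy of the dataframe used in this article
df = pd.DataFrame({
    'State': ['zona AZ', 'Georgia GG', 'Delhi'],
    'Country': ['US', 'US', 'India']})

# Apply the '^.' pattern to the State column only; Country is not touched.
df['State_HE'] = df['State'].replace(regex=r'^.', value='HE')

# Hypothetical second pattern: drop a trailing space plus two uppercase letters.
df['State_clean'] = df['State'].replace(regex=r'\s[A-Z]{2}$', value='')

print(df)
```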

 

 

Replace Multiple columns at once in pandas dataframe:

In the example below, multiple values in the pandas dataframe are replaced at once by applying replace() with regex=True to each column through the apply() function.

 Method 1:
#Replace Multiple values at once in entire pandas dataframe
#Method 1
df2 = df1.apply(lambda x: x.replace({'zona':'Arizona', 'US':'United States'}, regex=True))
df2
Result:

[Figure: resulting dataframe]
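
As a possible alternative to apply(), the dict of patterns can be passed straight to DataFrame.replace() through its regex parameter. The sketch below, on a shortened copy of the data with illustrative variable names, checks that both forms give the same result.

```python
import pandas as pd

# Shortened, illustrative copy of the dataframe used in this article
df = pd.DataFrame({
    'State': ['zona AZ', 'Georgia GG', 'Delhi'],
    'Country': ['US', 'US', 'India']})

replacements = {'zona': 'Arizona', 'US': 'United States'}

# Column-by-column, as in Method 1
via_apply = df.apply(lambda col: col.replace(replacements, regex=True))

# Directly on the dataframe: each key is a pattern, each value its replacement
direct = df.replace(regex=replacements)

print(direct.equals(via_apply))
```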

 

 

 Method 2:

In this example, part of a string is replaced using the regex=True parameter. To update multiple string columns, pass dicts keyed by column name: the first holds the pattern for each column and the second holds its replacement. The example below replaces zona with Arizona in the State column and US with United States in the Country column.

 

# Method 2 Replace multiple values at multiple columns

df2 = df1.replace({'State': 'zona', 'Country': 'US'}, 
    {'State': 'Arizona', 'Country': 'United States'}, regex=True)

df2
Result:

[Figure: resulting dataframe]
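
The per-column mapping can also be written as one nested dict, which keeps each pattern next to its replacement. The sketch below assumes a small made-up dataframe; the row 'US Virgin Islands' is hypothetical and is included only to show that a pattern listed under Country is not applied to State.

```python
import pandas as pd

# Made-up data; the second State value is a hypothetical row for illustration
df = pd.DataFrame({
    'State': ['zona AZ', 'US Virgin Islands'],
    'Country': ['US', 'US']})

# Outer keys are column names, inner dicts map pattern -> replacement
nested = df.replace(
    {'State': {'zona': 'Arizona'}, 'Country': {'US': 'United States'}},
    regex=True)

print(nested)
```

Here the US inside the State column is left alone, whereas a dataframe-wide replacement such as Method 1 would change it too.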

 

 

 

Replace a Pattern of entire dataframe pandas:

In this example we replace a pattern across the entire dataframe: every match of India, in any column, is replaced with IND, as shown below.

### Replace a Pattern of entire dataframe pandas

df2=df1.replace(regex=['India'],value='IND')
df2
Result:

[Figure: resulting dataframe]
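
One thing to watch with a dataframe-wide pattern is that a bare pattern such as India also matches inside longer words, for example Indiana in the State column. The sketch below, on a shortened copy of the data with illustrative variable names, shows one way to limit the match to the standalone word by using the regex word-boundary marker \b.

```python
import pandas as pd

# Shortened, illustrative copy of the dataframe used in this article
df = pd.DataFrame({
    'State': ['Indiana IN', 'Maharashtra', 'Delhi'],
    'Country': ['US', 'India', 'India']})

# The bare pattern also matches the first five letters of 'Indiana'.
loose = df.replace(regex=['India'], value='IND')

# \b requires a word boundary on both sides, so 'Indiana' no longer matches.
strict = df.replace(regex=[r'\bIndia\b'], value='IND')

print(loose['State'].tolist())
print(strict['State'].tolist())
```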

 

 

Replace a Pattern of the particular column in dataframe pandas

In this example we replace a pattern in a particular column of the dataframe: India is replaced with IND in the Country column using replace() with a regex, as shown below.

Method 1:
### Replace a Pattern of the particular column in dataframe pandas
## Method 1
df1['Country']=df1['Country'].replace(regex=['India'],value='IND')
df1
Result:

[Figure: resulting dataframe]

 

Method 2:

In this example the same replacement is made in the Country column with the str.replace() function, which works on substrings of each value. Because Method 1 assigned its result back to df1, run this on a freshly created df1 to see the change.

## Method 2: str.replace()
df1['Country'] = df1['Country'].str.replace('India','IND')
df1
Result:

[Figure: resulting dataframe]
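
str.replace() takes a regex parameter whose default changed from True to False in pandas 2.0, so it may be safer to pass it explicitly. The sketch below uses a small made-up series (the lowercase 'india' entry is hypothetical) to contrast a literal replacement with a case-insensitive regex replacement.

```python
import pandas as pd

# Made-up series; the lowercase entry is hypothetical
country = pd.Series(['US', 'India', 'india'])

# Literal replacement: the pattern is treated as plain text.
literal = country.str.replace('India', 'IND', regex=False)

# Regex replacement, made case-insensitive through the case argument.
insensitive = country.str.replace('india', 'IND', case=False, regex=True)

print(literal.tolist())
print(insensitive.tolist())
```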

 

 

 

Replace space with underscore:

df1.replace(regex=[' '], value='_')

Each space is replaced with an underscore (_) in every column. So the resultant dataframe will be

[Figure: resulting dataframe]
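
To replace spaces in a single column rather than the whole dataframe, str.replace() can be used on that column. The sketch below works on a shortened copy of the data; the collapsed series and its double-space value are assumptions added to show how a run of whitespace could be reduced to one underscore.

```python
import pandas as pd

# Shortened, illustrative copy of the dataframe used in this article
df = pd.DataFrame({
    'State': ['zona AZ', 'Georgia GG', 'Delhi'],
    'Country': ['US', 'US', 'India']})

# Replace each space in the State column only (literal match, no regex).
df['State'] = df['State'].str.replace(' ', '_', regex=False)

# Hypothetical value with a double space: \s+ collapses the run into one underscore.
collapsed = pd.Series(['New  York NY']).str.replace(r'\s+', '_', regex=True)

print(df['State'].tolist())
print(collapsed.tolist())
```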

 

 

 


Author

  • Sridhar Venkatachalam

    Close to 10 years of experience in data science and machine learning. Has worked extensively with programming languages such as R, Python (Pandas), SAS, and PySpark.