Extract Substring from column in pandas python

A substring of a column in pandas can be extracted with the str.extract function and a regular expression. Let’s see how to do this, with an example.

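Syntax: dataframe.column.str.extract(r'regex')

The regular expression must contain at least one capture group, written in parentheses; the text matched by each group is what gets returned.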
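As a rough illustration of this syntax before the full example, the sketch below runs str.extract on a tiny, made-up dataframe. The names demo and codes are illustrative only, and expand=False is used here so that a single capture group comes back as a Series rather than a one-column dataframe.

```python
import pandas as pd

# Hypothetical two-row dataframe, used only to illustrate the syntax
demo = pd.DataFrame({'State': ['Arizona AZ', 'Florida FL']})

# The parentheses form the capture group; $ anchors it to the end of the string
codes = demo.State.str.extract(r'(\w+)$', expand=False)
print(codes.tolist())
```

With expand=True (used in the example further down) the same call returns a dataframe with one column per capture group.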

First let’s create a dataframe

import pandas as pd

# Sample data: each State value ends with a two-letter code
df1 = {
    'State':['Arizona AZ','Georgia GG','Newyork NY','Indiana IN','Florida FL'],
    'Score1':[4,47,55,74,31]}

df1 = pd.DataFrame(df1,columns=['State','Score1'])
print(df1)

df1 will be

        State  Score1
0  Arizona AZ       4
1  Georgia GG      47
2  Newyork NY      55
3  Indiana IN      74
4  Florida FL      31

 

Extract substring of a column in pandas:

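Here we extract the last word of the State column using a regular expression and store it in a new column, State_code. In the pattern \b(\w+)$, \b is a word boundary, (\w+) captures one or more word characters, and $ anchors the match to the end of the string.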

df1['State_code'] = df1.State.str.extract(r'\b(\w+)$', expand=True)
print(df1)

The resulting dataframe will be

        State  Score1 State_code
0  Arizona AZ       4         AZ
1  Georgia GG      47         GG
2  Newyork NY      55         NY
3  Indiana IN      74         IN
4  Florida FL      31         FL
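The same approach extends to more than one substring at a time. The sketch below is not part of the original example: it assumes a small, made-up dataframe (df, parts, and the group names Name and Code are all illustrative) and uses named capture groups so that each group becomes its own column. It also includes one row that does not fit the pattern, to show that str.extract fills such rows with missing values instead of raising an error.

```python
import pandas as pd

# Hypothetical data; the last row has no trailing code
df = pd.DataFrame({'State': ['Arizona AZ', 'Georgia GG', 'Unknown']})

# Each named group (?P<name>...) becomes a column with that name
parts = df['State'].str.extract(r'^(?P<Name>\w+)\s+(?P<Code>\w+)$')
print(parts)

# Rows that do not match the pattern are filled with missing values
print(parts['Code'].isna().tolist())
```

Checking for missing values after an extract is a cheap way to spot rows whose format differs from what the pattern expects.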

 


Author

  • Sridhar Venkatachalam

    Has close to 10 years of experience in data science and machine learning, and has worked extensively with programming languages such as R, Python (Pandas), SAS, and PySpark.