Get the substring of the column in pandas python

The str.slice function extracts a substring from a column of a pandas dataframe. Let’s see an example of how to get a substring from a column of a pandas dataframe and store it in a new column. A substring can also be extracted with the str.extract function and a regular expression.

  • Extract substring from the column in pandas python
  • Fetch substring from start (left) of the column in pandas
  • Get substring from end (right) of the column in pandas
  • Get substring of the column using regular expression in pandas python

Substring of column in pandas python:

A substring of a column in a pandas data frame is obtained with the str.slice function. Let’s see an example. First, let’s create a data frame:

import pandas as pd
import numpy as np

# Create a DataFrame from a dictionary of columns
df1 = {
    'State': ['Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL'],
    'Score': [62, 47, 55, 74, 31]}

df1 = pd.DataFrame(df1, columns=['State', 'Score'])
print(df1)

df1 will be:

[Output: df1 with the State and Score columns]
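The .str accessor used throughout this tutorial works only on string values, so a numeric column such as Score cannot be sliced directly. A minimal sketch, assuming you want the first digit of each score (the score_first_digit column name is hypothetical), converts the column with astype(str) first:

```python
import pandas as pd

# Same toy data as df1 above
df1 = pd.DataFrame({
    'State': ['Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL'],
    'Score': [62, 47, 55, 74, 31],
})

# Accessing .str on an integer column raises AttributeError
try:
    df1['Score'].str.slice(0, 1)
    accessor_failed = False
except AttributeError:
    accessor_failed = True
print(accessor_failed)  # True

# Convert to string first, then slice: the first digit of each score
df1['score_first_digit'] = df1['Score'].astype(str).str.slice(0, 1)
print(df1['score_first_digit'].tolist())  # ['6', '4', '5', '7', '3']
```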

We use the str.slice function on the column to get the substring. Here we take the first 7 characters of the State column and store them in a new column named state_substring, as shown below:

# Get the substring in pandas: characters 0 to 6 of each State value
df1['state_substring'] = df1.State.str.slice(0, 7)
print(df1)

So the resulting dataframe stores the first 7 characters of the State column in a separate column:

[Output: df1 with the new state_substring column]
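A short sketch of how str.slice behaves, assuming a State-like column with one missing value (the NaN is added for illustration and is not in the tutorial data). It shows that str.slice(0, 7) matches plain slice syntax on the .str accessor, that missing values stay missing, and what the optional step argument does:

```python
import numpy as np
import pandas as pd

# State-like values plus a missing entry (added for illustration)
s = pd.Series(['Arizona AZ', 'Georgia GG', np.nan])

first_seven = s.str.slice(0, 7)
print(first_seven.tolist())  # ['Arizona', 'Georgia', nan]

# Plain slice syntax on the .str accessor gives the same result
print(first_seven.equals(s.str[0:7]))  # True

# step=2 keeps every second character of the first 7
print(s.str.slice(0, 7, 2).tolist())  # ['Aioa', 'Goga', nan]
```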

Extract substring of the column in pandas using a regular expression:

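Here we extract the last word of the State column with the regular expression \b(\w+)$, whose capture group matches one or more word characters at the end of the string, and store it in a new column named State_code.

# The capture group (\w+) holds the last word; expand=False returns a Series for a single group
df1['State_code'] = df1.State.str.extract(r'\b(\w+)$', expand=False)
print(df1)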

So the resulting dataframe will be:

[Output: df1 with the new State_code column]
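When the pattern has several capture groups, str.extract returns one column per group, and named groups become the column labels. A sketch under the assumption that each value is a name followed by a code (the name and code labels and the extra 'Texas' row are hypothetical); rows that do not match the pattern get NaN in every extracted column:

```python
import pandas as pd

df = pd.DataFrame({'State': ['Arizona AZ', 'Georgia GG', 'Newyork NY', 'Texas']})

# Two named groups: the first word and the last word.
# 'Texas' has no space, so the pattern does not match and both columns are NaN.
parts = df['State'].str.extract(r'^(?P<name>\w+)\s+(?P<code>\w+)$')

# join aligns the extracted columns on the index
df = df.join(parts)
print(df)
```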

Extract substring from right (end) of the column in pandas:

str[-n:] is used to get the last n characters of each value in a column in pandas.

df1['Stateright'] = df1['State'].str[-2:]
print(df1)

str[-2:] gets the last two characters from the right of each State value, and they are stored in a new column named Stateright. The resulting dataframe will be:

[Output: df1 with the new Stateright column]
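A small sketch of two properties of str[-n:], assuming string values as in the tutorial (the one-character 'X' and the 'Sample XYZ' values are hypothetical). Strings shorter than n are returned whole instead of raising an error, and str.slice with a negative start gives the same result. If the trailing code can vary in length, taking the last whitespace-separated word may be more robust than a fixed character count:

```python
import pandas as pd

s = pd.Series(['Arizona AZ', 'Georgia GG', 'X'])

# Short strings are returned whole: 'X' stays 'X'
last_two = s.str[-2:]
print(last_two.tolist())  # ['AZ', 'GG', 'X']

# Same result with str.slice and a negative start
print(last_two.equals(s.str.slice(start=-2)))  # True

# Variable-length codes: split on whitespace and keep the last piece
last_word = pd.Series(['Arizona AZ', 'Sample XYZ']).str.split().str[-1]
print(last_word.tolist())  # ['AZ', 'XYZ']
```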

Extract substring from start (left) of column in pandas:

str[:n] is used to get the first n characters of each value in a column in pandas.

df1['StateInitial'] = df1['State'].str[:2]
print(df1)

str[:2] gets the first two characters from the left of each State value, and they are stored in a new column named StateInitial. The resulting dataframe will be:

[Output: df1 with the new StateInitial column]
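A sketch showing that str[:2] is equivalent to str.slice(stop=2), and that .str methods can be chained. The abbreviation rule here (first word, first 3 letters, upper case) is a hypothetical example, not something the tutorial defines:

```python
import pandas as pd

s = pd.Series(['Arizona AZ', 'Georgia GG', 'Newyork NY'])

first_two = s.str[:2]
print(first_two.tolist())  # ['Ar', 'Ge', 'Ne']
print(first_two.equals(s.str.slice(stop=2)))  # True

# Chain: split into words, take the first word, keep 3 letters, upper-case them
abbrev = s.str.split().str[0].str[:3].str.upper()
print(abbrev.tolist())  # ['ARI', 'GEO', 'NEW']
```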

Author

  • Sridhar Venkatachalam

    Has close to 10 years of experience in data science and machine learning and has worked extensively with programming languages such as R, Python (pandas), SAS and PySpark.