Keep or extract only character values in a pandas column

In this section we focus on how to extract or keep only the character values of a pandas column, thereby removing all the numeric values from it. We keep only the character values in a specific column using several methods, each of which is discussed below.

  • Extract only character values in a pandas column using the extract() function
  • Keep only character values in a pandas column using the replace() function to remove digits
  • Keep only character values in a pandas column using the replace() function to remove non-letter characters
  • Keep only character values in a pandas column using the isalpha() function

 

Create Dataframe

 


## create dataframe

import pandas as pd
import numpy as np

# Create a DataFrame
d = {'StudentID': ['Alisa819', 'Bobby212', 'Cathrine891', 'Jodha982', 'Raghu453', 'Ram834'],
     'Maths': [76, 73, 83, 93, 89, 94],
     'Science': [85, 41, 55, 75, 81, 97],
     'Geography': [78, 65, 55, 88, 87, 98]}

df = pd.DataFrame(d, columns=['StudentID', 'Maths', 'Science', 'Geography'])
df

Dataframe

[Image: the dataframe created above]

 

 

 

Extract only character values in a specific pandas column using the extract() function

The method below uses str.extract() with a regular expression to pull the first run of letters out of each value, and then converts the resulting column to the string dtype.

 

### Method 1 : extract() function

df['StudentID'] = df['StudentID'].str.extract('([a-zA-Z]+)', expand=False)
df['StudentID'] = df['StudentID'].astype('string')

df

So, the resultant dataframe will be

[Image: resultant dataframe, with only the letters kept in StudentID]
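One detail worth keeping in mind is that str.extract() returns only the first match of the pattern. The sketch below uses a small hypothetical Series (these values are not part of the dataframe above) to show what happens when digits sit between letters, and how str.findall() followed by str.join() could be used instead to keep every run of letters.

```python
import pandas as pd

# Hypothetical sample values, chosen only to illustrate the behaviour
s = pd.Series(['Alisa819', 'Al1sa', '12345'])

# extract() keeps only the FIRST run of letters; no match gives a missing value
first_run = s.str.extract('([a-zA-Z]+)', expand=False)

# findall() collects every run of letters, join() glues them back together
all_runs = s.str.findall('[a-zA-Z]+').str.join('')

print(first_run.tolist())
print(all_runs.tolist())
```

For values like those in StudentID, where the letters come first and the digits follow, both approaches give the same result.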

 

 

 

Keep only character values in a pandas column using the replace() function: Method 1

The method below uses str.replace() with a regular expression to replace every run of digits with an empty string "", and then converts the resulting column to the string dtype.

 

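### Method 2a : replace() function with regex

# regex=True is required: in pandas 2.0 and later, str.replace() treats the pattern as a literal string by default
df['StudentID'] = df['StudentID'].str.replace(r'\d+', '', regex=True)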
df['StudentID'] = df['StudentID'].astype('string')

df

So, the resultant dataframe will be

[Image: resultant dataframe, with only the letters kept in StudentID]
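The effect of the regex argument can be checked on a small example. The sketch below uses a hypothetical Series (not the dataframe above) and compares a regular-expression replacement with a literal one. It also shows that removing digits alone leaves other non-letter characters, such as a hyphen, in place.

```python
import pandas as pd

# Hypothetical sample values, chosen only to illustrate the behaviour
s = pd.Series(['Alisa819', 'Bobby212', 'Ram-834'])

# Pattern treated as a regular expression: every run of digits is removed
no_digits = s.str.replace(r'\d+', '', regex=True)

# Pattern treated as literal text: the characters "\d+" never occur, so nothing changes
literal = s.str.replace(r'\d+', '', regex=False)

print(no_digits.tolist())
print(literal.tolist())
```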

 

 

Keep only character values in a pandas column using the replace() function: Method 2

The method below uses str.replace() with a regular expression to replace every non-letter character with an empty string "", and then converts the resulting column to the string dtype.

 

### Method 2b : replace() function with regex

# [^a-zA-Z]+ matches any run of characters that are not ASCII letters
df['StudentID'] = df['StudentID'].str.replace(r'[^a-zA-Z]+', '', regex=True)
df['StudentID'] = df['StudentID'].astype('string')

df

So, the resultant dataframe will be

[Image: resultant dataframe, with only the letters kept in StudentID]
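The same pattern can be applied to more than one column. The following is a minimal sketch, assuming a hypothetical dataframe df_demo with an extra text column City (neither name comes from the example above). It loops over the columns and cleans only those that hold strings, leaving the numeric columns untouched.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical dataframe; 'City' is an invented column for illustration
df_demo = pd.DataFrame({
    'StudentID': ['Alisa819', 'Bobby212'],
    'City': ['Pune_01', 'Delhi 22'],
    'Maths': [76, 73],
})

for col in df_demo.columns:
    # Only string columns support the .str accessor; skip numeric ones
    if is_string_dtype(df_demo[col]):
        df_demo[col] = df_demo[col].str.replace(r'[^a-zA-Z]+', '', regex=True)

print(df_demo)
```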

 

 

Keep only character values in a pandas column using the isalpha() function

The method below applies isalpha() to every character of each value in the column. The characters that are letters are kept and joined back together to form the new value, and the resulting column is then converted to the string dtype.

 

 

### Method 3 : isalpha() function

df['StudentID'] = df['StudentID'].map(lambda x: ''.join([i for i in x if i.isalpha()]))
df['StudentID'] = df['StudentID'].astype('string')

df

So, the resultant dataframe will be

[Image: resultant dataframe, with only the letters kept in StudentID]
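The isalpha() approach and the [a-zA-Z] pattern are not always interchangeable: Python's str.isalpha() also accepts non-ASCII letters, while [a-zA-Z] matches ASCII letters only. The sketch below illustrates the difference on a hypothetical Series with accented names (these values are not part of the dataframe above).

```python
import pandas as pd

# Hypothetical sample values with accented letters, written as escapes
# so the exact code points are unambiguous
s = pd.Series(['Jos\u00e9819', 'Zo\u00eb_212'])

# isalpha() keeps any Unicode letter, including the accented ones
by_isalpha = s.map(lambda x: ''.join(ch for ch in x if ch.isalpha()))

# The ASCII-only character class drops the accented letters
by_regex = s.str.replace(r'[^a-zA-Z]+', '', regex=True)

print(by_isalpha.tolist())
print(by_regex.tolist())
```

Which behaviour is preferable depends on the data. For purely ASCII identifiers such as the StudentID values, the two methods give identical results.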

 

Author

  • Sridhar Venkatachalam

    With close to 10 years of experience in data science and machine learning, he has worked extensively with programming languages such as R, Python (Pandas), SAS, and PySpark.