Extract substring of the column in R dataframe

To extract a substring of a column in R we use functions such as substr() from base R, or str_sub() and str_extract() from the stringr package. Let’s see how to get a substring of a column in R by position and by regular expression. The examples covered are listed below.

  • Extract first n characters in R
  • Extract last n characters in R
  • Extract first word of the column in R
  • Extract last word of the column in R
  • Extract substring of the column using regular expression in R

Each case is shown with an example. Let’s first create the dataframe.

df1 = data.frame(State = c('Arizona AZ','Georgia GG', 'Newyork NY','Indiana IN','Florida FL'), Score=c(62,47,55,74,31))
df1

The resulting dataframe has five rows and two columns, State and Score.

 

Extract first n characters of the column in R

Method 1:

In the example below we use the substr() function to extract the first n characters of the column in R. substr() takes the column, a start position and a stop position (not a length) as arguments, and returns the corresponding substring of each value, as shown below.

## Method 1 - extract first n characters

df1$substring_State = substr(df1$State,1,4)
df1

The dataframe now has a substring_State column holding the first four characters of each State value (for example, "Ariz" for "Arizona AZ").
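
Because the third argument of substr() is a stop position rather than a length, a small wrapper can make the "first n characters" intent explicit. The following is a minimal sketch: the helper name first_n, the extra columns first_4 and mid_3_5, and the stringsAsFactors = FALSE setting are assumptions added for illustration and are not part of the original example.

```r
# Hypothetical helper: return the first n characters of each string.
first_n <- function(x, n) {
  # substr() takes start and stop positions, so stop = n yields n characters
  substr(as.character(x), 1, n)
}

# Same sample data as above; stringsAsFactors = FALSE keeps State as character
df1 <- data.frame(
  State = c('Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL'),
  Score = c(62, 47, 55, 74, 31),
  stringsAsFactors = FALSE
)

df1$first_4 <- first_n(df1$State, 4)    # "Ariz" "Geor" "Newy" "Indi" "Flor"
df1$mid_3_5 <- substr(df1$State, 3, 5)  # characters 3 to 5, e.g. "izo" for "Arizona AZ"
```

The same start and stop arguments can select any interior range, as the mid_3_5 line suggests.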

Method 2:

In the example below we use the str_sub() function from the stringr package to extract the first n characters of the column in R. str_sub() takes the column, a start position and an end position (not a length) as arguments, and returns the corresponding substring of each value, as shown below.

## Method 2 - extract first n characters

library(stringr)
df1$substring_State = str_sub(df1$State,1,4) 
df1

The result is the same as with Method 1: substring_State holds the first four characters of each State value.

 

Extract last n characters of the column in R

In the example below we use the str_sub() function to extract the last n characters of the column in R. str_sub() takes the column and a negative start position, which counts back from the end of the string. With the end position left at its default, the substring runs to the last character.

# extract last 2 characters of the column

library(stringr)
df1$last_2_string = str_sub(df1$State,-2) 
df1

The dataframe now has a last_2_string column holding the last two characters of each State value (for example, "AZ" for "Arizona AZ").
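
The same result can be reached in base R by computing the start position from the string length. This is a minimal sketch under stated assumptions: the helper name last_n and the standalone states vector are hypothetical and introduced only for illustration.

```r
# Hypothetical helper: return the last n characters of each string using base R.
last_n <- function(x, n) {
  x <- as.character(x)
  len <- nchar(x)
  # Start n - 1 positions before the end; pmax() guards strings shorter than n
  substr(x, pmax(len - n + 1, 1), len)
}

states <- c('Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL')
last_n(states, 2)  # "AZ" "GG" "NY" "IN" "FL"
last_n("R", 2)     # "R": strings shorter than n are returned whole
```

Counting from nchar() does what the negative index does in str_sub(), without needing stringr.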

 

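Extract first word of the column in R

The first word of the column is extracted with the str_extract() function and a regular expression, as shown below. The pattern \\w+ matches the first run of word characters (letters, digits and underscore) in each value.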

# extract first word of the column in R

library(stringr)
df1$substring_first <- str_extract(df1$State,"(\\w+)") 
df1

The dataframe now has a substring_first column holding the first word of each State value (for example, "Arizona" for "Arizona AZ").
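
Note that \\w+ stops at the first non-word character, so a hyphenated first word would be cut at the hyphen. If "word" should mean everything before the first whitespace, one possible base R alternative is sketched below. The states vector and the hyphenated value "New-York NY" are hypothetical inputs added for illustration.

```r
states <- c('Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL')

# Drop everything from the first whitespace character to the end of the string
first_word <- sub("\\s.*$", "", states)
first_word  # "Arizona" "Georgia" "Newyork" "Indiana" "Florida"

# Hypothetical hyphenated input: the whole token before the space is kept
hyphenated <- sub("\\s.*$", "", "New-York NY")
hyphenated  # "New-York"
```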

 

Extract last word of the column in R

The last word of the column is extracted with the str_extract() function and a regular expression, as shown below. The pattern \\w+$ matches the run of word characters at the end of each value.

# extract last word of the column in R

library(stringr)
df1$substring_last <- str_extract(df1$State,"\\w+$") 
df1

The dataframe now has a substring_last column holding the last word of each State value (for example, "AZ" for "Arizona AZ").
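
More specific regular expressions can target a particular substring, such as a two-letter uppercase code at the end of the value. The sketch below uses base R's regexpr() and regmatches() and returns NA where nothing matches. The helper name extract_first and the extra "Unknown" value are hypothetical and added only to show the no-match case.

```r
states <- c('Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL', 'Unknown')

# Hypothetical helper: first regex match per element, NA when nothing matches
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)  # -1 marks elements without a match
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)       # regmatches() returns only the matches
  out
}

codes <- extract_first(states, "[A-Z]{2}$")
codes  # "AZ" "GG" "NY" "IN" "FL" NA
```

Filling a pre-allocated NA vector keeps the result the same length as the input, so it can be assigned directly to a dataframe column.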


Author

  • Sridhar Venkatachalam

    Has close to 10 years of experience in data science and machine learning, and has worked extensively with programming languages like R, Python (Pandas), SAS and PySpark.