Substring Function in R – substr()

The substring functions in R can be used either to extract parts of character strings or to replace parts of character strings. A substring of a vector or of a data frame column can be extracted with the base R functions substr() and substring(), or with str_sub() from the stringr package.

  • Substring of a vector in R using the substr() function.
  • Extracting a substring of a column using regular expressions in R.

 

Syntax for the substring functions in R:

substr(text, start, stop)
substring(text, first, last = 1000000L)
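  • The first argument (text) is the character vector (string),
  • The second argument (start/first) is the position of the first character of the substring,
  • The third argument (stop/last) is the position of the last character of the substring (inclusive). In substring() it defaults to 1000000L, which in practice means "to the end of the string".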
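The two functions differ mainly in how they treat vector arguments. The base R sketch below, using an illustrative string, shows that substr() returns one result per element of its text argument, while substring() recycles first and last to the longest argument and lets last default to the end of the string.

```r
x <- "HumptyDumpty"

# substr(): one result per element of x, so extra start/stop values are ignored
a <- substr(x, 1, 6)           # "Humpty"
e <- substr(x, 1:3, 3:5)       # "Hum" (x has length 1)

# substring(): arguments are recycled to the longest one,
# so vectors of positions give several substrings of one string
b <- substring(x, 1:3, 3:5)    # "Hum" "ump" "mpt"

# substring(): last defaults to 1000000L, i.e. "to the end of the string"
d <- substring(x, 7)           # "Dumpty"

print(a); print(e); print(b); print(d)
```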

Extracting values with the substring function in R:

Let's see an example of extracting values with the substring function.

## Extracting values with substring in R

substring("HumptyDumpty sat on a wall",5,9)

When we execute the above code, the characters from position 5 to position 9 (inclusive) are extracted as a substring, so the output will be

[1] "tyDum"
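substr() gives the same result for this call, and both functions handle out-of-range positions without raising an error. A small base R sketch using the same example string:

```r
s <- "HumptyDumpty sat on a wall"

r1 <- substr(s, 5, 9)       # "tyDum", same as substring(s, 5, 9)
r2 <- substr(s, 21, 100)    # stop beyond nchar(s): returns up to the end, "a wall"
r3 <- substr(s, 9, 5)       # start greater than stop: empty string ""

print(c(r1, r2, r3))
```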

 

Replacing values with substring function in R:

The substring function is not only used for extraction; it can also replace part of a string. Let's see this with an example.

## Replacing values with substring in R

mystring <- "Humpty_Dumpty sat on a wall"
substring(mystring, 7, 7) <- " "
mystring

When we execute the above code, the underscore (_) at position 7 is replaced with a space (" "). So the output will be

[1] "Humpty Dumpty sat on a wall"
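Hard-coding position 7 only works for this particular string. A sketch, assuming the goal is to replace the first underscore wherever it occurs, locates it with regexpr() first. The second part shows that replacement never changes the length of the string: when the value is longer than the target range, the extra characters are dropped.

```r
mystring <- "Humpty_Dumpty sat on a wall"

# position of the first underscore; regexpr() returns -1 when there is no match
pos <- regexpr("_", mystring, fixed = TRUE)
if (pos > 0) substr(mystring, pos, pos) <- " "
print(mystring)               # "Humpty Dumpty sat on a wall"

# only stop - start + 1 characters are overwritten; extra value characters are dropped
y <- "abcdef"
substr(y, 2, 3) <- "XYZ"
print(y)                      # "aXYdef"
```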

 

String replacement in R using substring() function

Let's see another example, this time with a recycled replacement value.

## Replacement with recycling

z <- c("may", "the", "rain", "shower")
substring(z, 2, 3) <- c("@", "#")
z

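In the above example, the replacement vector c("@", "#") is recycled along z, so the second character of each element is replaced alternately with @ and #. Only one character is replaced in each element, even though the range covers positions 2 to 3, because each replacement value is a single character long. So the output will be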

[1] "m@y"    "t#e"    "r@in"   "s#ower"
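Replacement through substring() overwrites at most as many characters as the value supplies. In this sketch a single two-character value is recycled to every element and fills the whole range from position 2 to 3 in each one.

```r
z <- c("may", "the", "rain", "shower")

# a single two-character value is recycled to every element
substring(z, 2, 3) <- "@#"
print(z)   # "m@#" "t@#" "r@#n" "s@#wer"
```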

 

 

Substring of the column using the substr() and str_sub() functions:

Let’s first create the dataframe

df1 = data.frame(State = c('Arizona AZ','Georgia GG', 'Newyork NY','Indiana IN','Florida FL'), Score=c(62,47,55,74,31))
df1

So the resultant dataframe will be

[Figure: extract substring of the column in R, dataframe 1]

 

Extract first n characters of the column in R

Method 1:

In the example below, we use the substr() function to extract the first n characters of a column in R. substr() takes the column, the start position and the stop position as arguments and returns the corresponding substring of each element; with start = 1 and stop = 4 it returns the first 4 characters, as shown below.

## Method 1 - extract first n character

df1$substring_State = substr(df1$State,1,4)
df1

So the dataframe will be

[Figure: extract substring of the column in R, dataframe 2]
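If the same operation is needed for several columns or several values of n, it can be wrapped in a small helper. first_n_chars() below is a hypothetical name, not a base R function. Strings shorter than n are returned unchanged, because substr() stops at the end of the string.

```r
df1 <- data.frame(State = c('Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL'),
                  Score = c(62, 47, 55, 74, 31),
                  stringsAsFactors = FALSE)

# hypothetical helper: first n characters of each element of x
first_n_chars <- function(x, n) substr(x, 1, n)

df1$first_7 <- first_n_chars(df1$State, 7)
print(df1$first_7)              # "Arizona" "Georgia" "Newyork" "Indiana" "Florida"
print(first_n_chars("AZ", 4))   # "AZ"
```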

Method 2:

In the example below, we use the str_sub() function from the stringr package to extract the first n characters of the column in R. str_sub() takes the column, the start position and the end position as arguments and returns the substring of each element, as shown below.

## Method 2 - extract first n character

library(stringr)
df1$substring_State = str_sub(df1$State,1,4) 
df1

So the dataframe will be

[Figure: extract substring of the column in R, dataframe 3]

 

Extract last n characters of the column in R

In the example below, we use the str_sub() function to extract the last n characters of the column in R. A negative start position counts from the end of the string, so str_sub(df1$State, -2) returns the last 2 characters; the end position defaults to -1, the last character.

# extract last 2 characters of column

df1$last_2_string = str_sub(df1$State,-2) 
df1

So the dataframe is

[Figure: extract substring of the column in R, dataframe 4]
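Without stringr, the last n characters can be taken in base R by computing the start position from nchar(). last_n_chars() is a hypothetical helper for this sketch; pmax() keeps the start position at 1 when a string is shorter than n.

```r
# hypothetical helper: last n characters of each element of x (base R only)
last_n_chars <- function(x, n) {
  len <- nchar(x)
  substr(x, pmax(len - n + 1, 1), len)
}

states <- c('Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL')
last_2 <- last_n_chars(states, 2)
print(last_2)                 # "AZ" "GG" "NY" "IN" "FL"
print(last_n_chars("A", 3))   # "A"
```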

 

 


 

Extract first word of the column in R

The first word of the column can be extracted with the str_extract() function and the regular expression \\w+, which matches the first run of word characters in each string, as shown below.

# extract first word of the column in R

df1$substring_first = str_extract(df1$State,"(\\w+)") 
df1

So the resultant dataframe is

[Figure: extract substring of the column in R, dataframe 5]
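For comparison, a base R sketch of the same extraction uses regexpr() to find the first run of word characters and regmatches() to pull it out. Note that regmatches() combined with regexpr() silently drops elements that have no match, whereas str_extract() returns NA for them, so the base R result can be shorter than the input.

```r
states <- c('Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL')

# first run of word characters in each string
first_word <- regmatches(states, regexpr("\\w+", states, perl = TRUE))
print(first_word)   # "Arizona" "Georgia" "Newyork" "Indiana" "Florida"

# alternative: delete everything from the first whitespace onwards
first_word2 <- sub("\\s.*$", "", states, perl = TRUE)
print(first_word2)
```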

 

Extract last word of the column in R

The last word of the column can be extracted with the str_extract() function and the regular expression \\w+$, which matches the run of word characters at the end of each string, as shown below.

# extract last word of the column in R

library(stringr)
df1$substring_last = str_extract(df1$State,"\\w+$") 
df1

So the resultant dataframe is

[Figure: extract substring of the column in R, dataframe 6]
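A base R counterpart anchors the pattern at the end of the string in the same way. This is a sketch; regmatches() drops elements with no match rather than returning NA as str_extract() does.

```r
states <- c('Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL')

# run of word characters at the end of each string
last_word <- regmatches(states, regexpr("\\w+$", states, perl = TRUE))
print(last_word)   # "AZ" "GG" "NY" "IN" "FL"

# alternative: delete everything up to and including the last whitespace
last_word2 <- sub("^.*\\s", "", states, perl = TRUE)
print(last_word2)
```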


Author

  • Sridhar Venkatachalam

    Close to 10 years of experience in data science and machine learning; has worked extensively with programming languages like R, Python (Pandas), SAS and PySpark.