sub() and gsub() functions in R

sub() and gsub() in R are replacement functions: each replaces occurrences of a substring with another substring. Both work on a single string, on a character vector, and on a column of a dataframe, and both accept a regular expression as the pattern. Let's see an example of each.

  • sub() function in R replaces the first instance of a substring
  • gsub() function in R replaces all the instances of a substring
  • Replacing occurrences of a string in a column of an R dataframe using sub() and gsub()
  • Replacing occurrences of a string in a vector using sub() and gsub()

 

Syntax for sub() and gsub() function in R:

  1. sub(old, new, string)
  2. gsub(old, new, string)

old – Existing pattern to be replaced (treated as a regular expression by default).
new – New string to be used as the replacement.
string – String, character vector, or dataframe column in which the replacement is made.
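In base R the formal argument names are pattern, replacement and x, so old, new and string above stand for those three positions. A minimal sketch of both calls with named arguments (the variable names s, first_only and all_matches are illustrative only):

```r
# old, new and string in the syntax above correspond to base R's
# formal arguments pattern, replacement and x
s <- "a-b-c"

# sub() replaces the first match only
first_only <- sub(pattern = "-", replacement = "+", x = s)

# gsub() replaces every match
all_matches <- gsub(pattern = "-", replacement = "+", x = s)

print(first_only)
print(all_matches)
```

Passing the arguments by position, as in the rest of this article, gives the same results.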

 

 

Example of sub() function in R:

sub() function in R replaces only the first occurrence of a substring: it finds the first instance of the old substring and replaces it with the new substring. Let's see an example.

# sub function in R

mysentence <- "England is Beautiful. England is not the part of EU"
sub("England", "UK", mysentence)

Only the first occurrence of England is replaced with UK, so the output will be

[1] "UK is Beautiful. England is not the part of EU"
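When the input is a character vector rather than a single string, "first occurrence" applies to each element separately. A small sketch, assuming a made-up vector called places:

```r
# sub() is vectorised: it replaces the first match in EACH element,
# not just the first match in the whole vector
places <- c("England, England", "Wales", "New England, England")

result <- sub("England", "UK", places)
print(result)
```

Elements with no match, such as "Wales", are returned unchanged.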

 

 

Example of gsub() function in R:

gsub() function in R is a global replace function: it replaces all instances of the substring, not just the first. Let's see the same example.

# gsub function in R

mysentence <- "England is Beautiful. England is not the part of EU"
gsub("England", "UK", mysentence)

All the occurrences of England are replaced with UK, so the output will be

[1] "UK is Beautiful. UK is not the part of EU"
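Because the pattern is interpreted as a regular expression by default, characters such as "." have a special meaning. The sketch below contrasts the default behaviour with fixed = TRUE, which treats the pattern as a literal string (the names as_regex and as_literal are illustrative only):

```r
mysentence <- "England is Beautiful. England is not the part of EU"

# As a regular expression, "." matches any character,
# so every character in the sentence is replaced
as_regex <- gsub(".", "!", mysentence)

# fixed = TRUE treats "." as a literal full stop,
# so only the real full stop is replaced
as_literal <- gsub(".", "!", mysentence, fixed = TRUE)

print(as_regex)
print(as_literal)
```

For plain substrings like "England" the two modes give the same result. The difference only shows up when the pattern contains regular expression metacharacters.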

Example of gsub() function with regular expression in R:

The old argument in the syntax can be a regular expression, which lets you match a pattern rather than one fixed substring. Let's see an example.

# gsub function in R with regular expression
# "[0-9]+" matches each run of one or more digits

mysentence <- "UK is Beautiful. UK is not the part of EU since 2016"
gsub("[0-9]+", "", mysentence)

In the above example the regular expression removes all the digits from the sentence. Note that the space before the number is kept, so the output will be

[1] "UK is Beautiful. UK is not the part of EU since "
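The output above still ends with a space. Two possible ways to avoid that are sketched below, using POSIX character classes and base R's trimws(); the names no_digits and trimmed are illustrative only:

```r
mysentence <- "UK is Beautiful. UK is not the part of EU since 2016"

# Option 1: remove each run of digits together with any whitespace just before it
no_digits <- gsub("[[:space:]]*[[:digit:]]+", "", mysentence)
print(no_digits)

# Option 2: remove the digits, then trim leading and trailing whitespace
trimmed <- trimws(gsub("[[:digit:]]+", "", mysentence))
print(trimmed)
```

The two options agree here because the number sits at the end of the sentence. For digits in the middle of a string they can differ, since trimws() only touches the ends.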

 

 

Example of gsub() function in the column of a dataframe:

First, let's create the dataframe as shown below

df <- data.frame(
  NAME  = c('Alisa','Bobby','jodha','jack','raghu','Cathrine',
            'Alisa','Bobby','kumar','Alisa','jack','Cathrine'),
  Age   = c(26,24,26,22,23,24,26,24,22,26,22,25),
  Score = c(85,63,55,74,31,77,85,63,42,85,74,78)
)

df

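So the resultant dataframe will be

       NAME Age Score
1     Alisa  26    85
2     Bobby  24    63
3     jodha  26    55
4      jack  22    74
5     raghu  23    31
6  Cathrine  24    77
7     Alisa  26    85
8     Bobby  24    63
9     kumar  22    42
10    Alisa  26    85
11     jack  22    74
12 Cathrine  25    78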

gsub() function in the column of R dataframe to replace a substring:

gsub() can also be applied to a column of a dataframe in R. Let's see the example below.

 
## Replace substring of the column in R dataframe

df$NAME = gsub("A","E",df$NAME)
df

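As shown, every occurrence of "A" is replaced with "E". The match is case-sensitive, so only the capital "A" in "Alisa" changes and the lowercase "a" characters are left alone. The resultant dataframe will be

       NAME Age Score
1     Elisa  26    85
2     Bobby  24    63
3     jodha  26    55
4      jack  22    74
5     raghu  23    31
6  Cathrine  24    77
7     Elisa  26    85
8     Bobby  24    63
9     kumar  22    42
10    Elisa  26    85
11     jack  22    74
12 Cathrine  25    78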
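The replacement in the column is case-sensitive by default. The sketch below compares the default with ignore.case = TRUE and writes each result to a new column instead of overwriting NAME. The dataframe people and its column names are hypothetical, used only for illustration:

```r
# Hypothetical small dataframe, used only for this illustration
people <- data.frame(NAME = c("Alisa", "jack", "Cathrine"),
                     stringsAsFactors = FALSE)

# Case-sensitive (default): only the capital "A" matches
people$upper_only <- gsub("A", "E", people$NAME)

# ignore.case = TRUE: "A" and "a" both match
people$any_case <- gsub("A", "E", people$NAME, ignore.case = TRUE)

print(people)
```

Keeping the original column makes it easy to check the result before replacing df$NAME itself.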

 

gsub() function in the column of R dataframe with a regular expression:

gsub() together with a regular expression can replace a pattern in every value of a dataframe column. Let's see the example below.

 
## Replace substring of the column in R dataframe using REGEX
## "^" matches the start of each string, so the prefix is inserted there

df$NAME = gsub("^","MR/MRS.",df$NAME)
df

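As shown, "MR/MRS." is added to the start of every value in the NAME column using a regular expression, so "Bobby" becomes "MR/MRS.Bobby", "jack" becomes "MR/MRS.jack", and so on for each row of the resultant dataframe.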
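Adding a prefix this way relies on matching the empty position at the start of each string. A minimal sketch of that idea on a plain vector, compared with paste0(), which does the same job without a regular expression (names_vec, with_regex and with_paste are illustrative names):

```r
# Hypothetical vector of names, used only for this illustration
names_vec <- c("Alisa", "Bobby", "jack")

# "^" matches the empty position at the start of each string,
# so replacing it inserts the prefix without removing anything
with_regex <- sub("^", "MR/MRS.", names_vec)

# paste0() builds the same strings without a regular expression
with_paste <- paste0("MR/MRS.", names_vec)

print(with_regex)
print(identical(with_regex, with_paste))
```

The regular expression approach is more useful when the prefix should be added only to values that match some further condition.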

 


Author

  • Sridhar Venkatachalam

    Close to 10 years of experience in data science and machine learning. Has worked extensively with programming languages such as R, Python (Pandas), SAS and PySpark.