[regex] Remove all special characters from a string in R?

How to remove all special characters from string in R and replace them with spaces ?

Some special characters to remove are : ~!@#$%^&*(){}_+:"<>?,./;'[]-=

I've tried regex with [:punct:] pattern but it removes only punctuation marks.

Question 2 : And how to remove characters from foreign languages like : â í ü Â á a e s c ?

Answer : Use [^[:alnum:]] to remove~!@#$%^&*(){}_+:"<>?,./;'[]-= and use [^a-zA-Z0-9] to remove also â í ü Â á a e s c in regex or regexpr functions.

Solution in base R :

x <- "a1~!@#$%^&*(){}_+:\"<>?,./;'[]-=" 
gsub("[[:punct:]]", "", x)  # no libraries needed

This question is related to regex string r character

The answer is


You need to use regular expressions to identify the unwanted characters. For the most easily readable code, you want the str_replace_all from the stringr package, though gsub from base R works just as well.

The exact regular expression depends upon what you are trying to do. You could just remove those specific characters that you gave in the question, but it's much easier to remove all punctuation characters.

x <- "a1~!@#$%^&*(){}_+:\"<>?,./;'[]-=" #or whatever
str_replace_all(x, "[[:punct:]]", " ")

(The base R equivalent is gsub("[[:punct:]]", " ", x).)

An alternative is to swap out all non-alphanumeric characters.

str_replace_all(x, "[^[:alnum:]]", " ")

Note that the definition of what constitutes a letter or a number or a punctuatution mark varies slightly depending upon your locale, so you may need to experiment a little to get exactly what you want.


Instead of using regex to remove those "crazy" characters, just convert them to ASCII, which will remove accents, but will keep the letters.

astr <- "Ábcdêãçoàúü"
iconv(astr, from = 'UTF-8', to = 'ASCII//TRANSLIT')

which results in

[1] "Abcdeacoauu"

Convert the Special characters to apostrophe,

Data  <- gsub("[^0-9A-Za-z///' ]","'" , Data ,ignore.case = TRUE)

Below code it to remove extra ''' apostrophe

Data <- gsub("''","" , Data ,ignore.case = TRUE)

Use gsub(..) function for replacing the special character with apostrophe


Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to character

Set the maximum character length of a UITextField in Swift Max length UITextField Remove last character from string. Swift language Get nth character of a string in Swift programming language How many characters can you store with 1 byte? How to convert integers to characters in C? Converting characters to integers in Java How to check the first character in a string in Bash or UNIX shell? Invisible characters - ASCII How to delete Certain Characters in a excel 2010 cell