[r] How to remove all whitespace from a string?

So " xx yy 11 22 33 " will become "xxyy112233". How can I achieve this?

This question is related to r regex string r-faq

The answer is


Use [[:blank:]] to match any kind of horizontal white_space characters.

gsub("[[:blank:]]", "", " xx yy 11 22  33 ")
# [1] "xxyy112233"

This way you can remove all spaces from all character variables in your data frame. If you would prefer to choose only some of the variables, use mutateor mutate_at.

library(dplyr)
library(stringr)

remove_all_ws<- function(string){
    return(gsub(" ", "", str_squish(string)))
}

df<-df %>%  mutate_if(is.character, remove_all_ws)


Another approach can be taken into account

library(stringr)
str_replace_all(" xx yy 11 22  33 ", regex("\\s*"), "")

#[1] "xxyy112233"

\\s: Matches Space, tab, vertical tab, newline, form feed, carriage return

*: Matches at least 0 times


x = "xx yy 11 22 33"

gsub(" ", "", x)

> [1] "xxyy112233"

The function str_squish() from package stringr of tidyverse does the magic!

library(dplyr)
library(stringr)

df <- data.frame(a = c("  aZe  aze s", "wxc  s     aze   "), 
                 b = c("  12    12 ", "34e e4  "), 
                 stringsAsFactors = FALSE)
df <- df %>%
  rowwise() %>%
  mutate_all(funs(str_squish(.))) %>%
  ungroup()
df

# A tibble: 2 x 2
  a         b     
  <chr>     <chr> 
1 aZe aze s 12 12 
2 wxc s aze 34e e4

I just learned about the "stringr" package to remove white space from the beginning and end of a string with str_trim( , side="both") but it also has a replacement function so that:

a <- " xx yy 11 22 33 " 
str_replace_all(string=a, pattern=" ", repl="")

[1] "xxyy112233"

Please note that soultions written above removes only space. If you want also to remove tab or new line use stri_replace_all_charclass from stringi package.

library(stringi)
stri_replace_all_charclass("   ala \t  ma \n kota  ", "\\p{WHITE_SPACE}", "")
## [1] "alamakota"

From stringr library you could try this:

  1. Remove consecutive fill blanks
  2. Remove fill blank

    library(stringr)

                2.         1.
                |          |
                V          V
    
        str_replace_all(str_trim(" xx yy 11 22  33 "), " ", "")
    

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to r-faq

What does "The following object is masked from 'package:xxx'" mean? What does "Error: object '<myvariable>' not found" mean? How do I deal with special characters like \^$.?*|+()[{ in my regex? What does %>% function mean in R? How to plot a function curve in R Use dynamic variable names in `dplyr` Error: unexpected symbol/input/string constant/numeric constant/SPECIAL in my code How should I deal with "package 'xxx' is not available (for R version x.y.z)" warning? How to select the row with the maximum value in each group R data formats: RData, Rda, Rds etc