[r] Replace all particular values in a data frame

Having a data frame, how do I go about replacing all particular values along all rows and columns. Say for example I want to replace all empty records with NA's (without typing the positions):

df <- data.frame(list(A=c("", "xyz", "jkl"), B=c(12, "", 100)))

    A   B
1      12
2  xyz    
3  jkl 100

Expected result:

    A   B
1  NA   12
2  xyz  NA  
3  jkl  100

This question is related to r dataframe replace

The answer is


Since PikkuKatja and glallen asked for a more general solution and I cannot comment yet, I'll write an answer. You can combine statements as in:

> df[df=="" | df==12] <- NA
> df
     A    B
1  <NA> <NA>
2  xyz  <NA>
3  jkl  100

For factors, zxzak's code already yields factors:

> df <- data.frame(list(A=c("","xyz","jkl"), B=c(12,"",100)))
> str(df)
'data.frame':   3 obs. of  2 variables:
 $ A: Factor w/ 3 levels "","jkl","xyz": 1 3 2
 $ B: Factor w/ 3 levels "","100","12": 3 1 2

If in trouble, I'd suggest to temporarily drop the factors.

df[] <- lapply(df, as.character)

If you want to replace multiple values in a data frame, looping through all columns might help.

Say you want to replace "" and 100:

na_codes <- c(100, "")
for (i in seq_along(df)) {
    df[[i]][df[[i]] %in% na_codes] <- NA
}

Here are a couple dplyr options:

library(dplyr)

# all columns:
df %>% 
  mutate_all(~na_if(., ''))

# specific column types:
df %>% 
  mutate_if(is.factor, ~na_if(., ''))

# specific columns:  
df %>% 
  mutate_at(vars(A, B), ~na_if(., ''))

# or:
df %>% 
  mutate(A = replace(A, A == '', NA))

# replace can be used if you want something other than NA:
df %>% 
  mutate(A = as.character(A)) %>% 
  mutate(A = replace(A, A == '', 'used to be empty'))

We can use data.table to get it quickly. First create df without factors,

df <- data.frame(list(A=c("","xyz","jkl"), B=c(12,"",100)), stringsAsFactors=F)

Now you can use

setDT(df)
for (jj in 1:ncol(df)) set(df, i = which(df[[jj]]==""), j = jj, v = NA)

and you can convert it back to a data.frame

setDF(df)

If you only want to use data.frame and keep factors it's more difficult, you need to work with

levels(df$value)[levels(df$value)==""] <- NA

where value is the name of every column. You need to insert it in a loop.


Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe

Examples related to replace

How do I find and replace all occurrences (in all files) in Visual Studio Code? How to find and replace with regex in excel How to replace text in a column of a Pandas dataframe? How to replace negative numbers in Pandas Data Frame by zero Replacing few values in a pandas dataframe column with another value How to replace multiple patterns at once with sed? Using tr to replace newline with space replace special characters in a string python Replace None with NaN in pandas dataframe Batch script to find and replace a string in text file within a minute for files up to 12 MB