[r] Replace all 0 values to NA

I have a dataframe with some numeric columns. Some row has a 0 value which should be considered as null in statistical analysis. What is the fastest way to replace all the 0 value to NULL in R?

This question is related to r r-faq

The answer is


Let me assume that your data.frame is a mix of different datatypes and not all columns need to be modified.

to modify only columns 12 to 18 (of the total 21), just do this

df[, 12:18][df[, 12:18] == 0] <- NA

In case anyone arrives here via google looking for the opposite (i.e. how to replace all NAs in a data.frame with 0), the answer is

df[is.na(df)] <- 0

OR

Using dplyr / tidyverse

library(dplyr)
mtcars %>% replace(is.na(.), 0)

#Sample data
set.seed(1)
dat <- data.frame(x = sample(0:2, 5, TRUE), y = sample(0:2, 5, TRUE))
#-----
  x y
1 0 2
2 1 2
3 1 1
4 2 1
5 0 0

#replace zeros with NA
dat[dat==0] <- NA
#-----
   x  y
1 NA  2
2  1  2
3  1  1
4  2  1
5 NA NA

You can replace 0 with NA only in numeric fields (i.e. excluding things like factors), but it works on a column-by-column basis:

col[col == 0 & is.numeric(col)] <- NA

With a function, you can apply this to your whole data frame:

changetoNA <- function(colnum,df) {
    col <- df[,colnum]
    if (is.numeric(col)) {  #edit: verifying column is numeric
        col[col == -1 & is.numeric(col)] <- NA
    }
    return(col)
}
df <- data.frame(sapply(1:5, changetoNA, df))

Although you could replace the 1:5 with the number of columns in your data frame, or with 1:ncol(df).


An alternative way without the [<- function:

A sample data frame dat (shamelessly copied from @Chase's answer):

dat

  x y
1 0 2
2 1 2
3 1 1
4 2 1
5 0 0

Zeroes can be replaced with NA by the is.na<- function:

is.na(dat) <- !dat


dat

   x  y
1 NA  2
2  1  2
3  1  1
4  2  1
5 NA NA

dplyr::na_if() is an option:

library(dplyr)  

df <- data_frame(col1 = c(1, 2, 3, 0),
                 col2 = c(0, 2, 3, 4),
                 col3 = c(1, 0, 3, 0),
                 col4 = c('a', 'b', 'c', 'd'))

na_if(df, 0)
# A tibble: 4 x 4
   col1  col2  col3 col4 
  <dbl> <dbl> <dbl> <chr>
1     1    NA     1 a    
2     2     2    NA b    
3     3     3     3 c    
4    NA     4    NA d

Because someone asked for the Data.Table version of this, and because the given data.frame solution does not work with data.table, I am providing the solution below.

Basically, use the := operator --> DT[x == 0, x := NA]

library("data.table")

status = as.data.table(occupationalStatus)

head(status, 10)
    origin destination  N
 1:      1           1 50
 2:      2           1 16
 3:      3           1 12
 4:      4           1 11
 5:      5           1  2
 6:      6           1 12
 7:      7           1  0
 8:      8           1  0
 9:      1           2 19
10:      2           2 40


status[N == 0, N := NA]

head(status, 10)
    origin destination  N
 1:      1           1 50
 2:      2           1 16
 3:      3           1 12
 4:      4           1 11
 5:      5           1  2
 6:      6           1 12
 7:      7           1 NA
 8:      8           1 NA
 9:      1           2 19
10:      2           2 40