How do I replace NA values with zeros in an R dataframe?

Question

I have a data frame and some columns have NA values. How do I replace these NA values with zeroes?

User · Answer

A more general approach is to use replace() on a matrix or vector to replace NA with 0. For example:

    > x <- c(1,2,NA,NA,1,1)
    > x1 <- replace(x,is.na(x),0)
    > x1
    [1] 1 2 0 0 1 1

This is also an alternative to using ifelse() in dplyr:

    library(dplyr)
    df = data.frame(col = c(1,2,NA,NA,1,1))
    df <- df %>%
       mutate(col = replace(col,is.na(col),0))
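A short base R sketch of the point above, using toy data: replace() hands back a modified copy, so the original vector keeps its NAs, and on a matrix the logical index from is.na() keeps the dimensions intact. The object names are only illustrative.

```r
# Vector: replace() returns a new vector; x itself is not modified
x <- c(1, 2, NA, NA, 1, 1)
x1 <- replace(x, is.na(x), 0)
print(x1)

# Matrix: indexing with the logical matrix from is.na() keeps the shape
m <- matrix(c(1, NA, 3, NA), nrow = 2)
m1 <- replace(m, is.na(m), 0)
print(m1)
```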

User · Answer

Dedicated functions, nafill and setnafill, are provided for that purpose in data.table. Whenever available, they distribute the columns to be computed over multiple threads.

    library(data.table)

    ans_df <- nafill(df, fill=0)

    # or even faster, in-place
    setnafill(df, fill=0)

User · Answer

An easy way to write it is with if_na from hablar:

    library(dplyr)
    library(hablar)

    df <- tibble(a = c(1, 2, 3, NA, 5, 6, 8))

    df %>%
      mutate(a = if_na(a, 0))

which returns:

          a
      <dbl>
    1     1
    2     2
    3     3
    4     0
    5     5
    6     6
    7     8

User · Answer

The cleaner package has an na_replace() generic that, by default, replaces numeric values with zeroes, logicals with FALSE, dates with today, etc.:

    library(cleaner)
    library(dplyr)  # for starwars and %>%

    starwars %>% na_replace()
    na_replace(starwars)

It even supports vectorised replacements:

    mtcars[1:6, c("mpg", "hp")] <- NA
    na_replace(mtcars, mpg, hp, replacement = c(999, 123))

Documentation: https://msberends.github.io/cleaner/reference/na_replace.html

User · Answer

If you want to replace NAs in factor variables, this might be useful:

    n <- length(levels(data.vector)) + 1

    data.vector <- as.numeric(data.vector)
    data.vector[is.na(data.vector)] <- n
    data.vector <- as.factor(data.vector)
    levels(data.vector) <- c("level1", "level2", ..., "leveln", "NAlevel")

It transforms a factor vector into a numeric vector and adds another artificial numeric factor level, which is then transformed back to a factor vector with one extra "NA level" of your choice.
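A shorter route for factors, sketched in base R on toy data: append the replacement label to the levels first, then assign it to the NA positions. This is a swapped-in technique rather than the numeric round-trip described above, and the label "missing" is an arbitrary example.

```r
f <- factor(c("a", "b", NA, "a"))

# A factor can only hold values that are among its levels,
# so the new label has to be added as a level before assigning it
levels(f) <- c(levels(f), "missing")
f[is.na(f)] <- "missing"

print(f)
```

If you would rather keep NA itself as an explicit level, base R's addNA() does that instead.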

User · Answer

For a single vector:

    x <- c(1,2,NA,4,5)
    x[is.na(x)] <- 0

For a data.frame, make a function out of the above, then apply it to the columns.

Please provide a reproducible example next time, as detailed here:
How to make a great R reproducible example
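One way to do what this answer suggests, sketched in base R: wrap the single-vector idiom in a small helper (zero_na is a made-up name) and apply it to every column with lapply(). Assigning into df[] keeps the result a data frame instead of turning it into a plain list. The data are toy values.

```r
# Hypothetical helper: the single-vector idiom wrapped in a function
zero_na <- function(x) {
  x[is.na(x)] <- 0
  x
}

df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))

# df[] <- ... replaces the columns in place and keeps the data.frame class
df[] <- lapply(df, zero_na)
print(df)
```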

User · Answer

With a data frame, it is not necessary to create a new column with mutate:

    library(tidyverse)
    k <- c(1,2,80,NA,NA,51)
    j <- c(NA,NA,3,31,12,NA)

    df <- data.frame(k,j) %>%
       replace_na(list(j = 0)) # convert only column j, for example

result

    k   j
    1   0
    2   0
    80  3
    NA  31
    NA  12
    51  0
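For comparison, a minimal base R sketch of the same single-column replacement without tidyr, reusing the k and j vectors from this answer. Only j is touched, so the NAs in k stay.

```r
k <- c(1, 2, 80, NA, NA, 51)
j <- c(NA, NA, 3, 31, 12, NA)
df <- data.frame(k, j)

# Replace NAs in column j only; column k keeps its NAs
df$j[is.na(df$j)] <- 0
print(df)
```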

User · Answer

If we are trying to replace NAs when exporting, for example when writing to csv, then we can use:

    write.csv(data, "data.csv", na = "0")
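A small check of this approach with base R's write.csv() on toy data: the zeros only appear in the written file, while the data frame in memory keeps its NAs. The temporary file and the column names are just for the demonstration.

```r
df <- data.frame(a = c(1, NA), b = c(NA, 4))

# na = "0" only controls how missing values are written; df is unchanged
path <- tempfile(fileext = ".csv")
write.csv(df, path, na = "0", row.names = FALSE)

# Reading the file back shows the zeros that were written
back <- read.csv(path)
print(back)
```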

User · Answer

This simple function extracted from Datacamp could help:

    replace_missings <- function(x, replacement) {
      is_miss <- is.na(x)
      x[is_miss] <- replacement

      message(sum(is_miss), " missings replaced by the value ", replacement)
      x
    }

Then

    replace_missings(df, replacement = 0)

User · Answer

See my comment in @gsk3's answer. A simple example:

    > m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
    > d <- as.data.frame(m)
    > d
       V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
    1   4  3 NA  3  7  6  6 10  6   5
    2   9  8  9  5 10 NA  2  1  7   2
    3   1  1  6  3  6 NA  1  4  1   6
    4  NA  4 NA  7 10  2 NA  4  1   8
    5   1  2  4 NA  2  6  2  6  7   4
    6  NA  3 NA NA 10  2  1 10  8   4
    7   4  4  9 10  9  8  9  4 10  NA
    8   5  8  3  2  1  4  5  9  4   7
    9   3  9 10  1  9  9 10  5  3   3
    10  4  2  2  5 NA  9  7  2  5   5
    > d[is.na(d)] <- 0
    > d
       V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
    1   4  3  0  3  7  6  6 10  6   5
    2   9  8  9  5 10  0  2  1  7   2
    3   1  1  6  3  6  0  1  4  1   6
    4   0  4  0  7 10  2  0  4  1   8
    5   1  2  4  0  2  6  2  6  7   4
    6   0  3  0  0 10  2  1 10  8   4
    7   4  4  9 10  9  8  9  4 10   0
    8   5  8  3  2  1  4  5  9  4   7
    9   3  9 10  1  9  9 10  5  3   3
    10  4  2  2  5  0  9  7  2  5   5

There's no need to apply apply().

EDIT: You should also take a look at the norm package. It has a lot of nice features for missing data analysis.

User · Answer

dplyr example:

    library(dplyr)

    df1 <- df1 %>%
        mutate(myCol1 = if_else(is.na(myCol1), 0, myCol1))

Note: This works per selected column. If we need to do this for all columns, see @reidjax's answer using mutate_each.

User · Answer

You can use replace(). For example:

    > x <- c(-1,0,1,0,NA,0,1,1)
    > x1 <- replace(x,5,1)
    > x1
    [1] -1  0  1  0  1  0  1  1

    > x1 <- replace(x,5,mean(x,na.rm=T))
    > x1
    [1] -1.00  0.00  1.00  0.00  0.29  0.00  1.00  1.00
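Hard-coding position 5 only works when you already know where the NA is. A minimal sketch that lets is.na() find the positions instead, shown for both a zero fill and the mean fill from the example above:

```r
x <- c(-1, 0, 1, 0, NA, 0, 1, 1)

# Let is.na() locate the missing positions instead of hard-coding index 5
x_zero <- replace(x, is.na(x), 0)
x_mean <- replace(x, is.na(x), mean(x, na.rm = TRUE))

print(x_zero)
print(round(x_mean, 2))
```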

User · Answer

If you want to store the result under a new column name after changing the NAs in a specific column (in this case column V3), you can also do it like this:

    my.data.frame$the.new.column.name <- ifelse(is.na(my.data.frame$V3), 0, my.data.frame$V3)

User · Answer

I know the question is already answered, but doing it this way might be more useful to some:

Define this function:

    na.zero <- function (x) {
        x[is.na(x)] <- 0
        return(x)
    }

Now whenever you need to convert NAs in a vector to zeros, you can do:

    na.zero(some.vector)

User · Answer

Would've commented on @ianmunoz's post, but I don't have enough reputation. You can combine dplyr's mutate_each and replace to take care of the NA to 0 replacement. Using the dataframe from @aL3xa's answer...

    > library(dplyr)
    > library(lazyeval)
    > m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
    > d <- as.data.frame(m)
    > d

       V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
    1   4  8  1  9  6  9 NA  8  9   8
    2   8  3  6  8  2  1 NA NA  6   3
    3   6  6  3 NA  2 NA NA  5  7   7
    4  10  6  1  1  7  9  1 10  3  10
    5  10  6  7 10 10  3  2  5  4   6
    6   2  4  1  5  7 NA NA  8  4   4
    7   7  2  3  1  4 10 NA  8  7   7
    8   9  5  8 10  5  3  5  8  3   2
    9   9  1  8  7  6  5 NA NA  6   7
    10  6 10  8  7  1  1  2  2  5   7

    > d %>% mutate_each( funs_( interp( ~replace(., is.na(.),0) ) ) )

       V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
    1   4  8  1  9  6  9  0  8  9   8
    2   8  3  6  8  2  1  0  0  6   3
    3   6  6  3  0  2  0  0  5  7   7
    4  10  6  1  1  7  9  1 10  3  10
    5  10  6  7 10 10  3  2  5  4   6
    6   2  4  1  5  7  0  0  8  4   4
    7   7  2  3  1  4 10  0  8  7   7
    8   9  5  8 10  5  3  5  8  3   2
    9   9  1  8  7  6  5  0  0  6   7
    10  6 10  8  7  1  1  2  2  5   7

We're using standard evaluation (SE) here, which is why we need the underscore on "funs_". We also use lazyeval's interp/~, and the . references "everything we are working with", i.e. the data frame. Now there are zeros!

Note that mutate_each() and funs_() have since been deprecated in dplyr.

User · Answer

To replace all NAs in a dataframe you can use:

    df %>% replace(is.na(.), 0)
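In that pipe, the dot stands for df itself, so the call amounts to replace(df, is.na(df), 0). A minimal base R sketch of the same call without any pipe, on toy data:

```r
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, NA))

# is.na(df) is a logical matrix with the same shape as df;
# replace() uses it as an index and returns a new data frame
out <- replace(df, is.na(df), 0)
print(out)
```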

User · Answer

Another example using the imputeTS package:

    library(imputeTS)
    na_replace(yourDataframe, 0)

User · Answer

The dplyr hybridized options are now around 30% faster than the Base R subset reassigns. On a 100M datapoint dataframe, mutate_all(~replace(., is.na(.), 0)) runs half a second faster than the base R d[is.na(d)] <- 0 option. What one wants to avoid specifically is using an ifelse() or an if_else(). (The complete 600 trial analysis ran to over 4.5 hours, mostly due to including these approaches.) Please see the benchmark analyses below for the complete results.

If you are struggling with massive dataframes, data.table is the fastest option of all: 40% faster than the standard Base R approach. It also modifies the data in place, effectively allowing you to work with nearly twice as much of the data at once.

A clustering of other helpful tidyverse replacement approaches

Locationally:

- index: mutate_at(c(5:10), ~replace(., is.na(.), 0))
- direct reference: mutate_at(vars(var5:var10), ~replace(., is.na(.), 0))
- fixed match: mutate_at(vars(contains("1")), ~replace(., is.na(.), 0))
  - or in place of contains(), try ends_with() or starts_with()
- pattern match: mutate_at(vars(matches("\\d{2}")), ~replace(., is.na(.), 0))

Conditionally (change just a single type and leave other types alone):

- integers: mutate_if(is.integer, ~replace(., is.na(.), 0))
- numbers: mutate_if(is.numeric, ~replace(., is.na(.), 0))
- strings: mutate_if(is.character, ~replace(., is.na(.), 0))

The Complete Analysis

Updated for dplyr 0.8.0: functions use the purrr-style ~ notation, replacing the deprecated funs() arguments.

Approaches tested:

    # Base R:
    baseR.sbst.rssgn  <- function(x) { x[is.na(x)] <- 0; x }
    baseR.replace     <- function(x) { replace(x, is.na(x), 0) }
    baseR.for         <- function(x) { for (j in 1:ncol(x))
                                         x[[j]][is.na(x[[j]])] <- 0
                                       x }

    # tidyverse
    library(dplyr)
    library(tidyr)
    ## dplyr
    dplyr_if_else     <- function(x) { mutate_all(x, ~if_else(is.na(.), 0, .)) }
    dplyr_coalesce    <- function(x) { mutate_all(x, ~coalesce(., 0)) }

    ## tidyr
    tidyr_replace_na  <- function(x) { replace_na(x, as.list(setNames(rep(0, 10), as.list(c(paste0("var", 1:10)))))) }

    ## hybrid
    hybrd.ifelse      <- function(x) { mutate_all(x, ~ifelse(is.na(.), 0, .)) }
    hybrd.replace_na  <- function(x) { mutate_all(x, ~replace_na(., 0)) }
    hybrd.replace     <- function(x) { mutate_all(x, ~replace(., is.na(.), 0)) }
    hybrd.rplc_at.idx <- function(x) { mutate_at(x, c(1:10), ~replace(., is.na(.), 0)) }
    hybrd.rplc_at.nse <- function(x) { mutate_at(x, vars(var1:var10), ~replace(., is.na(.), 0)) }
    hybrd.rplc_at.stw <- function(x) { mutate_at(x, vars(starts_with("var")), ~replace(., is.na(.), 0)) }
    hybrd.rplc_at.ctn <- function(x) { mutate_at(x, vars(contains("var")), ~replace(., is.na(.), 0)) }
    hybrd.rplc_at.mtc <- function(x) { mutate_at(x, vars(matches("\\d+")), ~replace(., is.na(.), 0)) }
    hybrd.rplc_if     <- function(x) { mutate_if(x, is.numeric, ~replace(., is.na(.), 0)) }

    # data.table
    library(data.table)
    DT.for.set.nms    <- function(x) { for (j in names(x))
                                         set(x, which(is.na(x[[j]])), j, 0) }
    DT.for.set.sqln   <- function(x) { for (j in seq_len(ncol(x)))
                                         set(x, which(is.na(x[[j]])), j, 0) }
    DT.nafill         <- function(x) { nafill(x, fill = 0) }
    DT.setnafill      <- function(x) { setnafill(x, fill = 0) }

The code for this analysis:

    library(microbenchmark)
    # 20% NA filled dataframe of 10 Million rows and 10 columns
    set.seed(42) # to recreate the exact dataframe
    dfN <- as.data.frame(matrix(sample(c(NA, as.numeric(1:4)), 1e7*10, replace = TRUE),
                                dimnames = list(NULL, paste0("var", 1:10)),
                                ncol = 10))
    # Running 600 trials with each replacement method
    # (the functions are executed locally, so that the original dataframe remains unmodified in all cases)
    perf_results <- microbenchmark(
        hybrd.ifelse      = hybrd.ifelse(copy(dfN)),
        dplyr_if_else     = dplyr_if_else(copy(dfN)),
        hybrd.replace_na  = hybrd.replace_na(copy(dfN)),
        baseR.sbst.rssgn  = baseR.sbst.rssgn(copy(dfN)),
        baseR.replace     = baseR.replace(copy(dfN)),
        dplyr_coalesce    = dplyr_coalesce(copy(dfN)),
        tidyr_replace_na  = tidyr_replace_na(copy(dfN)),
        hybrd.replace     = hybrd.replace(copy(dfN)),
        hybrd.rplc_at.ctn = hybrd.rplc_at.ctn(copy(dfN)),
        hybrd.rplc_at.nse = hybrd.rplc_at.nse(copy(dfN)),
        baseR.for         = baseR.for(copy(dfN)),
        hybrd.rplc_at.idx = hybrd.rplc_at.idx(copy(dfN)),
        DT.for.set.nms    = DT.for.set.nms(copy(dfN)),
        DT.for.set.sqln   = DT.for.set.sqln(copy(dfN)),
        times = 600L
    )

Summary of Results

    > print(perf_results)
    Unit: milliseconds
                  expr       min        lq     mean   median       uq      max neval
          hybrd.ifelse 6171.0439 6339.7046 6425.221 6407.397 6496.992 7052.851   600
         dplyr_if_else 3737.4954 3877.0983 3953.857 3946.024 4023.301 4539.428   600
      hybrd.replace_na 1497.8653 1706.1119 1748.464 1745.282 1789.804 2127.166   600
      baseR.sbst.rssgn 1480.5098 1686.1581 1730.006 1728.477 1772.951 2010.215   600
         baseR.replace 1457.4016 1681.5583 1725.481 1722.069 1766.916 2089.627   600
        dplyr_coalesce 1227.6150 1483.3520 1524.245 1519.454 1561.488 1996.859   600
      tidyr_replace_na 1248.3292 1473.1707 1521.889 1520.108 1570.382 1995.768   600
         hybrd.replace  913.1865 1197.3133 1233.336 1238.747 1276.141 1438.646   600
     hybrd.rplc_at.ctn  916.9339 1192.9885 1224.733 1227.628 1268.644 1466.085   600
     hybrd.rplc_at.nse  919.0270 1191.0541 1228.749 1228.635 1275.103 2882.040   600
             baseR.for  869.3169 1180.8311 1216.958 1224.407 1264.737 1459.726   600
     hybrd.rplc_at.idx  839.8915 1189.7465 1223.326 1228.329 1266.375 1565.794   600
        DT.for.set.nms  761.6086  915.8166 1015.457 1001.772 1106.315 1363.044   600
       DT.for.set.sqln  787.3535  918.8733 1017.812 1002.042 1122.474 1321.860   600

Boxplot of Results

    library(ggplot2)
    ggplot(perf_results, aes(x = expr, y = time/10^9)) +
        geom_boxplot() +
        xlab('Expression') +
        ylab('Elapsed Time (Seconds)') +
        scale_y_continuous(breaks = seq(0, 7, 1)) +
        coord_flip()

Color-coded Scatterplot of Trials (with y-axis on a log scale)

    qplot(y = time/10^9, data = perf_results, colour = expr) +
        labs(y = "log10 Scaled Elapsed Time per Trial (secs)", x = "Trial Number") +
        coord_cartesian(ylim = c(0.75, 7.5)) +
        scale_y_log10(breaks = c(0.75, 0.875, 1, 1.25, 1.5, 1.75, seq(2, 7.5)))

A note on the other high performers

When the datasets get larger, tidyr's replace_na had historically pulled out in front. With the current collection of 100M data points to run through, it performs almost exactly as well as a Base R For Loop. I am curious to see what happens for different sized dataframes.

Additional examples for the mutate and summarize _at and _all function variants can be found here: https://rdrr.io/cran/dplyr/man/summarise_all.html
Additionally, I found helpful demonstrations and collections of examples here: https://blog.exploratory.io/dplyr-0-5-is-awesome-heres-why-be095fd4eb8a

Attributions and Appreciations

With special thanks to:

- Tyler Rinker and Akrun for demonstrating microbenchmark.
- alexis_laz for working on helping me understand the use of local(), and (with Frank's patient help, too) the role that silent coercion plays in speeding up many of these approaches.
- ArthurYip for the poke to add the newer coalesce() function in and update the analysis.
- Gregor for the nudge to figure out the data.table functions well enough to finally include them in the lineup.
- Base R For loop: alexis_laz
- data.table For Loops: Matt Dowle
- Roman for explaining what is.numeric() really tests.

(Of course, please reach over and give them upvotes, too, if you find those approaches useful.)

Note on my use of Numerics: If you do have a pure integer dataset, all of your functions will run faster. Please see alexis_laz's work for more information. IRL, I can't recall encountering a data set containing more than 10-15% integers, so I am running these tests on fully numeric dataframes.

Hardware Used: 3.9 GHz CPU with 24 GB RAM
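Before timing anything, it can be worth checking that the candidates return the same data. A minimal base R sketch on a small toy frame (small, a, b and d are illustrative names, not part of the benchmark), using the three base R approaches from this answer. Here baseR.for ends with x, because a bare for loop returns NULL rather than the modified data frame.

```r
# The three base R candidates; baseR.for returns x explicitly
baseR.sbst.rssgn <- function(x) { x[is.na(x)] <- 0; x }
baseR.replace    <- function(x) { replace(x, is.na(x), 0) }
baseR.for        <- function(x) {
  for (j in seq_len(ncol(x))) x[[j]][is.na(x[[j]])] <- 0
  x
}

# Small stand-in for dfN: doubles with roughly 20% NA, 10 rows x 5 columns
set.seed(42)
small <- as.data.frame(matrix(sample(c(NA, as.numeric(1:4)), 50, replace = TRUE),
                              ncol = 5,
                              dimnames = list(NULL, paste0("var", 1:5))))

a <- baseR.sbst.rssgn(small)
b <- baseR.replace(small)
d <- baseR.for(small)

# TRUE when every column agrees across the three approaches
print(all(mapply(identical, a, b), mapply(identical, a, d)))
```

Because R copies on modify, small keeps its NAs after these calls; in the benchmark, only the data.table set() and setnafill() variants change their input in place.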

User · Answer

With dplyr 0.5.0, you can use the coalesce function, which can easily be integrated into a %>% pipeline by doing coalesce(vec, 0). This replaces all NAs in vec with 0:

Say we have a data frame with NAs:

    library(dplyr)
    df <- data.frame(v = c(1, 2, 3, NA, 5, 6, 8))

    df
    #    v
    # 1  1
    # 2  2
    # 3  3
    # 4 NA
    # 5  5
    # 6  6
    # 7  8

    df %>% mutate(v = coalesce(v, 0))
    #   v
    # 1 1
    # 2 2
    # 3 3
    # 4 0
    # 5 5
    # 6 6
    # 7 8

User · Answer

It is also possible to use tidyr::replace_na.

    library(dplyr)
    library(tidyr)

    df <- df %>% mutate_all(funs(replace_na(., 0)))

User · Answer

Another dplyr pipe-compatible option, using tidyr's replace_na method, that works for several columns:

    require(dplyr)
    require(tidyr)

    m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
    d <- as.data.frame(m)

    myList <- setNames(lapply(vector("list", ncol(d)), function(x) x <- 0), names(d))

    df <- d %>% replace_na(myList)

You can easily restrict to e.g. numeric columns:

    d$str <- c("string", NA)

    myList <- myList[sapply(d, is.numeric)]

    df <- d %>% replace_na(myList)
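The same restriction to numeric columns can be sketched in base R without tidyr: pick the numeric columns with vapply(), then replace only inside that subset. The column names and values below are made up for the example.

```r
d <- data.frame(num1 = c(1, NA, 3),
                num2 = c(NA, 2, NA),
                str  = c("string", NA, "x"),
                stringsAsFactors = FALSE)

# Logical vector marking the numeric columns
num_cols <- vapply(d, is.numeric, logical(1))

# Replace NAs in the numeric columns only; the character column keeps its NA
d[num_cols][is.na(d[num_cols])] <- 0
print(d)
```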
