Remove NA values from a vector

Question

I have a huge vector which has a couple of NA values, and I'm trying to find the max value in that vector (the vector is all numbers), but I can't do this because of the NA values. How can I remove the NA values so that I can compute the max?

User · Answer

Use discard from purrr (works with lists and vectors):

    discard(v, is.na)

The benefit is that it is easy to use with pipes; alternatively, use the built-in subsetting function `[`:

    v %>% discard(is.na)
    v %>% `[`(!is.na(.))

Note that na.omit does not work on lists:

    > x <- list(a=1, b=2, c=NA)
    > na.omit(x)
    $a
    [1] 1

    $b
    [1] 2

    $c
    [1] NA
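As a base-R aside to the note about lists: plain logical subsetting also seems to handle a list like the one above, since is.na() is applied element-wise to a list of length-one values. A minimal sketch, with no purrr needed (the names unchanged and kept are made up for illustration):

```r
x <- list(a = 1, b = 2, c = NA)

# na.omit() hands a list back unchanged
unchanged <- na.omit(x)

# Logical subsetting drops the element that is NA
kept <- x[!is.na(x)]
print(names(kept))
```

This only covers elements that are a single NA; a list element that is itself a longer vector containing NAs would need to be cleaned separately.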

User · Answer

The na.omit function is what a lot of the regression routines use internally:

    vec <- 1:1000
    vec[runif(200, 1, 1000)] <- NA
    max(vec)
    #[1] NA
    max(na.omit(vec))
    #[1] 1000
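One detail worth knowing about na.omit() on an atomic vector: the result carries an "na.action" attribute recording which positions were dropped. A small sketch with a made-up three-element vector (the variable names are illustrative only):

```r
v <- c(5, NA, 3)
cleaned <- na.omit(v)

# The NA is gone, so max() works without na.rm
print(max(cleaned))

# na.omit() records the positions it removed
dropped <- attr(cleaned, "na.action")

# as.vector() strips that bookkeeping when a plain vector is wanted
plain <- as.vector(cleaned)
```

If the extra attribute gets in the way, v[!is.na(v)] returns the plain vector directly.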

User · Answer

Trying ?max, you'll see that it actually has a na.rm = argument, set by default to FALSE. (That's the common default for many other R functions, including sum(), mean(), etc.)

Setting na.rm=TRUE does just what you're asking for:

    d <- c(1, 100, NA, 10)
    max(d, na.rm=TRUE)

If you do want to remove all of the NAs, use this idiom instead:

    d <- d[!is.na(d)]

A final note: other functions (e.g. table(), lm(), and sort()) have NA-related arguments that use different names (and offer different options). So if NAs cause you problems in a function call, it's worth checking for a built-in solution among the function's arguments. I've found there's usually one already there.
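To illustrate the point about differently named arguments, here is a short sketch using the same vector d. It shows na.rm on a few summary functions, then the na.last argument of sort() and the useNA argument of table(); the variable names sorted_keep and counts are made up for the example.

```r
d <- c(1, 100, NA, 10)

# na.rm = TRUE skips the NA in these summary functions
print(max(d, na.rm = TRUE))
print(sum(d, na.rm = TRUE))
print(mean(d, na.rm = TRUE))

# sort() drops NAs by default; na.last = TRUE keeps them at the end
sorted_default <- sort(d)
sorted_keep <- sort(d, na.last = TRUE)

# table() leaves NAs out unless asked to count them
counts <- table(d, useNA = "ifany")
print(counts)
```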

User · Answer

?max shows you that there is an extra parameter na.rm that you can set to TRUE.

Apart from that, if you really want to remove the NAs, just use something like:

    myvec[!is.na(myvec)]
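One caveat with na.rm: if every value is NA, max() has nothing left to compare and returns -Inf with a warning. The sketch below shows that behaviour and one possible guard; safe_max is a hypothetical helper, not a base R function.

```r
all_missing <- c(NA_real_, NA_real_)

# Nothing is left after removing NAs, so max() returns -Inf
# (the warning is suppressed here to keep the output quiet)
result <- suppressWarnings(max(all_missing, na.rm = TRUE))

# Hypothetical guard: return NA instead of -Inf when all values are missing
safe_max <- function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

print(safe_max(all_missing))
print(safe_max(c(1, NA, 3)))
```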

User · Answer

Just in case someone new to R wants a simplified answer to the original question:

    How can I remove NA values from a vector?

Here it is. Assume you have a vector foo as follows:

    foo = c(1:10, NA, 20:30)

Running length(foo) gives 22.

    nona_foo = foo[!is.na(foo)]

length(nona_foo) is 21, because the NA value has been removed.

Remember that is.na(foo) returns a logical vector, so indexing foo with the negation of this vector gives you all the elements which are not NA.
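To make the indexing step concrete, the following sketch inspects what is.na() returns for the same foo. The names mask and nona_foo are just for illustration.

```r
foo <- c(1:10, NA, 20:30)

# is.na() gives one TRUE/FALSE per element
mask <- is.na(foo)
print(is.logical(mask))

# Summing the mask counts the NAs; which() gives their positions
print(sum(mask))
print(which(mask))

# Negating the mask keeps everything that is not NA
nona_foo <- foo[!mask]
print(length(nona_foo))
```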

User · Answer

I ran a quick benchmark comparing the two base approaches and it turns out that x[!is.na(x)] is faster than na.omit. User qwr suggested I try purrr::discard also - this turned out to be massively slower (though I'll happily take comments on my implementation & test!)

    microbenchmark::microbenchmark(
      purrr::map(airquality, function(x) {x[!is.na(x)]}),
      purrr::map(airquality, na.omit),
      purrr::map(airquality, ~purrr::discard(.x, .p = is.na)),
      times = 1e6)

    Unit: microseconds
                                                       expr    min     lq      mean median      uq       max neval cld
     purrr::map(airquality, function(x) { x[!is.na(x)] })     66.8   75.9  130.5643   86.2  131.80  541125.5 1e+06 a
                          purrr::map(airquality, na.omit)     95.7  107.4  185.5108  129.3  190.50  534795.5 1e+06  b
     purrr::map(airquality, ~purrr::discard(.x, .p = is.na)) 3391.7 3648.6 5615.8965 4079.7 6486.45 1121975.4 1e+06   c

For reference, here's the original test of x[!is.na(x)] vs na.omit:

    microbenchmark::microbenchmark(
      purrr::map(airquality, function(x) {x[!is.na(x)]}),
      purrr::map(airquality, na.omit),
      times = 1000000)

    Unit: microseconds
                                                expr  min   lq      mean median    uq      max neval cld
     map(airquality, function(x) { x[!is.na(x)] })   53.0 56.6  86.48231   58.1  64.8 414195.2 1e+06 a
                          map(airquality, na.omit)   85.3 90.4 134.49964   92.5 104.9 348352.8 1e+06  b
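Before reading too much into timings like these, it can help to confirm that the two base approaches return the same values. The sketch below is not the benchmark above: it uses lapply() and system.time() from base R instead of purrr and microbenchmark, and a much smaller, arbitrary repetition count, so the numbers it prints are only a rough indication.

```r
# Same data set as the benchmark (ships with base R's datasets package)
by_subset <- lapply(airquality, function(x) x[!is.na(x)])

# as.vector() strips the "na.action" attribute so the results are comparable
by_omit <- lapply(airquality, function(x) as.vector(na.omit(x)))

print(identical(by_subset, by_omit))

# Rough timing with an arbitrary 1000 repetitions; results vary by machine
n_reps <- 1000
print(system.time(for (i in seq_len(n_reps)) lapply(airquality, function(x) x[!is.na(x)])))
print(system.time(for (i in seq_len(n_reps)) lapply(airquality, na.omit)))
```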

User · Answer

You can call max(vector, na.rm = TRUE). More generally, you can use the na.omit() function.
