Cleaning Inf values from an R dataframe

Question

In R, I have an operation which creates some Inf values when I transform a dataframe.

I would like to turn these Inf values into NA values. The code I have is slow for large data; is there a faster way of doing this?

Say I have the following dataframe:

    dat <- data.frame(a = c(1, Inf), b = c(Inf, 3), d = c("a", "b"))

The following works in a single case:

    dat[, 1][is.infinite(dat[, 1])] = NA

So I generalized it with the following loop:

    cf_DFinf2NA <- function(x)
    {
        for (i in 1:ncol(x)) {
            x[, i][is.infinite(x[, i])] = NA
        }
        return(x)
    }

But I don't think that this is really using the power of R.

User · Answer

You may also use the handy replace_na function: https://tidyr.tidyverse.org/reference/replace_na.html

User · Answer

There is a very simple solution to this problem in the hablar package:

    library(hablar)

    dat %>% rationalize()

This returns a data frame with all Inf values converted to NA.

Timings compared to some of the other solutions in this thread. Code:

    library(hablar)
    library(data.table)

    dat <- data.frame(a = rep(c(1, Inf), 1e6), b = rep(c(Inf, 2), 1e6),
                      c = rep(c('a', 'b'), 1e6), d = rep(c(1, Inf), 1e6),
                      e = rep(c(Inf, 2), 1e6))
    DT <- data.table(dat)

    system.time(dat[mapply(is.infinite, dat)] <- NA)
    system.time(dat[dat == Inf] <- NA)
    system.time(invisible(lapply(names(DT), function(.name) set(DT, which(is.infinite(DT[[.name]])), j = .name, value = NA))))
    system.time(rationalize(dat))

Result:

    > system.time(dat[mapply(is.infinite, dat)] <- NA)
       user  system elapsed
      0.125   0.039   0.164
    > system.time(dat[dat == Inf] <- NA)
       user  system elapsed
      0.095   0.010   0.108
    > system.time(invisible(lapply(names(DT), function(.name) set(DT, which(is.infinite(DT[[.name]])), j = .name, value = NA))))
       user  system elapsed
      0.065   0.002   0.067
    > system.time(rationalize(dat))
       user  system elapsed
      0.058   0.014   0.072
    >

It seems data.table is faster than hablar on elapsed time, but it has a longer syntax.

User · Answer

Feng Mai has a tidyverse answer in this thread that handles both negative and positive infinities:

    dat %>% mutate_if(is.numeric, list(~na_if(., Inf))) %>%
      mutate_if(is.numeric, list(~na_if(., -Inf)))

This works well, but a word of warning: do not swap in abs(.) here to do both lines at once, as is proposed in an upvoted comment. It will look like it works, but it changes all negative values in the dataset to positive. You can confirm with this:

    data(iris)
    # The last line here is bad - it converts all negative values to positive
    iris %>%
      mutate_if(is.numeric, ~scale(.)) %>%
      mutate(infinities = Sepal.Length / 0) %>%
      mutate_if(is.numeric, list(~na_if(abs(.), Inf)))

For one line, this works:

    mutate_if(is.numeric, ~ifelse(abs(.) == Inf, NA, .))
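The same pitfall can be reproduced without any packages. The following is a minimal base R sketch on a made-up vector `x` (an assumption for illustration, not taken from the answer). It contrasts the `abs()` shortcut, which builds the result from the absolute values, with a test that only uses `abs()` for the comparison and returns the original values.

```r
# Hypothetical example vector with a negative value and both infinities
x <- c(-2, 3, Inf, -Inf)

# Emulates na_if(abs(.), Inf): the result is built from abs(x), so -2 becomes 2
bad <- abs(x)
bad[bad == Inf] <- NA

# Tests abs(x) but keeps the original values, like the ifelse() one-liner
good <- ifelse(abs(x) == Inf, NA, x)

print(bad)
print(good)
```

In `bad` the sign of `-2` is lost, while `good` keeps it and still turns both infinities into NA.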

User · Answer

Also, if someone needs the coordinates of the Inf values, they can do this:

    library(rlist)
    list.clean(apply(df, 2, function(x) {which(is.infinite(x))}),
               function(x) length(x) == 0L, TRUE)

Result:

    $colname1
    [1] row1 row2

    $colname2
    [2] row1 row2

With this information, you can replace the Inf values in particular places with the mean, median, or whatever summary you want.

For example, for the first element:

    repInf <- list.clean(apply(df, 2, function(x) {which(is.infinite(x))}),
                         function(x) length(x) == 0L, TRUE)
    col <- names(repInf)[[1]]
    # median (or mean) of the finite values in that column
    df[repInf[[1]], col] <- median(df[is.finite(df[, col]), col], na.rm = TRUE)

In a loop:

    for (nonInf in seq_along(repInf)) {
      col <- names(repInf)[[nonInf]]
      # mean of the finite values only; mean(is.finite(...)) would give the share of finite values instead
      df[repInf[[nonInf]], col] <- mean(df[is.finite(df[, col]), col])
    }
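For readers who prefer to avoid an extra package, here is a hedged base R sketch of the same idea: find where the infinite values sit, then fill each one with the mean of the finite values in its column. The data frame `df`, its column names, and its row names are made up for illustration, and the sketch assumes all columns are numeric.

```r
# Hypothetical data frame; column and row names are made up for illustration
df <- data.frame(x = c(1, Inf, 3), y = c(-Inf, 5, 7),
                 row.names = c("row1", "row2", "row3"))

# Logical matrix marking infinite cells, then their (row, col) coordinates
inf_mask <- sapply(df, is.infinite)
coords <- which(inf_mask, arr.ind = TRUE)
print(coords)

# Replace each infinite cell with the mean of the finite values in its column
for (col in names(df)) {
  finite_vals <- df[[col]][is.finite(df[[col]])]
  df[[col]][is.infinite(df[[col]])] <- mean(finite_vals)
}
print(df)
```

Swapping `mean` for `median` or another summary function only changes the one line inside the loop.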

User · Answer

Another solution:

    dat <- data.frame(a = rep(c(1, Inf), 1e6), b = rep(c(Inf, 2), 1e6),
                      c = rep(c('a', 'b'), 1e6), d = rep(c(1, Inf), 1e6),
                      e = rep(c(Inf, 2), 1e6))
    system.time(dat[dat == Inf] <- NA)

    #   user  system elapsed
    #  0.316   0.024   0.340
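One thing to keep in mind with the `== Inf` comparison is that it matches positive infinity only. The sketch below, on a small made-up data frame (an assumption for illustration), shows the difference between comparing the values and comparing their absolute values. Note that `abs()` on a data frame requires every column to be numeric, so the second form would not work as written on data with a character column.

```r
# Hypothetical small data frame with both positive and negative infinity
dat <- data.frame(a = c(1, Inf, -Inf), b = c(-Inf, 2, Inf))

# Comparing against Inf only matches positive infinity
pos_only <- dat
pos_only[pos_only == Inf] <- NA

# Comparing absolute values against Inf matches both signs
both <- dat
both[abs(both) == Inf] <- NA

print(pos_only)
print(both)
```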

User · Answer

Option 1

Use the fact that a data.frame is a list of columns, then use do.call to recreate a data.frame.

    do.call(data.frame, lapply(dat, function(x) replace(x, is.infinite(x), NA)))

Option 2 -- data.table

You could use data.table and set. This avoids some internal copying.

    DT <- data.table(dat)
    invisible(lapply(names(DT), function(.name) set(DT, which(is.infinite(DT[[.name]])), j = .name, value = NA)))

Or using column numbers (possibly faster if there are a lot of columns):

    for (j in 1:ncol(DT)) set(DT, which(is.infinite(DT[[j]])), j, NA)

Timings

    # some `big(ish)` data
    dat <- data.frame(a = rep(c(1, Inf), 1e6), b = rep(c(Inf, 2), 1e6),
                      c = rep(c('a', 'b'), 1e6), d = rep(c(1, Inf), 1e6),
                      e = rep(c(Inf, 2), 1e6))
    # create data.table
    library(data.table)
    DT <- data.table(dat)

    # replace (@mnel)
    system.time(na_dat <- do.call(data.frame, lapply(dat, function(x) replace(x, is.infinite(x), NA))))
    #  user  system elapsed
    #  0.52    0.01    0.53

    # is.na (@dwin)
    system.time(is.na(dat) <- sapply(dat, is.infinite))
    #  user  system elapsed
    # 32.96    0.07   33.12

    # modified is.na
    system.time(is.na(dat) <- do.call(cbind, lapply(dat, is.infinite)))
    #  user  system elapsed
    #  1.22    0.38    1.60

    # data.table (@mnel)
    system.time(invisible(lapply(names(DT), function(.name) set(DT, which(is.infinite(DT[[.name]])), j = .name, value = NA))))
    #  user  system elapsed
    #  0.29    0.02    0.31

data.table is the quickest. Using sapply slows things down noticeably.
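As a small variation on Option 1, the column-wise replacement can be wrapped in a reusable function. This is only a sketch: the helper name `inf_to_na` is hypothetical, and it assigns into `df[]` rather than calling `do.call(data.frame, ...)`, which is a swapped-in base R idiom that keeps the existing data frame structure.

```r
# Hypothetical helper name; not part of base R or data.table
inf_to_na <- function(df) {
  # Assigning into df[] keeps the data.frame class and column names
  df[] <- lapply(df, function(col) replace(col, is.infinite(col), NA))
  df
}

dat <- data.frame(a = c(1, Inf), b = c(-Inf, 3), d = c("a", "b"))
clean <- inf_to_na(dat)
print(clean)
```

Because `is.infinite()` is TRUE for both `Inf` and `-Inf` and FALSE for non-numeric values, the character column passes through unchanged.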

User · Answer

`[<-` with mapply is a bit faster than sapply.

    > dat[mapply(is.infinite, dat)] <- NA

With mnel's data, the timing is:

    > system.time(dat[mapply(is.infinite, dat)] <- NA)
    #   user  system elapsed
    # 15.281   0.000  13.750

User · Answer

Use sapply and is.na<-

    > dat <- data.frame(a = c(1, Inf), b = c(Inf, 3), d = c("a", "b"))
    > is.na(dat) <- sapply(dat, is.infinite)
    > dat
       a  b d
    1  1 NA a
    2 NA  3 b

Or you can use (giving credit to @mnel, whose edit this is):

    > is.na(dat) <- do.call(cbind, lapply(dat, is.infinite))

which is significantly faster.
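For anyone unfamiliar with the replacement form of `is.na`, the following sketch shows what it does on a plain vector first and then on a data frame. The vector `x` and the small numeric data frame are made up for illustration.

```r
# Hypothetical numeric vector for illustration
x <- c(1, Inf, -Inf, 4)

# `is.na<-` sets the selected elements to NA; here the selector is a logical vector
is.na(x) <- is.infinite(x)
print(x)

# The same call works on a data frame when given a logical matrix of matching shape
dat <- data.frame(a = c(1, Inf), b = c(Inf, 3))
is.na(dat) <- do.call(cbind, lapply(dat, is.infinite))
print(dat)
```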

User · Answer

Here is a dplyr/tidyverse solution using the na_if() function:

    dat %>% mutate_if(is.numeric, list(~na_if(., Inf)))

Note that this only replaces positive infinity with NA. The step needs to be repeated if negative infinity values also have to be replaced:

    dat %>% mutate_if(is.numeric, list(~na_if(., Inf))) %>%
      mutate_if(is.numeric, list(~na_if(., -Inf)))
