Replace missing values with column mean

Question

I am not sure how to loop over each column to replace the NA values with the column mean. When I try to replace the values for one column using the following, it works well:

    Column1[is.na(Column1)] <- round(mean(Column1, na.rm = TRUE))

The code for looping over the columns is not working:

    for (i in 1:ncol(data)) {
        data[i][is.na(data[i])] <- round(mean(data[i], na.rm = TRUE))
    }

The values are not replaced. Can someone please help me with this?

User · Answer

With the data.table package you could use the set() function to loop over the columns and replace the NAs (or whatever you like) with an aggregate or value of your choice (here: mean):

    require(data.table)

    # data
    dt = copy(iris[, -5])
    setDT(dt)
    dt[1:4, Sepal.Length := NA]  # introduce NAs

    # replace NAs with mean (or whatever function you like)
    for (j in seq_along(names(dt))) {
      set(dt,
          i = which(is.na(dt[[j]])),
          j = j,
          value = mean(dt[[j]], na.rm = TRUE))
    }

User · Answer

A one-liner using tidyr's replace_na is:

    library(tidyr)
    replace_na(mtcars, as.list(colMeans(mtcars, na.rm = TRUE)))

If your df has columns that are non-numeric, this takes a little more work than a one-liner:

    library(dplyr)  # for select(), select_if(), ungroup(), bind_cols() and %>%

    mean_to_fill <- select_if(ungroup(df), is.numeric) %>%
      colMeans(na.rm = TRUE)

    bind_cols(select(df, group1, group2, group3),
              select_if(ungroup(df), is.numeric) %>%
                tidyr::replace_na(as.list(mean_to_fill)))

User · Answer

lapply can be used instead of a for loop:

    d1[] <- lapply(d1, function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))

This doesn't really have any advantages over the for loop, though maybe it's easier if you have non-numeric columns as well, in which case

    d1[sapply(d1, is.numeric)] <- lapply(d1[sapply(d1, is.numeric)],
                                         function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))

is almost as easy.
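To make the non-numeric case concrete, here is a minimal base R sketch. The data frame `mixed` and its column names are made up for illustration, and `vapply()` is swapped in for `sapply()` only because it guarantees a logical vector; the approach is otherwise the one described above.

```r
# Hypothetical mixed-type data frame, for illustration only
mixed <- data.frame(
  id    = c("a", "b", "c", "d"),
  score = c(10, NA, 30, 20),
  hours = c(NA, 2, 4, 6),
  stringsAsFactors = FALSE
)

# TRUE for the columns that can be averaged
num_cols <- vapply(mixed, is.numeric, logical(1))

# Fill NAs in the numeric columns only; 'id' is left untouched
mixed[num_cols] <- lapply(mixed[num_cols], function(x) {
  ifelse(is.na(x), mean(x, na.rm = TRUE), x)
})

mixed
```

Computing the selector once and reusing it on both sides of the assignment avoids scanning the column types twice.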

User · Answer

Similar to the answer pointed out by @Thomas, this can also be done using R's ifelse() function:

    for (i in 1:ncol(data)) {
      data[, i] = ifelse(is.na(data[, i]),
                         ave(data[, i], FUN = function(y) mean(y, na.rm = TRUE)),
                         data[, i])
    }

where the arguments to ifelse(TEST, YES, NO) are:

- TEST: the logical condition to be checked
- YES: the value used where the condition is TRUE
- NO: the value used where the condition is FALSE

and ave(x, ..., FUN = mean) is a function in R used for calculating averages of subsets of x.
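Since ave() accepts grouping variables, the same ifelse() pattern should extend to filling each NA with the mean of its own group rather than the whole column. A minimal sketch, assuming a made-up data frame `sales` with hypothetical columns `grp` and `val`:

```r
# Hypothetical grouped data, for illustration only
sales <- data.frame(
  grp = c("a", "a", "a", "b", "b", "b"),
  val = c(1, NA, 3, 10, 20, NA)
)

# ave() returns a vector as long as 'val', holding each row's group mean
grp_mean <- ave(sales$val, sales$grp,
                FUN = function(y) mean(y, na.rm = TRUE))

# Keep observed values, use the group mean where the value is missing
sales$val <- ifelse(is.na(sales$val), grp_mean, sales$val)

sales
```

With no grouping variable, as in the loop above, ave() simply repeats the overall column mean for every row.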

User · Answer

If DF is your data frame of numeric columns:

    library(zoo)
    na.aggregate(DF)

ADDED:

Using only base R, define a function which does it for one column and then lapply it to every column:

    NA2mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
    replace(DF, TRUE, lapply(DF, NA2mean))

The last line could be replaced with the following if it's OK to overwrite the input:

    DF[] <- lapply(DF, NA2mean)

User · Answer

dplyr's mutate_all or mutate_at could be useful here:

    library(dplyr)

    set.seed(10)
    df <- data.frame(a = sample(c(NA, 1:3)    , replace = TRUE, 10),
                     b = sample(c(NA, 101:103), replace = TRUE, 10),
                     c = sample(c(NA, 201:203), replace = TRUE, 10))

    df
    #>     a   b   c
    #> 1   2 102 203
    #> 2   1 102 202
    #> 3   1  NA 203
    #> 4   2 102 201
    #> 5  NA 101 201
    #> 6  NA 101 202
    #> 7   1  NA 203
    #> 8   1 101  NA
    #> 9   2 101 203
    #> 10  1 103 201

    df %>% mutate_all(~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x))
    #>        a       b        c
    #> 1  2.000 102.000 203.0000
    #> 2  1.000 102.000 202.0000
    #> 3  1.000 101.625 203.0000
    #> 4  2.000 102.000 201.0000
    #> 5  1.375 101.000 201.0000
    #> 6  1.375 101.000 202.0000
    #> 7  1.000 101.625 203.0000
    #> 8  1.000 101.000 202.1111
    #> 9  2.000 101.000 203.0000
    #> 10 1.000 103.000 201.0000

    df %>% mutate_at(vars(a, b), ~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x))
    #>        a       b   c
    #> 1  2.000 102.000 203
    #> 2  1.000 102.000 202
    #> 3  1.000 101.625 203
    #> 4  2.000 102.000 201
    #> 5  1.375 101.000 201
    #> 6  1.375 101.000 202
    #> 7  1.000 101.625 203
    #> 8  1.000 101.000  NA
    #> 9  2.000 101.000 203
    #> 10 1.000 103.000 201

User · Answer

Go simply with zoo; it will replace all NA values with the mean of the column values:

    library(zoo)
    na.aggregate(data)

User · Answer

A relatively simple modification of your code should solve the issue:

    for (i in 1:ncol(data)) {
      data[is.na(data[, i]), i] <- mean(data[, i], na.rm = TRUE)
    }
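As for why the loop in the question leaves the values unchanged: single-bracket indexing such as `data[i]` returns a one-column data frame rather than a vector, and `mean()` on a data frame returns NA with a warning, so each NA is overwritten with NA. A minimal sketch on a made-up data frame (the names `toy` and `fixed` and their values are illustrative, not from the question):

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(x = c(1, NA, 3), y = c(NA, 10, 20))

# Single brackets keep the data-frame wrapper; mean() cannot average it
# and returns NA (the warning is suppressed here to keep the output quiet)
single <- suppressWarnings(mean(toy[1], na.rm = TRUE))
is.na(single)

# Double brackets (or toy[, 1]) extract the underlying numeric vector
mean(toy[[1]], na.rm = TRUE)

# The loop works once each column is extracted as a vector
fixed <- toy
for (i in seq_len(ncol(fixed))) {
  col_mean <- mean(fixed[[i]], na.rm = TRUE)
  fixed[[i]][is.na(fixed[[i]])] <- col_mean
}

fixed
```

`seq_len(ncol(fixed))` is used instead of `1:ncol(fixed)` only so the loop is skipped cleanly for a data frame with zero columns.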

User · Answer

Let's say I have a data frame, df, as follows:

    df <- data.frame(a = c(2, 3, 4, NA, 5, NA), b = c(1, 2, 3, 4, NA, NA))

    # create a custom function
    fillNAwithMean <- function(x) {
        na_index <- which(is.na(x))
        mean_x <- mean(x, na.rm = TRUE)
        x[na_index] <- mean_x
        return(x)
    }

    # note: apply() returns a matrix, not a data frame
    (df <- apply(df, 2, fillNAwithMean))
       a   b
     2.0 1.0
     3.0 2.0
     4.0 3.0
     3.5 4.0
     5.0 2.5
     3.5 2.5
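One thing to keep in mind with this approach is that apply() converts the data frame to a matrix and returns a matrix. If a data frame is needed afterwards, a variant with lapply() keeps the original class. The sketch below reuses the same values; the names `as_matrix` and `df_keep` are introduced here for illustration only.

```r
df <- data.frame(a = c(2, 3, 4, NA, 5, NA), b = c(1, 2, 3, 4, NA, NA))

# Same idea as the function above, written compactly
fillNAwithMean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# apply() works column by column but hands back a matrix
as_matrix <- apply(df, 2, fillNAwithMean)
is.matrix(as_matrix)

# Assigning into df_keep[] preserves the data-frame structure
df_keep <- df
df_keep[] <- lapply(df_keep, fillNAwithMean)
is.data.frame(df_keep)
```

The filled values are identical in both results; only the container differs, which matters if the data frame also holds non-numeric columns.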

User · Answer

You could also try:

    cM <- colMeans(d1, na.rm = TRUE)
    indx <- which(is.na(d1), arr.ind = TRUE)
    d1[indx] <- cM[indx[, 2]]
    d1

data

    set.seed(42)
    d1 <- as.data.frame(matrix(sample(c(NA, 0:5), 5 * 10, replace = TRUE), ncol = 10))
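To see how the index matrix lines up with the column means, here is a small worked sketch on a made-up data frame (`m` and its values are hypothetical, chosen so the means are easy to check by hand):

```r
# Hypothetical data, for illustration only
m <- data.frame(p = c(1, NA, 3), q = c(NA, 4, 8))

# Named vector of column means: p = 2, q = 6
cM <- colMeans(m, na.rm = TRUE)

# One row per missing cell, with columns 'row' and 'col'
indx <- which(is.na(m), arr.ind = TRUE)

# indx[, 2] is the column number of each NA, so cM[indx[, 2]]
# picks the matching column mean for every missing cell
m[indx] <- cM[indx[, 2]]

m
```

This fills every missing cell in a single vectorised assignment, with no explicit loop over the columns.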

User · Answer

To add to the alternatives, using @akrun's sample data, I would do the following:

    d1[] <- lapply(d1, function(x) {
      x[is.na(x)] <- mean(x, na.rm = TRUE)
      x
    })
    d1

User · Answer

There is also a quick solution using the imputeTS package:

    library(imputeTS)
    na.mean(yourDataFrame)
