Change the Blank Cells to "NA"

Question

Here's the link to my data.

My target is to assign "NA" to all blank cells, irrespective of categorical or numerical values. I am using na.strings="", but it's not assigning NA to all blank cells.

    ## reading the data
    dat <- read.csv("data2.csv")
    head(dat)
      mon hr        acc   alc sex spd axles door  reg                                 cond1 drug1
    1   8 21 No Control  TRUE   F   0     2    2      Physical Impairment (Eyes, Ear, Limb)     A
    2   7 20 No Control FALSE   M 900     2    2                                Inattentive     D
    3   3  9 No Control FALSE   F 100     2    2 2004                                Normal     D
    4   1 15 No Control FALSE   M   0     2    2      Physical Impairment (Eyes, Ear, Limb)     D
    5   4 21 No Control FALSE      25    NA   NA                                                D
    6   4 20 No Control    NA   F  30     2    4                Drinking Alcohol - Impaired     D
           inj1 PED_STATE st rac1
    1     Fatal      <NA>  F <NA>
    2  Moderate      <NA>  F <NA>
    3  Moderate      <NA>  M <NA>
    4 Complaint      <NA>  M <NA>
    5 Complaint      <NA>  F <NA>
    6  Moderate      <NA>  M <NA>

    ## using na.strings
    dat2 <- read.csv("data2.csv", header=T, na.strings="")
    head(dat2)
      mon hr        acc   alc sex spd axles door  reg                                 cond1 drug1
    1   8 21 No Control  TRUE   F   0     2    2 <NA> Physical Impairment (Eyes, Ear, Limb)     A
    2   7 20 No Control FALSE   M 900     2    2 <NA>                           Inattentive     D
    3   3  9 No Control FALSE   F 100     2    2 2004                                Normal     D
    4   1 15 No Control FALSE   M   0     2    2 <NA> Physical Impairment (Eyes, Ear, Limb)     D
    5   4 21 No Control FALSE      25    NA   NA <NA>                                  <NA>     D
    6   4 20 No Control    NA   F  30     2    4 <NA>           Drinking Alcohol - Impaired     D
           inj1 PED_STATE st rac1
    1     Fatal        NA  F   NA
    2  Moderate        NA  F   NA
    3  Moderate        NA  M   NA
    4 Complaint        NA  M   NA
    5 Complaint        NA  F   NA
    6  Moderate        NA  M   NA

User · Answer

While many options above function well, I found coercion of non-target variables to chr problematic. Using ifelse and grepl within lapply resolves this off-target effect (in limited testing). Using slarky's regular expression in grepl:

    set.seed(42)
    x1 <- sample(c("a","b"," ", "a a", NA), 10, TRUE)
    x2 <- sample(c(rnorm(length(x1),0, 1), NA), length(x1), TRUE)

    df <- data.frame(x1, x2, stringsAsFactors = FALSE)

The problem of coercion to character class:

    df2 <- lapply(df, function(x) gsub("^$|^ $", NA, x))
    lapply(df2, class)

    $x1
    [1] "character"

    $x2
    [1] "character"

Resolution with use of ifelse:

    df3 <- lapply(df, function(x) ifelse(grepl("^$|^ $", x)==TRUE, NA, x))
    lapply(df3, class)

    $x1
    [1] "character"

    $x2
    [1] "numeric"
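A minimal sketch of the same idea that keeps the result as a data frame and leaves non-character columns untouched. The helper name blank_to_na and the sample data are made up for illustration, and the pattern is widened to any whitespace-only string, which is an assumption beyond the regular expression used above.

```r
# Hypothetical sample data: one character column, one numeric column
df <- data.frame(x1 = c("a", " ", "", "a a", NA),
                 x2 = c(1.5, NA, 3, 4, 5),
                 stringsAsFactors = FALSE)

# Replace empty or whitespace-only strings with NA, but only in
# character vectors; anything else is returned unchanged
blank_to_na <- function(x) {
  if (is.character(x)) x[grepl("^[[:space:]]*$", x)] <- NA
  x
}

# df[] <- keeps the data.frame structure instead of returning a list
df[] <- lapply(df, blank_to_na)

sapply(df, class)
```

Assigning into df[] rather than df is what keeps the data frame; a bare lapply() returns a plain list.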

User · Answer

My function takes into account factor, character vector and potential attributes (if you use the haven or foreign package to read external files). It also allows matching different self-defined na.strings. To transform all columns, simply use lapply:

    df[] = lapply(df, blank2na, na.strings=c('','NA','na','N/A','n/a','NaN','nan'))

See the comments for more details:

    #' Replaces blank-ish elements of a factor or character vector to NA
    #' @description Replaces blank-ish elements of a factor or character vector to NA
    #' @param x a vector of factor or character or any type
    #' @param na.strings case sensitive strings that will be converted to NA. The function will do a trimws(x,'both') before conversion. If NULL, do only trimws, no conversion to NA.
    #' @return Returns a vector trimws (always for factor, character) and NA converted (if matching na.strings). Attributes will also be kept ('label','labels', 'value.labels').
    #' @seealso \code{\link{ez.nan2na}}
    #' @export
    blank2na = function(x,na.strings=c('','.','NA','na','N/A','n/a','NaN','nan')) {
        if (is.factor(x)) {
            lab = attr(x, 'label', exact = T)
            labs1 <- attr(x, 'labels', exact = T)
            labs2 <- attr(x, 'value.labels', exact = T)

            # trimws will convert factor to character
            x = trimws(x,'both')
            if (! is.null(lab)) lab = trimws(lab,'both')
            if (! is.null(labs1)) labs1 = trimws(labs1,'both')
            if (! is.null(labs2)) labs2 = trimws(labs2,'both')

            if (!is.null(na.strings)) {
                # convert to NA
                x[x %in% na.strings] = NA
                # also remember to remove na.strings from value labels
                labs1 = labs1[! labs1 %in% na.strings]
                labs2 = labs2[! labs2 %in% na.strings]
            }

            # the levels will be reset here
            x = factor(x)

            if (! is.null(lab)) attr(x, 'label') <- lab
            if (! is.null(labs1)) attr(x, 'labels') <- labs1
            if (! is.null(labs2)) attr(x, 'value.labels') <- labs2
        } else if (is.character(x)) {
            lab = attr(x, 'label', exact = T)
            labs1 <- attr(x, 'labels', exact = T)
            labs2 <- attr(x, 'value.labels', exact = T)

            # trimws drops attributes, so they are saved above and restored below
            x = trimws(x,'both')
            if (! is.null(lab)) lab = trimws(lab,'both')
            if (! is.null(labs1)) labs1 = trimws(labs1,'both')
            if (! is.null(labs2)) labs2 = trimws(labs2,'both')

            if (!is.null(na.strings)) {
                # convert to NA
                x[x %in% na.strings] = NA
                # also remember to remove na.strings from value labels
                labs1 = labs1[! labs1 %in% na.strings]
                labs2 = labs2[! labs2 %in% na.strings]
            }

            if (! is.null(lab)) attr(x, 'label') <- lab
            if (! is.null(labs1)) attr(x, 'labels') <- labs1
            if (! is.null(labs2)) attr(x, 'value.labels') <- labs2
        } else {
            x = x
        }
        return(x)
    }

User · Answer

Couldn't you just use

    dat <- read.csv("data2.csv",na.strings=" ",header=TRUE)

This should convert all blanks to NA as the data are read in. Be sure to put a space between your quotation marks.
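Note that na.strings=" " matches only cells that hold exactly one space. A hedged alternative, assuming the blanks may be either empty or padded with spaces, is to let read.csv trim each field first with strip.white = TRUE and then treat the empty string as missing. The inline CSV below is a made-up stand-in for data2.csv.

```r
# Hypothetical CSV text: row 2 has a single space in 'sex' and an empty
# 'reg' cell; row 3 has an empty 'sex' cell
csv_text <- "sex,reg,spd\nF,2004,0\n ,,25\n,2001,30"

# strip.white = TRUE trims unquoted fields before the na.strings test,
# so " " and "" are both read as NA
dat <- read.csv(text = csv_text,
                na.strings = "",
                strip.white = TRUE,
                stringsAsFactors = FALSE)

is.na(dat$sex)
```

With a real file, replace text = csv_text with the file path.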

User · Answer

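sub() is part of base R, so no package needs to be installed or loaded for this. Replace file and colname with your own data frame and column:

    file$colname <- sub("^-$", NA, file$colname)

It will convert every cell in that column that contains only "-" to NA. If the column marks blanks with "-", "" or 0, change the pattern in the code according to the type of blank cell. E.g. if I get a blank cell like "" instead of "-", then use this code (the anchors matter, because an unanchored empty pattern matches every cell):

    file$colname <- sub("^$", NA, file$colname)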

User · Answer

You can use gsub to replace multiple mutations of empty, like "" or a space, to be NA:

    data = data.frame(cats=c('', ' ', 'meow'), dogs=c("woof", " ", NA))
    apply(data, 2, function(x) gsub("^$|^ $", NA, x))
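One thing to keep in mind is that apply() returns a character matrix rather than a data frame. A small sketch of the same gsub call that writes the result back into the data frame, using the sample data from above (stringsAsFactors = FALSE is added here as an assumption so the columns are character on older R versions too):

```r
data <- data.frame(cats = c("", " ", "meow"),
                   dogs = c("woof", " ", NA),
                   stringsAsFactors = FALSE)

# apply() coerces the data frame to a matrix
as_matrix <- apply(data, 2, function(x) gsub("^$|^ $", NA, x))

# lapply() with data[] <- keeps the data.frame structure
data[] <- lapply(data, function(x) gsub("^$|^ $", NA, x))

is.matrix(as_matrix)
is.data.frame(data)
```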

User · Answer

I recently ran into similar issues, and this is what worked for me.

If the variable is not a factor (for example, a character column), then a simple

    df$Var[df$Var == ""] <- NA

should suffice. But if the variable is a factor, then you need to convert it to character first, then replace the "" cells with the value you want, and convert it back to factor. So case in point, your sex variable: I assume it would be a factor, and if you want to replace the empty cell, I would do the following:

    df$Var <- as.character(df$Var)
    df$Var[df$Var==""] <- NA
    df$Var <- as.factor(df$Var)
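For factors there is also a shorter route that skips the round trip through character: setting the offending level itself to NA. A minimal sketch on a made-up factor (the variable name sex is only for illustration):

```r
# Hypothetical factor with one empty-string entry
sex <- factor(c("F", "M", "", "F"))

# Assigning NA to a level turns every element with that level into NA
# and drops the level
levels(sex)[levels(sex) == ""] <- NA

levels(sex)
is.na(sex)
```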

User · Answer

I suspect everyone has an answer already, though in case someone comes looking, dplyr's na_if() would be (from my perspective) the more efficient of those mentioned:

    # Import CSV, convert all 'blank' cells to NA
    dat <- read.csv("data2.csv") %>% na_if("")

Here is an additional approach leveraging readr's read_csv function that I just picked up (probably widely known, but I'll archive it here for future users). This is very straightforward and more versatile than the above, as you can capture all types of blank and NA-related values in your csv file:

    dat <- read_csv("data2.csv", na = c("", "NA", "N/A"))

Note the underscore in readr's version versus the "." in base R's read.csv.

Hopefully this helps someone who wanders upon the post.

User · Answer

I'm assuming you are talking about row 5, column "sex". It could be the case that in the data2.csv file, the cell contains a space and hence is not considered empty by R.

Also, I noticed that in row 5, columns "axles" and "door", the original values read from data2.csv are the string "NA". You probably want to treat those as na.strings as well. To do this:

    dat2 <- read.csv("data2.csv", header=T, na.strings=c("","NA"))

EDIT:

I downloaded your data2.csv. Yes, there is a space in row 5, column "sex". So you want

    na.strings=c(""," ","NA")
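A small reproduction of this, as a sketch: the inline CSV below is invented to mimic the situation (it is not the asker's data2.csv), with one cell holding a single space and one cell truly empty.

```r
# Hypothetical CSV text: row 2 has a single space in 'sex', row 3 has an
# empty 'sex' cell
csv_text <- "sex,reg,spd\nF,2004,0\n ,,25\n,2001,30"

# Only the empty string counts as missing: the single space survives
only_empty <- read.csv(text = csv_text, na.strings = "",
                       stringsAsFactors = FALSE)

# Empty string, single space and literal "NA" all count as missing
with_space <- read.csv(text = csv_text, na.strings = c("", " ", "NA"),
                       stringsAsFactors = FALSE)

is.na(only_empty$sex)
is.na(with_space$sex)
```

Comparing the two is.na() results shows which cells each na.strings setting catches.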

User · Answer

This works for me.

    dataset <- read.csv(file = "data.csv", header=TRUE, fill = T, na.strings = "")

User · Answer

A more eye-friendly solution using dplyr would be

    require(dplyr)

    ## fake blank cells
    iris[1,1]=""

    ## define a helper function
    empty_as_na <- function(x){
        if("factor" %in% class(x)) x <- as.character(x) ## since ifelse won't work with factors
        ifelse(as.character(x)!="", x, NA)
    }

    ## transform all columns
    iris %>% mutate_each(funs(empty_as_na))

To apply the correction to just a subset of columns, you can specify columns of interest using dplyr's column matching syntax. Example:

    mutate_each(funs(empty_as_na), matches("Width"), Species)

In case your table contains dates, you should consider using a more typesafe version of ifelse.
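mutate_each() and funs() have since been deprecated in dplyr. A sketch of the same helper applied with across(), assuming dplyr 1.0.0 or later; the small data frame is made up so the example does not modify iris.

```r
library(dplyr)

# Same helper as above: factors are converted to character first
empty_as_na <- function(x) {
  if (is.factor(x)) x <- as.character(x)
  ifelse(as.character(x) != "", x, NA)
}

# Hypothetical data with a blank in a factor and in a character column
dat <- data.frame(species = factor(c("setosa", "", "virginica")),
                  note    = c("", "ok", "ok"),
                  width   = c(3.5, 3.0, NA),
                  stringsAsFactors = FALSE)

# Transform all columns
cleaned <- dat %>% mutate(across(everything(), empty_as_na))

# Or only a subset of columns
cleaned_subset <- dat %>% mutate(across(c(species, note), empty_as_na))
```

As with the original helper, factor columns come back as character vectors.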

User · Answer

As of dplyr 1.0.0 we can use across().

For all columns:

    dat <- dat %>%
       mutate(across(everything(), ~ifelse(.=="", NA, as.character(.))))

For individual columns:

    dat <- dat %>%
       mutate(across(c("Age","Gender"), ~ifelse(.=="", NA, as.character(.))))

As of dplyr 0.8.0 and above, the way this should be written has changed. Before, it was funs() in .funs (funs(name = f(.))). Instead of funs, now we use list (list(name = ~f(.))).

Note that there is also a much simpler way to list the column names (both the name of the column and the column index work):

    dat <- dat %>%
    mutate_at(.vars = c("Age","Gender"),
        .funs = list(~ifelse(.=="", NA, as.character(.))))

Original Answer:

You can also use mutate_at in dplyr:

    dat <- dat %>%
    mutate_at(vars(colnames(.)),
            .funs = funs(ifelse(.=="", NA, as.character(.))))

Select individual columns to change:

    dat <- dat %>%
    mutate_at(vars(colnames(.)[names(.) %in% c("Age","Gender")]),
            .funs = funs(ifelse(.=="", NA, as.character(.))))

User · Answer

This should do the trick:

    dat <- dat %>% mutate_all(na_if, "")
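As far as I can tell, recent dplyr releases make na_if() strict about types, so comparing a numeric column against "" may raise an error. A hedged sketch that restricts the replacement to character columns, assuming dplyr 1.0.0 or later; the column names and data are invented for illustration.

```r
library(dplyr)

# Hypothetical data: two character columns with blanks, one numeric column
dat <- data.frame(Age    = c("23", "", "41"),
                  Gender = c("F", "M", ""),
                  Score  = c(1, 2, NA),
                  stringsAsFactors = FALSE)

# na_if() is applied only where the column is character, so the numeric
# column is never compared against ""
cleaned <- dat %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))
```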

User · Answer

For those wondering about a solution using the data.table way, here is one I wrote a function for, available on my Github:

    library(devtools)
    source_url("https://github.com/YoannPa/Miscellaneous/blob/master/datatable_pattern_substitution.R?raw=TRUE")
    dt.sub(DT = dat2, pattern = "^$|^ $", replacement = NA)
    dat2

The function goes through each column to identify which columns contain pattern matches. Then gsub() is applied only to the columns containing matches for the pattern "^$|^ $", to substitute the matches with NAs.

I will keep improving this function to make it faster.
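For readers who would rather not source a remote script, the column-by-column idea can be sketched with data.table's own set(). This is not the author's dt.sub() implementation, only an illustration of the same approach on made-up data, and it touches character columns only.

```r
library(data.table)

# Hypothetical table: one character column with blanks, one numeric column
DT <- data.table(a = c("x", "", " "), b = c(1, 2, NA))

# Find the character columns, then overwrite matching cells by reference
chr_cols <- names(DT)[vapply(DT, is.character, logical(1))]
for (col in chr_cols) {
  hits <- which(grepl("^$|^ $", DT[[col]]))
  set(DT, i = hits, j = col, value = NA_character_)
}

DT
```

set() modifies DT in place, so no reassignment is needed.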
