dplyr mutate with conditional values

Question

In a large data frame ("myfile") with four columns, I have to add a fifth column whose values depend conditionally on the first four columns.

I prefer answers using dplyr and mutate, mainly because of its speed on large datasets.

My data frame looks like this:

      V1 V2 V3 V4
    1  1  2  3  5
    2  2  4  4  1
    3  1  4  1  1
    4  4  5  1  3
    5  5  5  5  4

The values of the fifth column (V5) are based on some conditional rules:

    if (V1 == 1 & V2 != 4) {
      V5 <- 1
    } else if (V2 == 4 & V3 != 1) {
      V5 <- 2
    } else {
      V5 <- 0
    }

Now I want to use the mutate function to apply these rules to all rows (to avoid slow loops). Something like this (and yes, I know it doesn't work this way!):

    myfile <- mutate(myfile, if (V1 == 1 & V2 != 4) {V5 = 1}
        else if (V2 == 4 & V3 != 1) {V5 = 2}
        else {V5 = 0})

This should be the result:

      V1 V2 V3 V4 V5
    1  1  2  3  5  1
    2  2  4  4  1  2
    3  1  4  1  1  0
    4  4  5  1  3  0
    5  5  5  5  4  0

How can I do this in dplyr?

User · Answer

With dplyr 0.7.2, you can use the very useful case_when function:

    library(dplyr)

    x <- read.table(
      text = "V1 V2 V3 V4
      1  1  2  3  5
      2  2  4  4  1
      3  1  4  1  1
      4  4  5  1  3
      5  5  5  5  4")
    x$V5 <- case_when(x$V1 == 1 & x$V2 != 4 ~ 1,
                      x$V2 == 4 & x$V3 != 1 ~ 2,
                      TRUE ~ 0)

Expressed with dplyr::mutate, it gives:

    x <- x %>%
      mutate(
        V5 = case_when(
          V1 == 1 & V2 != 4 ~ 1,
          V2 == 4 & V3 != 1 ~ 2,
          TRUE ~ 0
        )
      )

Please note that NA values are not treated specially, which can be misleading. The function returns NA only when no condition is matched. If you add a line with TRUE ~ ..., like I did in my example, the return value will never be NA.

Therefore, you have to tell case_when explicitly to put NA where it belongs by adding a statement like is.na(x$V1) | is.na(x$V3) ~ NA_integer_. Hint: the dplyr::coalesce() function can be really useful here sometimes!

Moreover, please note that a bare NA will usually not work; you have to use one of the typed NA values: NA_integer_, NA_character_ or NA_real_. The type has to match the other outcomes.
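To make the NA point concrete, here is a minimal sketch. The data frame is hypothetical (same column names as in the question, with one NA added for illustration). Since the outcomes 1, 2 and 0 are doubles, the sketch uses NA_real_ for the explicit NA branch. As far as I know, newer dplyr releases are more lenient about a plain NA, so treat the typed NA as the conservative choice.

```r
library(dplyr)

# Hypothetical data: column names follow the question, the NA is added for illustration.
x <- data.frame(V1 = c(1, NA, 1), V2 = c(2, 4, 4), V3 = c(3, 4, 1))

x <- x %>%
  mutate(
    V5 = case_when(
      is.na(V1) | is.na(V3) ~ NA_real_,  # explicit NA branch, typed like the other outcomes
      V1 == 1 & V2 != 4     ~ 1,
      V2 == 4 & V3 != 1     ~ 2,
      TRUE                  ~ 0
    )
  )

print(x)
```

Without the first branch, the second row would not become NA: NA & FALSE evaluates to FALSE, so the row falls through to the next condition and gets the value 2.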

User · Answer

It looks like derivedFactor from the mosaic package was designed for this. In this example, it would look something like:

    library(mosaic)
    myfile <- mutate(myfile, V5 = derivedFactor(
        "1" = (V1 == 1 & V2 != 4),
        "2" = (V2 == 4 & V3 != 1),
        .method = "first",
        .default = 0
        ))

(If you want the outcome to be numeric instead of a factor, convert it with as.numeric(as.character(...)). Calling as.numeric directly on a factor returns the internal level codes rather than the labels.)

Note that the .default option combined with .method = "first" sets the "else" condition. This approach is described in the help file for derivedFactor.
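One caveat when turning a factor like this into numbers, shown with base R only. The factor below is built by hand as a stand-in for the derivedFactor result, and its level order is an assumption made for illustration.

```r
# Hand-built stand-in for a factor column with labels "1", "2" and "0"
# (the level order here is assumed, not taken from mosaic).
f <- factor(c("1", "2", "0", "0", "0"), levels = c("1", "2", "0"))

codes  <- as.numeric(f)                # internal level codes
labels <- as.numeric(as.character(f))  # the values the labels spell out

print(codes)
print(labels)
```

The first conversion yields positions in the level set, so "0" does not come back as 0. Going through as.character first recovers the intended values.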

User · Answer

Try this:

    myfile %>% mutate(V5 = (V1 == 1 & V2 != 4) + 2 * (V2 == 4 & V3 != 1))

giving:

      V1 V2 V3 V4 V5
    1  1  2  3  5  1
    2  2  4  4  1  2
    3  1  4  1  1  0
    4  4  5  1  3  0
    5  5  5  5  4  0

or this:

    myfile %>% mutate(V5 = ifelse(V1 == 1 & V2 != 4, 1, ifelse(V2 == 4 & V3 != 1, 2, 0)))

giving:

      V1 V2 V3 V4 V5
    1  1  2  3  5  1
    2  2  4  4  1  2
    3  1  4  1  1  0
    4  4  5  1  3  0
    5  5  5  5  4  0

Note: I suggest you pick a better name for your data frame. myfile makes it seem as if it holds a file name.

The above used this input:

    myfile <-
    structure(list(V1 = c(1L, 2L, 1L, 4L, 5L), V2 = c(2L, 4L, 4L,
    5L, 5L), V3 = c(3L, 4L, 1L, 1L, 5L), V4 = c(5L, 1L, 1L, 3L, 4L
    )), .Names = c("V1", "V2", "V3", "V4"), class = "data.frame", row.names = c("1",
    "2", "3", "4", "5"))

Update 1: Since this was originally posted, dplyr has changed %.% to %>%, so I have modified the answer accordingly.

Update 2: dplyr now has case_when, which provides another solution:

    myfile %>%
      mutate(V5 = case_when(V1 == 1 & V2 != 4 ~ 1,
                            V2 == 4 & V3 != 1 ~ 2,
                            TRUE ~ 0))
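A short sketch of why the arithmetic version agrees with the nested ifelse, using base R only. Logical values coerce to 0 and 1 in arithmetic, and the two conditions can never both hold on the same row (one requires V2 != 4, the other V2 == 4), so the sum is always 0, 1 or 2. The helper names a, b, arith and nested are introduced here for illustration.

```r
# Same input as in the question.
myfile <- data.frame(
  V1 = c(1L, 2L, 1L, 4L, 5L),
  V2 = c(2L, 4L, 4L, 5L, 5L),
  V3 = c(3L, 4L, 1L, 1L, 5L),
  V4 = c(5L, 1L, 1L, 3L, 4L)
)

a <- with(myfile, V1 == 1 & V2 != 4)  # rule that yields 1
b <- with(myfile, V2 == 4 & V3 != 1)  # rule that yields 2

# The shortcut relies on the rules being mutually exclusive.
overlap <- any(a & b)

arith  <- a + 2 * b                        # TRUE/FALSE coerced to 1/0
nested <- ifelse(a, 1, ifelse(b, 2, 0))    # explicit if / else if / else

print(data.frame(a, b, arith, nested))
```

If the rules could overlap, the sum would give 3 on overlapping rows, whereas ifelse and case_when would stop at the first match. In that case, prefer one of those.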
