Can dplyr package be used for conditional mutating?

Question

Can mutate be used when the mutation is conditional (depending on the values of certain columns)?

This example helps show what I mean.

    structure(list(a = c(1, 3, 4, 6, 3, 2, 5, 1), b = c(1, 3, 4,
    2, 6, 7, 2, 6), c = c(6, 3, 6, 5, 3, 6, 5, 3), d = c(6, 2, 4,
    5, 3, 7, 2, 6), e = c(1, 2, 4, 5, 6, 7, 6, 3), f = c(2, 3, 4,
    2, 2, 7, 5, 2)), .Names = c("a", "b", "c", "d", "e", "f"), row.names = c(NA,
    8L), class = "data.frame")

      a b c d e f
    1 1 1 6 6 1 2
    2 3 3 3 2 2 3
    3 4 4 6 4 4 4
    4 6 2 5 5 5 2
    5 3 6 3 3 6 2
    6 2 7 6 7 7 7
    7 5 2 5 2 6 5
    8 1 6 3 6 3 2

I was hoping to find a solution to my problem using the dplyr package (and yes, I know this is not code that should work, but I guess it makes the purpose clear) for creating a new column g:

    library(dplyr)
    df <- mutate(df,
            if (a == 2 | a == 5 | a == 7 | (a == 1 & b == 4)) {g = 2},
            if (a == 0 | a == 1 | a == 4 | a == 3 | c == 4) {g = 3})

The code I am looking for should produce this result for this particular example:

      a b c d e f  g
    1 1 1 6 6 1 2  3
    2 3 3 3 2 2 3  3
    3 4 4 6 4 4 4  3
    4 6 2 5 5 5 2 NA
    5 3 6 3 3 6 2 NA
    6 2 7 6 7 7 7  2
    7 5 2 5 2 6 5  2
    8 1 6 3 6 3 2  3

Does anyone have an idea about how to do this in dplyr? This data frame is just an example; the data frames I am dealing with are much larger. I tried dplyr because of its speed, but perhaps there are other, better ways to handle this problem?

User · Answer

case_when is now a pretty clean implementation of the SQL-style CASE WHEN:

    structure(list(a = c(1, 3, 4, 6, 3, 2, 5, 1), b = c(1, 3, 4,
    2, 6, 7, 2, 6), c = c(6, 3, 6, 5, 3, 6, 5, 3), d = c(6, 2, 4,
    5, 3, 7, 2, 6), e = c(1, 2, 4, 5, 6, 7, 6, 3), f = c(2, 3, 4,
    2, 2, 7, 5, 2)), .Names = c("a", "b", "c", "d", "e", "f"), row.names = c(NA,
    8L), class = "data.frame") -> df

    df %>%
      mutate(g = case_when(
        a == 2 | a == 5 | a == 7 | (a == 1 & b == 4) ~ 2,
        a == 0 | a == 1 | a == 4 | a == 3 | c == 4   ~ 3
      ))

Using dplyr 0.7.4.

The manual: http://dplyr.tidyverse.org/reference/case_when.html
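To make the behaviour concrete, here is a minimal sketch of the same call run on columns a, b and c of the question's data (d, e and f are left out only because the conditions never use them). It assumes a dplyr version from 0.7.0 onwards; rows that match no condition come back as NA.

```r
library(dplyr)

# Columns a, b, c copied from the question's example data.
df <- data.frame(
  a = c(1, 3, 4, 6, 3, 2, 5, 1),
  b = c(1, 3, 4, 2, 6, 7, 2, 6),
  c = c(6, 3, 6, 5, 3, 6, 5, 3)
)

# Conditions are checked top to bottom; the first one that is TRUE wins.
res <- df %>%
  mutate(g = case_when(
    a == 2 | a == 5 | a == 7 | (a == 1 & b == 4) ~ 2,
    a == 0 | a == 1 | a == 4 | a == 3 | c == 4   ~ 3
  ))

res$g
# 3 3 3 NA 3 2 2 3
# Row 5 has a == 3, so the conditions as written give 3 there,
# whereas the expected table in the question lists NA for that row.
```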

User · Answer

dplyr now has a function case_when that offers a vectorised if. The syntax is a little strange compared to mosaic:::derivedFactor, as you cannot access variables in the standard dplyr way and need to declare the mode of NA, but it is considerably faster than mosaic:::derivedFactor.

    df %>%
      mutate(g = case_when(a %in% c(2,5,7) | (a==1 & b==4) ~ 2L,
                           a %in% c(0,1,3,4) | c == 4 ~ 3L,
                           TRUE ~ as.integer(NA)))

EDIT: If you're using dplyr::case_when() from before version 0.7.0 of the package, then you need to precede variable names with '.$' (e.g. write .$a == 1 inside case_when).

Benchmark: for the benchmark I reuse the functions from Arun's post, with a reduced sample size:

    require(data.table)
    require(mosaic)
    require(dplyr)
    require(microbenchmark)

    set.seed(42) # To recreate the dataframe
    DT <- setDT(lapply(1:6, function(x) sample(7, 10000, TRUE)))
    setnames(DT, letters[1:6])
    DF <- as.data.frame(DT)

    DPLYR_case_when <- function(DF) {
      DF %>%
        mutate(g = case_when(a %in% c(2,5,7) | (a==1 & b==4) ~ 2L,
                             a %in% c(0,1,3,4) | c==4 ~ 3L,
                             TRUE ~ as.integer(NA)))
    }

    DT_fun <- function(DT) {
      DT[(a %in% c(0,1,3,4) | c == 4), g := 3L]
      DT[a %in% c(2,5,7) | (a==1 & b==4), g := 2L]
    }

    DPLYR_fun <- function(DF) {
      mutate(DF, g = ifelse(a %in% c(2,5,7) | (a==1 & b==4), 2L,
                     ifelse(a %in% c(0,1,3,4) | c==4, 3L, NA_integer_)))
    }

    mosa_fun <- function(DF) {
      mutate(DF, g = derivedFactor(
        "2" = (a == 2 | a == 5 | a == 7 | (a == 1 & b == 4)),
        "3" = (a == 0 | a == 1 | a == 4 | a == 3 | c == 4),
        .method = "first",
        .default = NA
      ))
    }

    # Named arguments give the short row labels shown in the output below.
    perf_results <- microbenchmark(
      dt_fun = DT_fun(copy(DT)),
      dplyr_ifelse = DPLYR_fun(copy(DF)),
      dplyr_case_when = DPLYR_case_when(copy(DF)),
      mosa = mosa_fun(copy(DF)),
      times = 100L
    )

This gives:

    print(perf_results)
    Unit: milliseconds
               expr        min         lq       mean     median         uq        max neval
             dt_fun   1.391402   1.560751   1.658337   1.651201   1.716851   2.383801   100
       dplyr_ifelse   1.172601   1.230351   1.331538   1.294851   1.390351   1.995701   100
    dplyr_case_when   1.648201   1.768002   1.860968   1.844101   1.958801   2.207001   100
               mosa 255.591301 281.158350 291.391586 286.549802 292.101601 545.880702   100
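Before comparing timings it can be worth confirming that the candidates return the same values. The sketch below does that for the case_when and nested ifelse variants on simulated data; the 1000-row size, the seed, and the names via_case_when, via_ifelse and same are arbitrary choices made for illustration, not part of the benchmark above.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only for reproducibility
DF <- data.frame(
  a = sample(7, 1000, TRUE),
  b = sample(7, 1000, TRUE),
  c = sample(7, 1000, TRUE)
)

via_case_when <- DF %>%
  mutate(g = case_when(
    a %in% c(2, 5, 7) | (a == 1 & b == 4) ~ 2L,
    a %in% c(0, 1, 3, 4) | c == 4         ~ 3L,
    TRUE                                  ~ NA_integer_
  ))

via_ifelse <- DF %>%
  mutate(g = ifelse(a %in% c(2, 5, 7) | (a == 1 & b == 4), 2L,
             ifelse(a %in% c(0, 1, 3, 4) | c == 4, 3L, NA_integer_)))

# TRUE when both approaches produce the same integer column.
same <- identical(via_case_when$g, via_ifelse$g)
same
```

NA_integer_ is the same value as as.integer(NA); either spelling keeps all branches of case_when of type integer.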

User · Answer

Use ifelse

    df %>%
      mutate(g = ifelse(a == 2 | a == 5 | a == 7 | (a == 1 & b == 4), 2,
                 ifelse(a == 0 | a == 1 | a == 4 | a == 3 | c == 4, 3, NA)))

Added - if_else: Note that dplyr 0.5 defines an if_else function, so an alternative is to replace ifelse with if_else. However, if_else is stricter than ifelse (both legs of the condition must have the same type), so the NA in that case has to be replaced with NA_real_.

    df %>%
      mutate(g = if_else(a == 2 | a == 5 | a == 7 | (a == 1 & b == 4), 2,
                 if_else(a == 0 | a == 1 | a == 4 | a == 3 | c == 4, 3, NA_real_)))

Added - case_when: Since this question was posted dplyr has added case_when, so another alternative would be:

    df %>% mutate(g = case_when(a == 2 | a == 5 | a == 7 | (a == 1 & b == 4) ~ 2,
                                a == 0 | a == 1 | a == 4 | a == 3 | c == 4 ~ 3,
                                TRUE ~ NA_real_))

Added - arithmetic/na_if: If the values are numeric and the conditions (except for the default value of NA at the end) are mutually exclusive, as they are for the example data in the question, then we can use an arithmetic expression in which each condition is multiplied by the desired result, using na_if at the end to replace 0 with NA.

    df %>%
      mutate(g = 2 * (a == 2 | a == 5 | a == 7 | (a == 1 & b == 4)) +
                 3 * (a == 0 | a == 1 | a == 4 | a == 3 | c == 4),
             g = na_if(g, 0))
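The mutual-exclusivity requirement of the arithmetic approach matters. The base R sketch below uses four hypothetical rows (not taken from the question) in which row 1 satisfies both conditions, and compares the arithmetic result with the first-match result of nested ifelse. The names d, cond1, cond2, g_arith and g_first are made up for illustration.

```r
# Hypothetical rows: row 1 matches both conditions, row 2 only the first,
# row 3 only the second, row 4 neither.
d <- data.frame(a = c(1, 2, 3, 6),
                b = c(4, 1, 1, 1),
                c = c(6, 6, 6, 5))

cond1 <- with(d, a %in% c(2, 5, 7) | (a == 1 & b == 4))
cond2 <- with(d, a %in% c(0, 1, 3, 4) | c == 4)

# Arithmetic approach: overlapping conditions add up (2 + 3 = 5 in row 1).
g_arith <- 2 * cond1 + 3 * cond2
g_arith[g_arith == 0] <- NA   # base R stand-in for dplyr::na_if(g, 0)

# Nested ifelse: the first matching condition wins.
g_first <- ifelse(cond1, 2, ifelse(cond2, 3, NA))

g_arith  # 5 2 3 NA
g_first  # 2 2 3 NA
```

When the conditions can overlap, ifelse, if_else or case_when is the safer choice.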

User · Answer

The derivedFactor function from the mosaic package seems to be designed to handle this. Using this example, it would look like:

    library(dplyr)
    library(mosaic)
    df <- mutate(df, g = derivedFactor(
         "2" = (a == 2 | a == 5 | a == 7 | (a == 1 & b == 4)),
         "3" = (a == 0 | a == 1 | a == 4 | a == 3 | c == 4),
         .method = "first",
         .default = NA
         ))

(If you want the result to be numeric instead of a factor, convert it with as.numeric(as.character(...)); calling as.numeric directly on a factor returns the integer level codes rather than the labels.)

derivedFactor can be used for an arbitrary number of conditionals, too.
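One thing to watch when turning a factor result into numbers: as.numeric() applied to a factor returns the internal level codes, not the labels. The base R sketch below uses a hand-built factor as a stand-in for a derivedFactor result with levels "2" and "3"; the values are made up for illustration.

```r
# Stand-in for a factor column with levels "2" and "3".
g <- factor(c("3", "3", "2", NA), levels = c("2", "3"))

codes  <- as.numeric(g)                # level codes
labels <- as.numeric(as.character(g))  # the labels, parsed as numbers

codes   # 2 2 1 NA
labels  # 3 3 2 NA
```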

User · Answer

Since you ask for other, better ways to handle the problem, here's another way using data.table:

    require(data.table) ## 1.9.2+
    setDT(df)
    df[a %in% c(0,1,3,4) | c == 4, g := 3L]
    df[a %in% c(2,5,7) | (a==1 & b==4), g := 2L]

Note that the order of the conditional statements is reversed to get g correctly. No copy of g is made, even during the second assignment; it is replaced in place.

On larger data this would perform better than nested if-else, since ifelse can evaluate both the 'yes' and 'no' cases, and nesting can get harder to read and maintain, IMHO.

Here's a benchmark on relatively bigger data:

    # R version 3.1.0
    require(data.table) ## 1.9.2
    require(dplyr)
    DT <- setDT(lapply(1:6, function(x) sample(7, 1e7, TRUE)))
    setnames(DT, letters[1:6])
    # > dim(DT)
    # [1] 10000000        6
    DF <- as.data.frame(DT)

    DT_fun <- function(DT) {
        DT[(a %in% c(0,1,3,4) | c == 4), g := 3L]
        DT[a %in% c(2,5,7) | (a==1 & b==4), g := 2L]
    }

    DPLYR_fun <- function(DF) {
        mutate(DF, g = ifelse(a %in% c(2,5,7) | (a==1 & b==4), 2L,
                ifelse(a %in% c(0,1,3,4) | c==4, 3L, NA_integer_)))
    }

    BASE_fun <- function(DF) { # R v3.1.0
        transform(DF, g = ifelse(a %in% c(2,5,7) | (a==1 & b==4), 2L,
                ifelse(a %in% c(0,1,3,4) | c==4, 3L, NA_integer_)))
    }

    system.time(ans1 <- DT_fun(DT))
    #   user  system elapsed
    #  2.659   0.420   3.107

    system.time(ans2 <- DPLYR_fun(DF))
    #   user  system elapsed
    # 11.822   1.075  12.976

    system.time(ans3 <- BASE_fun(DF))
    #   user  system elapsed
    # 11.676   1.530  13.319

    identical(as.data.frame(ans1), as.data.frame(ans2))
    # [1] TRUE

    identical(as.data.frame(ans1), as.data.frame(ans3))
    # [1] TRUE

Not sure if this is an alternative you'd asked for, but I hope it helps.
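The reason for the reversed order can be seen on a small table. The sketch below uses three hypothetical rows (not from the question) where row 1 satisfies both conditions; because the second assignment overwrites the first, the higher-priority rule has to run last.

```r
library(data.table)

# Hypothetical rows: row 1 matches both rules, row 2 neither, row 3 only the "3" rule.
dt <- data.table(a = c(1, 6, 3), b = c(4, 2, 6), c = c(6, 5, 3))

dt[a %in% c(0, 1, 3, 4) | c == 4, g := 3L]          # lower-priority rule first
dt[a %in% c(2, 5, 7) | (a == 1 & b == 4), g := 2L]  # higher-priority rule last, overwrites

dt$g
# 2 NA 3
```

Rows touched by neither assignment keep the NA that := fills in when it creates the column.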
