Case Statement Equivalent in R

Question

I have a variable in a dataframe where one of the fields typically has 7-8 values. I want to collapse them into 3 or 4 new categories within a new variable in the dataframe. What is the best approach? I would use a CASE statement if I were in a SQL-like tool, but I am not sure how to attack this in R. Any help you can provide will be much appreciated.

User · Answer

You can use the base function merge for case-style remapping tasks:

    df <- data.frame(name = c('cow', 'pig', 'eagle', 'pigeon', 'cow', 'eagle'),
                     stringsAsFactors = FALSE)

    mapping <- data.frame(
      name = c('cow', 'pig', 'eagle', 'pigeon'),
      category = c('mammal', 'mammal', 'bird', 'bird')
    )

    merge(df, mapping)
    #     name category
    # 1    cow   mammal
    # 2    cow   mammal
    # 3  eagle     bird
    # 4  eagle     bird
    # 5    pig   mammal
    # 6 pigeon     bird
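One caveat with the merge approach: by default merge keeps only rows whose key appears in both data frames, so a value missing from the mapping is silently dropped. A minimal sketch, assuming a hypothetical extra animal 'trout' that is not in the mapping, shows how all.x = TRUE keeps such rows and how a fallback label can stand in for the ELSE branch of a CASE:

```r
df <- data.frame(name = c("cow", "trout", "eagle"), stringsAsFactors = FALSE)
mapping <- data.frame(
  name = c("cow", "pig", "eagle", "pigeon"),
  category = c("mammal", "mammal", "bird", "bird"),
  stringsAsFactors = FALSE
)

# Default inner join: "trout" has no match and disappears
inner <- merge(df, mapping)

# Left join: every row of df is kept; unmatched rows get NA
left <- merge(df, mapping, by = "name", all.x = TRUE)

# Fill the gaps with a fallback label, like the ELSE branch of a CASE
left$category[is.na(left$category)] <- "other"
```

Note that merge sorts the result by the key column, so the original row order is not preserved.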

User · Answer

case_when(), which was added to dplyr in May 2016, solves this problem in a manner similar to memisc::cases().

For example:

    library(dplyr)
    mtcars %>%
      mutate(category = case_when(
        .$cyl == 4 & .$disp < median(.$disp) ~ "4 cylinders, small displacement",
        .$cyl == 8 & .$disp > median(.$disp) ~ "8 cylinders, large displacement",
        TRUE ~ "other"
      ))

As of dplyr 0.7.0:

    mtcars %>%
      mutate(category = case_when(
        cyl == 4 & disp < median(disp) ~ "4 cylinders, small displacement",
        cyl == 8 & disp > median(disp) ~ "8 cylinders, large displacement",
        TRUE ~ "other"
      ))

User · Answer

In the cases you are referring to, I use switch(). It looks like a control statement, but it is actually a function. The expression is evaluated and, based on its value, the corresponding item in the list is returned.

switch works in two distinct ways, depending on whether the first argument evaluates to a character string or a number.

What follows is a simple string example that solves your problem of collapsing old categories into new ones.

For the character-string form, supply a single unnamed argument as the default after the named values.

    newCat <- switch(EXPR = category,
                     cat1 = "catX",
                     cat2 = "catX",
                     cat3 = "catY",
                     cat4 = "catY",
                     cat5 = "catZ",
                     cat6 = "catZ",
                     "not available")
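switch evaluates a single value, so applying it to a whole column needs a loop or an apply-style helper. A minimal sketch, assuming a hypothetical character vector category with the placeholder labels used above (collapse_one is an illustrative helper name):

```r
# Hypothetical input; "cat9" is not listed, so it falls to the default
category <- c("cat1", "cat4", "cat6", "cat9")

collapse_one <- function(value) {
  switch(value,
         cat1 = "catX",
         cat2 = "catX",
         cat3 = "catY",
         cat4 = "catY",
         cat5 = "catZ",
         cat6 = "catZ",
         "not available")
}

# vapply calls switch once per element and guarantees a character result
newCat <- vapply(category, collapse_one, character(1), USE.NAMES = FALSE)
```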

User · Answer

I see no proposal for switch. Code example (run it):

    x <- "three"
    y <- 0
    switch(x,
           one = {y <- 5},
           two = {y <- 12},
           three = {y <- 432})
    y
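Because switch returns the value of the matching branch, the assignment can also sit outside the call. A minimal sketch of that variant, reusing the values above: an empty branch falls through to the next non-empty one, which is handy when several inputs share one result. The helper name to_number, the extra label "trois", and the default of 0 are hypothetical additions.

```r
to_number <- function(x) {
  switch(x,
         one   = 5,
         two   = 12,
         trois = ,   # empty branch: falls through to the next value
         three = 432,
         0)          # unnamed last argument is the default
}

y <- to_number("three")
```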

User · Answer

A case statement actually might not be the right approach here. If this is a factor, which is likely, just set the levels of the factor appropriately.

Say you have a factor with the letters A to E, like this:

    > a <- factor(rep(LETTERS[1:5], 2))
    > a
     [1] A B C D E A B C D E
    Levels: A B C D E

To join levels B and C and name the result BC, just change the names of those levels to BC:

    > levels(a) <- c("A", "BC", "BC", "D", "E")
    > a
     [1] A  BC BC D  E  A  BC BC D  E
    Levels: A BC D E

The result is as desired.

User · Answer

If you want SQL-like syntax, you can use the sqldf package. The function to use is also named sqldf, and the syntax is as follows:

    sqldf(<your query in quotation marks>)
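To make this concrete, here is a minimal sketch of a SQL CASE expression run through sqldf. It assumes the sqldf package is installed; the data frame animals, its column, and the category labels are hypothetical, borrowed from the animal examples in the other answers.

```r
library(sqldf)

animals <- data.frame(name = c("cow", "pig", "eagle", "pigeon"),
                      stringsAsFactors = FALSE)

# sqldf looks up data frames by name, so 'animals' can be queried like a table
result <- sqldf("
  SELECT name,
         CASE
           WHEN name IN ('cow', 'pig')      THEN 'mammal'
           WHEN name IN ('eagle', 'pigeon') THEN 'bird'
           ELSE 'other'
         END AS category
  FROM animals
")
```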

User · Answer

There is a switch statement, but I can never seem to get it to work the way I think it should. Since you have not provided an example, I will make one using a factor variable:

    # stringsAsFactors = TRUE is needed in R >= 4.0, where strings are
    # no longer converted to factors by default
    dft <- data.frame(x = sample(letters[1:8], 20, replace = TRUE),
                      stringsAsFactors = TRUE)
    levels(dft$x)
    [1] "a" "b" "c" "d" "e" "f" "g" "h"

If you specify the categories you want in an order appropriate to the reassignment, you can use the factor or numeric variables as an index:

    c("abc", "abc", "abc", "def", "def", "def", "g", "h")[dft$x]
     [1] "def" "h"   "g"   "def" "def" "abc" "h"   "h"   "def" "abc" "abc" "abc" "h"   "h"   "abc"
    [16] "def" "abc" "abc" "def" "def"

    dft$y <- c("abc", "abc", "abc", "def", "def", "def", "g", "h")[dft$x]
    str(dft)
    'data.frame':   20 obs. of  2 variables:
     $ x: Factor w/ 8 levels "a","b","c","d",..: 4 8 7 4 6 1 8 8 5 2 ...
     $ y: chr  "def" "h" "g" "def" ...

I later learned that there really are two different switch functions. It is not a generic function, but you should think about it as either switch.numeric or switch.character. If your first argument is an R 'factor', you get switch.numeric behavior, which is likely to cause problems, since most people see factors displayed as character and make the incorrect assumption that all functions will process them as such.
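The factor pitfall described above can be seen in a small sketch. The factor here is hypothetical, with its levels deliberately ordered so that the integer code and the label disagree:

```r
# "z" is the first level, so its underlying integer code is 1
f <- factor("z", levels = c("z", "a"))

# With a factor, switch uses the integer code and picks the 1st alternative;
# R also warns about this, which is silenced here to keep the output clean
by_code <- suppressWarnings(switch(f, a = "matched a", z = "matched z"))

# Converting to character first selects the alternative by name
by_name <- switch(as.character(f), a = "matched a", z = "matched z")
```

Here by_code ends up as "matched a" even though the factor displays as z, while by_name is "matched z". Wrapping the argument in as.character avoids the surprise.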

User · Answer

Have a look at the cases function from the memisc package. It implements case functionality with two different ways to use it. From the examples in the package:

    z1 = cases(
        "Condition 1" = x<0,
        "Condition 2" = y<0,  # only applies if x >= 0
        "Condition 3" = TRUE
        )

where x and y are two vectors.

References: memisc package, cases example

User · Answer

I don't like any of these; they are not clear to the reader or the potential user. I just use an anonymous function. The syntax is not as slick as a case statement, but the evaluation is similar to a case statement and not that painful. This also assumes you are evaluating it in the scope where your variables are defined.

    result <- ( function() { if (x==10 | y< 5) return('foo')
                             if (x==11 & y== 5) return('bar')
                            })()

All of those () are necessary to enclose and evaluate the anonymous function.
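The anonymous function above handles one pair of values at a time and returns NULL when neither condition holds. For whole columns, a vectorized variant can be sketched with nested ifelse calls. This is a different technique from the one in the answer, and the vectors x and y and the fallback label "other" are hypothetical:

```r
x <- c(10, 11, 3, 11)
y <- c(9, 5, 7, 2)

# Conditions are checked in order, like WHEN branches; the last value is the ELSE
result <- ifelse(x == 10 | y < 5, "foo",
          ifelse(x == 11 & y == 5, "bar",
                 "other"))
```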

User · Answer

IMHO, the most straightforward and universal code:

    dft=data.frame(x = sample(letters[1:8], 20, replace=TRUE))
    dft=within(dft,{
        y=NA
        y[x %in% c('a','b','c')]='abc'
        y[x %in% c('d','e','f')]='def'
        y[x %in% 'g']='g'
        y[x %in% 'h']='h'
    })

User · Answer

As of data.table v1.13.0 you can use the function fcase() (fast-case) to do SQL-like CASE operations (also similar to dplyr::case_when()):

    require(data.table)

    dt <- data.table(name = c('cow', 'pig', 'eagle', 'pigeon', 'cow', 'eagle'))
    dt[ , category := fcase(name %in% c('cow', 'pig'), 'mammal',
                            name %in% c('eagle', 'pigeon'), 'bird') ]
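fcase returns NA for rows that match none of the conditions. A minimal sketch of an ELSE-style fallback through its default argument, assuming data.table 1.13.0 or later is installed; the extra animal 'trout' and the label 'other' are hypothetical values added for illustration:

```r
library(data.table)

dt <- data.table(name = c("cow", "eagle", "trout"))

# Conditions and values alternate; 'default' covers rows that match nothing
dt[, category := fcase(name %in% c("cow", "pig"),      "mammal",
                       name %in% c("eagle", "pigeon"), "bird",
                       default = "other")]
```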

User · Answer

If you have a factor, you can change its levels with the standard method:

    df <- data.frame(name = c('cow', 'pig', 'eagle', 'pigeon'),
                     stringsAsFactors = FALSE)
    df$type <- factor(df$name) # First step: copy vector and make it factor
    # Change levels:
    levels(df$type) <- list(
        animal = c("cow", "pig"),
        bird = c("eagle", "pigeon")
    )
    df
    #     name   type
    # 1    cow animal
    # 2    pig animal
    # 3  eagle   bird
    # 4 pigeon   bird

You could write a simple function as a wrapper:

    changelevels <- function(f, ...) {
        f <- as.factor(f)
        levels(f) <- list(...)
        f
    }

    df <- data.frame(name = c('cow', 'pig', 'eagle', 'pigeon'),
                     stringsAsFactors = TRUE)

    df$type <- changelevels(df$name, animal=c("cow", "pig"), bird=c("eagle", "pigeon"))

User · Answer

Here's a way using the switch statement:

    df <- data.frame(name = c('cow', 'pig', 'eagle', 'pigeon'),
                     stringsAsFactors = FALSE)
    df$type <- sapply(df$name, switch,
                      cow = 'animal',
                      pig = 'animal',
                      eagle = 'bird',
                      pigeon = 'bird')

    > df
        name   type
    1    cow animal
    2    pig animal
    3  eagle   bird
    4 pigeon   bird

The one downside of this is that you have to keep writing the category name (animal, etc.) for each item. It is syntactically more convenient to be able to define our categories as below (see the very similar question "How do add a column in a data frame in R"):

    myMap <- list(animal = c('cow', 'pig'), bird = c('eagle', 'pigeon'))

and we want to somehow "invert" this mapping. I wrote my own invMap function:

    invMap <- function(map) {
      items <- as.character( unlist(map) )
      nams <- unlist(Map(rep, names(map), sapply(map, length)))
      names(nams) <- items
      nams
    }

and then invert the above map as follows:

    > invMap(myMap)
         cow      pig    eagle   pigeon
    "animal" "animal"   "bird"   "bird"

And then it's easy to use this to add the type column to the data frame:

    df <- transform(df, type = invMap(myMap)[name])

    > df
        name   type
    1    cow animal
    2    pig animal
    3  eagle   bird
    4 pigeon   bird
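The inversion step can also be sketched with base R's stack, which turns a named list into a two-column data frame of values and their list names. This is an alternative to invMap, not the author's code; inv_map_stack and lookup are hypothetical names.

```r
myMap <- list(animal = c("cow", "pig"), bird = c("eagle", "pigeon"))

inv_map_stack <- function(map) {
  long <- stack(map)  # columns: values (items) and ind (list names, a factor)
  setNames(as.character(long$ind), as.character(long$values))
}

lookup <- inv_map_stack(myMap)

df <- data.frame(name = c("cow", "pig", "eagle", "pigeon"),
                 stringsAsFactors = FALSE)

# Index the named vector by the column to translate every row at once
df$type <- unname(lookup[df$name])
```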

User · Answer

Mixing plyr::mutate and dplyr::case_when works for me and is readable.

    iris %>%
    plyr::mutate(coolness =
         dplyr::case_when(Species  == "setosa"     ~ "not cool",
                          Species  == "versicolor" ~ "not cool",
                          Species  == "virginica"  ~ "super awesome",
                          TRUE                     ~ "undetermined"
           )) -> testIris
    head(testIris)
    levels(testIris$coolness)  ## NULL
    testIris$coolness <- as.factor(testIris$coolness)
    levels(testIris$coolness)  ## ok now
    testIris[97:103,4:6]

Bonus points if the column can come out of mutate as a factor instead of char. The last line of the case_when statement, which catches all unmatched rows, is very important.

        Petal.Width    Species      coolness
    97          1.3 versicolor      not cool
    98          1.3 versicolor      not cool
    99          1.1 versicolor      not cool
    100         1.3 versicolor      not cool
    101         2.5  virginica super awesome
    102         1.9  virginica super awesome
    103         2.1  virginica super awesome
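On the bonus point about getting a factor straight out of mutate: one option is to wrap the case_when call in factor. A minimal sketch, assuming dplyr is installed and using dplyr's own mutate rather than plyr's; the level order is an arbitrary choice for illustration.

```r
library(dplyr)

testIris <- iris %>%
  mutate(coolness = factor(
    case_when(Species == "setosa"     ~ "not cool",
              Species == "versicolor" ~ "not cool",
              Species == "virginica"  ~ "super awesome",
              TRUE                    ~ "undetermined"),
    # Listing the levels explicitly fixes their order and keeps unused ones
    levels = c("not cool", "super awesome", "undetermined")
  ))

levels(testIris$coolness)
```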

User · Answer

You can use recode from the car package:

    library(ggplot2) # get data
    library(car)
    diamonds$new_var <- recode(diamonds$clarity, "'I1' = 'low';'SI2' = 'low';else = 'high';")
    diamonds$new_var[1:10]
