Coerce multiple columns to factors at once

Question

I have a sample data frame like the one below:

    data <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))

I want to know how I can select multiple columns and convert them to factors together. I usually do it one column at a time, like this:

    data$A <- as.factor(data$A)

But when the data frame is very large and contains many columns, this approach is very time-consuming. Does anyone know of a better way to do it?

User · Answer

Here is a data.table example. I used grep in this example because that's how I often select many columns: by using partial matches to their names.

    library(data.table)
    data <- data.table(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))

    # Names of the columns to convert, matched by regular expression
    factorCols <- grep(pattern = "A|C|D|H", x = names(data), value = TRUE)

    # Convert the selected columns in place
    data[, (factorCols) := lapply(.SD, as.factor), .SDcols = factorCols]

User · Answer

If you also want to pick the columns to convert based on values in the table itself, you can try the following:

    ### pre-processing
    # One-row table flagging which columns are character
    ind <- bigm.train[, lapply(.SD, is.character)]
    # Keep only the names of the columns flagged TRUE
    ind <- names(ind)[unlist(ind)]

    ### Convert multiple columns to factor
    bigm.train[, (ind) := lapply(.SD, factor), .SDcols = ind]

This selects the columns that are of character type and then converts them to factors.

User · Answer

And, for completeness, and with regard to this question asking about changing string columns only, there's mutate_if:

    library(dplyr)

    data <- cbind(stringVar = sample(c("foo", "bar"), 10, replace = TRUE),
                  data.frame(matrix(sample(1:40), 10, 10, dimnames = list(1:10, LETTERS[1:10]))),
                  stringsAsFactors = FALSE)

    # Convert every character column to a factor
    factoredData <- data %>% mutate_if(is.character, factor)

User · Answer

You can use mutate_if (dplyr). For example, to coerce integer columns to factors:

    mydata <- structure(list(a = 1:10, b = 1:10,
                             c = c("a", "a", "b", "b", "c", "c", "c", "c", "c", "c")),
                        row.names = c(NA, -10L),
                        class = c("tbl_df", "tbl", "data.frame"))

    # A tibble: 10 x 3
           a     b c
       <int> <int> <chr>
     1     1     1 a
     2     2     2 a
     3     3     3 b
     4     4     4 b
     5     5     5 c
     6     6     6 c
     7     7     7 c
     8     8     8 c
     9     9     9 c
    10    10    10 c

Use the function:

    library(dplyr)

    mydata %>%
        mutate_if(is.integer, as.factor)

    # A tibble: 10 x 3
           a     b c
       <fct> <fct> <chr>
     1     1     1 a
     2     2     2 a
     3     3     3 b
     4     4     4 b
     5     5     5 c
     6     6     6 c
     7     7     7 c
     8     8     8 c
     9     9     9 c
    10    10    10 c

User · Answer

Here is an option using dplyr. The %<>% operator from magrittr updates the lhs object with the resulting value.

    library(magrittr)
    library(dplyr)
    cols <- c("A", "C", "D", "H")

    # mutate_each_() and funs() are deprecated in newer dplyr releases
    data %<>%
           mutate_each_(funs(factor(.)), cols)
    str(data)
    #'data.frame':  4 obs. of  10 variables:
    # $ A: Factor w/ 4 levels "23","24","26",..: 1 2 3 4
    # $ B: int  15 13 39 16
    # $ C: Factor w/ 4 levels "3","5","18","37": 2 1 3 4
    # $ D: Factor w/ 4 levels "2","6","28","38": 3 1 4 2
    # $ E: int  14 4 22 20
    # $ F: int  7 19 36 27
    # $ G: int  35 40 21 10
    # $ H: Factor w/ 4 levels "11","29","32",..: 1 4 3 2
    # $ I: int  17 1 9 25
    # $ J: int  12 30 8 33

Or, if we are using data.table, either use a for loop with set:

    setDT(data)
    for (j in cols) {
      set(data, i = NULL, j = j, value = factor(data[[j]]))
    }

Or we can specify the 'cols' in .SDcols and assign (:=) the rhs to 'cols':

    setDT(data)[, (cols) := lapply(.SD, factor), .SDcols = cols]

User · Answer

Here is another tidyverse approach, using the modify_at() function from the purrr package.

    library(purrr)

    # Data frame with only integer columns
    data <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))

    # Modify specified columns to a factor class
    data_with_factors <- data %>%
        purrr::modify_at(c("A", "C", "E"), factor)

    # Check the results:
    str(data_with_factors)
    # 'data.frame':   4 obs. of  10 variables:
    #  $ A: Factor w/ 4 levels "8","12","33",..: 1 3 4 2
    #  $ B: int  25 32 2 19
    #  $ C: Factor w/ 4 levels "5","15","35",..: 1 3 4 2
    #  $ D: int  11 7 27 6
    #  $ E: Factor w/ 4 levels "1","4","16","20": 2 3 1 4
    #  $ F: int  21 23 39 18
    #  $ G: int  31 14 38 26
    #  $ H: int  17 24 34 10
    #  $ I: int  13 28 30 29
    #  $ J: int  3 22 37 9

User · Answer

A more recent tidyverse way is to use the mutate_at function:

    library(tidyverse)
    library(magrittr)
    set.seed(88)

    data <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))
    cols <- c("A", "C", "D", "H")

    # Convert the named columns and write the result back to data
    data %<>% mutate_at(cols, factor)

    str(data)
     $ A: Factor w/ 4 levels "5","17","18",..: 2 1 4 3
     $ B: int  36 35 2 26
     $ C: Factor w/ 4 levels "22","31","32",..: 1 2 4 3
     $ D: Factor w/ 4 levels "1","9","16","39": 3 4 1 2
     $ E: int  3 14 30 38
     $ F: int  27 15 28 37
     $ G: int  19 11 6 21
     $ H: Factor w/ 4 levels "7","12","20",..: 1 3 4 2
     $ I: int  23 24 13 8
     $ J: int  10 25 4 33
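Since dplyr 1.0.0, the scoped verbs such as mutate_at() and mutate_if() are superseded by across(). The following is a minimal sketch of the same conversion in that style, assuming dplyr 1.0.0 or later is installed; the object names by_name and by_type are only illustrative.

```r
library(dplyr)

set.seed(88)
data <- data.frame(matrix(sample(1:40), 4, 10,
                          dimnames = list(1:4, LETTERS[1:10])))
cols <- c("A", "C", "D", "H")

# Convert the named columns; all_of() raises an error if a name is missing
by_name <- data %>%
  mutate(across(all_of(cols), factor))

# Convert by type instead: every integer column becomes a factor
by_type <- data %>%
  mutate(across(where(is.integer), factor))

sapply(by_name, class)
```

Here all_of() plays the role of the column vector passed to mutate_at(), and where() plays the role of the predicate passed to mutate_if().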

User · Answer

Choose some columns to coerce to factors:

    cols <- c("A", "C", "D", "H")

Use lapply() to coerce and replace the chosen columns:

    data[cols] <- lapply(data[cols], factor)  ## as.factor() could also be used

Check the result:

    sapply(data, class)
    #        A         B         C         D         E         F         G
    # "factor" "integer"  "factor"  "factor" "integer" "integer" "integer"
    #        H         I         J
    # "factor" "integer" "integer"
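The same lapply() pattern also works when the columns are chosen by type rather than by name. This is a small base R sketch; the label column and the is_chr helper are made up for illustration and are not part of the question's data.

```r
set.seed(1)
data <- data.frame(matrix(sample(1:40), 4, 10,
                          dimnames = list(1:4, LETTERS[1:10])))
data$label <- c("x", "y", "x", "z")  # hypothetical character column

# Logical vector marking which columns are character
is_chr <- vapply(data, is.character, logical(1))

# Replace only those columns with their factor versions
data[is_chr] <- lapply(data[is_chr], factor)

sapply(data, class)
```

Using vapply() with logical(1) rather than sapply() guarantees a logical vector even if the data frame has no columns.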

User · Answer

Using sapply() on a data frame to convert all variables to factors at once does not work, because it returns a matrix/array. My approach is to use lapply() instead, as follows:

    library(magrittr)  # provides %>%

    ## let us create a data frame here
    class <- c("7", "6", "5", "3")
    cash <- c(100, 200, 300, 150)
    height <- c(170, 180, 150, 165)
    people <- data.frame(class, cash, height)
    class(people)  ## This is a data frame

    ## We now apply lapply to the data frame as follows.
    ## The lapply part returns a list, which we coerce back to a data frame
    bb <- lapply(people, as.factor) %>% data.frame()
    class(bb)  ## A data frame

    ## Now let us check the classes of the variables
    class(bb$class)
    class(bb$height)
    class(bb$cash)  ## as expected, all are factors
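A variation on this idea avoids the round trip through data.frame(): assigning into people[] keeps the existing data frame and replaces only its columns. The sketch below uses the same sample values; as_matrix is just an illustrative name.

```r
people <- data.frame(class  = c("7", "6", "5", "3"),
                     cash   = c(100, 200, 300, 150),
                     height = c(170, 180, 150, 165))

# sapply() simplifies its result, so the data frame structure is lost
as_matrix <- sapply(people, as.factor)
is.matrix(as_matrix)

# lapply() returns a list; assigning into people[] keeps the data frame
people[] <- lapply(people, as.factor)
sapply(people, class)
```

The empty brackets on the left-hand side are what preserve the data frame's class and row names.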
