How to convert a factor to integer\numeric without loss of information?

Question

When I convert a factor to a numeric or integer, I get the underlying level codes, not the values as numbers.

    f <- factor(sample(runif(5), 20, replace = TRUE))
    ##  [1] 0.0248644019011408 0.0248644019011408 0.179684827337041
    ##  [4] 0.0284090070053935 0.363644931698218  0.363644931698218
    ##  [7] 0.179684827337041  0.249704354675487  0.249704354675487
    ## [10] 0.0248644019011408 0.249704354675487  0.0284090070053935
    ## [13] 0.179684827337041  0.0248644019011408 0.179684827337041
    ## [16] 0.363644931698218  0.249704354675487  0.363644931698218
    ## [19] 0.179684827337041  0.0284090070053935
    ## 5 Levels: 0.0248644019011408 0.0284090070053935 ... 0.363644931698218

    as.numeric(f)
    ##  [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2

    as.integer(f)
    ##  [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2

I have to resort to paste to get the real values:

    as.numeric(paste(f))
    ##  [1] 0.02486440 0.02486440 0.17968483 0.02840901 0.36364493 0.36364493
    ##  [7] 0.17968483 0.24970435 0.24970435 0.02486440 0.24970435 0.02840901
    ## [13] 0.17968483 0.02486440 0.17968483 0.36364493 0.24970435 0.36364493
    ## [19] 0.17968483 0.02840901

Is there a better way to convert a factor to numeric?

User · Answer

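It looks like the solution as.numeric(levels(f))[f] no longer works with R 4.0. An alternative solution:

    factor2number <- function(x) {
      # Lookup table: levels as row names, 1..nlevels as the only column,
      # then indexed by the factor x
      data.frame(levels(x), 1:length(levels(x)), row.names = 1)[x, 1]
    }

    factor2number(yourFactor)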

User · Answer

You can use hablar::convert if you have a data frame. The syntax is easy:

Sample df

    library(hablar)
    library(dplyr)

    df <- dplyr::tibble(a = as.factor(c("7", "3")),
                        b = as.factor(c("1.5", "6.3")))

Solution

    df %>%
      convert(num(a, b))

gives you:

    # A tibble: 2 x 2
          a     b
      <dbl> <dbl>
    1    7.  1.50
    2    3.  6.30

Or if you want one column to be integer and one numeric:

    df %>%
      convert(int(a),
              num(b))

results in:

    # A tibble: 2 x 2
          a     b
      <int> <dbl>
    1     7  1.50
    2     3  6.30
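For readers who do not have hablar installed, roughly the same column-wise conversion can be sketched in base R. This is only an illustration, not part of the hablar answer: the data frame below is a made-up stand-in for the sample tibble, and it assumes every factor column holds numeric labels.

```r
# Made-up stand-in for the sample data: two factor columns with numeric labels
df <- data.frame(a = factor(c("7", "3")),
                 b = factor(c("1.5", "6.3")))

# Convert every factor column via its labels; leave other columns untouched
df[] <- lapply(df, function(col) {
  if (is.factor(col)) as.numeric(as.character(col)) else col
})

sapply(df, class)
```

Assigning into df[] keeps the object a data frame while replacing its columns one by one.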

User · Answer

R has a number of (undocumented) convenience functions for converting factors:

    as.character.factor
    as.data.frame.factor
    as.Date.factor
    as.list.factor
    as.vector.factor
    ...

But annoyingly, there is nothing to handle the factor -> numeric conversion. As an extension of Joshua Ulrich's answer, I would suggest overcoming this omission by defining your own idiomatic function:

    as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}

that you can store at the beginning of your script, or even better in your .Rprofile file.
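A short usage sketch for such a helper, with a made-up factor. The sketch calls the helper by its full name rather than relying on as.numeric() to find it through method dispatch.

```r
# Helper as suggested in this answer
as.numeric.factor <- function(x) as.numeric(levels(x))[x]

# Made-up example factor with numeric labels
f <- factor(c("10", "2.5", "10"))

vals <- as.numeric.factor(f)  # explicit call to the helper
vals
```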

User · Answer

Late to the game: accidentally, I found that trimws() can convert factor(3:5) to c("3", "4", "5"). Then you can call as.numeric() on the result. That is:

    as.numeric(trimws(x_factor_var))

User · Answer

The easiest way would be to use the unfactor function from the package varhandle, which accepts a factor vector or even a dataframe:

    unfactor(your_factor_variable)

This example can be a quick start:

    x <- rep(c("a", "b", "c"), 20)
    y <- rep(c(1, 1, 0), 20)

    class(x)  # -> "character"
    class(y)  # -> "numeric"

    x <- factor(x)
    y <- factor(y)

    class(x)  # -> "factor"
    class(y)  # -> "factor"

    library(varhandle)
    x <- unfactor(x)
    y <- unfactor(y)

    class(x)  # -> "character"
    class(y)  # -> "numeric"

You can also use it on a dataframe. For example, the iris dataset:

    sapply(iris, class)

    Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
       "numeric"    "numeric"    "numeric"    "numeric"     "factor"

    # load the package
    library("varhandle")
    # pass the iris to unfactor
    tmp_iris <- unfactor(iris)
    # check the classes of the columns
    sapply(tmp_iris, class)

    Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
       "numeric"    "numeric"    "numeric"    "numeric"  "character"

    # check if the last column is correctly converted
    tmp_iris$Species

      [1] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
      [6] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
     [11] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
     [16] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
     [21] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
     [26] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
     [31] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
     [36] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
     [41] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
     [46] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
     [51] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
     [56] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
     [61] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
     [66] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
     [71] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
     [76] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
     [81] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
     [86] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
     [91] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
     [96] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
    [101] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
    [106] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
    [111] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
    [116] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
    [121] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
    [126] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
    [131] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
    [136] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
    [141] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
    [146] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"

User · Answer

Note: this particular answer is not for converting numeric-valued factors to numerics; it is for converting categorical factors to their corresponding level numbers.

Every answer in this post failed to generate results for me; NAs were getting generated.

    y2 <- factor(c("A", "B", "C", "D", "A"))

    as.numeric(levels(y2))[y2]
    [1] NA NA NA NA NA
    Warning message:
    NAs introduced by coercion

What worked for me is this:

    as.integer(y2)
    # [1] 1 2 3 4 1
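The two situations (numeric labels versus categorical labels) can be told apart in code. The helper below is a hypothetical sketch, not a function from base R or any package: it assumes that "every level parses as a number" is a good enough test, returns the values in that case, and falls back to the level codes otherwise.

```r
# Hypothetical helper (name and behaviour are assumptions for illustration)
factor_to_number <- function(f) {
  stopifnot(is.factor(f))
  # Try to parse the level labels; non-numeric labels become NA
  vals <- suppressWarnings(as.numeric(levels(f)))
  if (anyNA(vals)) {
    as.integer(f)   # categorical labels: fall back to the level codes
  } else {
    vals[f]         # numeric labels: map each code to its value
  }
}

codes_out  <- factor_to_number(factor(c("A", "B", "C", "D", "A")))
values_out <- factor_to_number(factor(c("10", "2.5", "10")))
```

Whether silently switching between the two behaviours is desirable depends on the use case; raising an error for non-numeric levels may be the safer design.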

User · Answer

From the many answers I could read, the only given way was to expand the number of variables according to the number of factor levels. If you have a variable "pet" with levels "dog" and "cat", you would end up with pet_dog and pet_cat.

In my case I wanted to stay with the same number of variables, by just translating the factor variable to a numeric one, in a way that can be applied to many variables with many levels, so that cat=1 and dog=0 for instance.

Please find the corresponding solution below:

    # stringsAsFactors = TRUE is needed from R 4.0 on; otherwise city is
    # a character column and no column is selected below
    crime <- data.frame(city = c("SF", "SF", "NYC"),
                        year = c(1990, 2000, 1990),
                        crime = 1:3,
                        stringsAsFactors = TRUE)

    # Logical index of the factor columns
    indx <- sapply(crime, is.factor)

    crime[indx] <- lapply(crime[indx], function(x) {
      listOri <- unique(x)                 # levels in order of first appearance
      listMod <- seq_along(listOri)        # the codes those levels will get
      res <- factor(x, levels = listOri)   # re-level in that order
      res <- as.numeric(res)               # keep only the codes
      return(res)
    })
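If the codes have to be specific numbers (such as cat=1 and dog=0) rather than whatever order the levels first appear in, a named lookup vector is one way to sketch it. The pet data and the mapping below are assumptions made up for illustration.

```r
# Made-up example data
pet <- factor(c("dog", "cat", "cat", "dog"))

# Assumed mapping from level label to the desired numeric code
codes <- c(cat = 1, dog = 0)

# Look up each observation's label in the mapping; drop the names afterwards
pet_num <- unname(codes[as.character(pet)])
pet_num
```

Indexing by as.character(pet) matters here: indexing by the factor itself would use the integer level codes as positions instead of the labels.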

User · Answer

See the Warning section of ?factor:

    In particular, as.numeric applied to a factor is meaningless, and may
    happen by implicit coercion. To transform a factor f to approximately
    its original numeric values, as.numeric(levels(f))[f] is recommended
    and slightly more efficient than as.numeric(as.character(f)).

The FAQ on R has similar advice.

Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))?

as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so you are performing the conversion to numeric on length(x) values, rather than on nlevels(x) values. The speed difference will be most apparent for long vectors with few levels. If the values are mostly unique, there won't be much difference in speed. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.

Some timings

    library(microbenchmark)
    microbenchmark(
      as.numeric(levels(f))[f],
      as.numeric(levels(f)[f]),
      as.numeric(as.character(f)),
      paste0(x),
      paste(x),
      times = 1e5
    )
    ## Unit: microseconds
    ##                         expr   min    lq      mean median     uq      max neval
    ##     as.numeric(levels(f))[f] 3.982 5.120  6.088624  5.405  5.974 1981.418 1e+05
    ##     as.numeric(levels(f)[f]) 5.973 7.111  8.352032  7.396  8.250 4256.380 1e+05
    ##  as.numeric(as.character(f)) 6.827 8.249  9.628264  8.534  9.671 1983.694 1e+05
    ##                    paste0(x) 7.964 9.387 11.026351  9.956 10.810 2911.257 1e+05
    ##                     paste(x) 7.965 9.387 11.127308  9.956 11.093 2419.458 1e+05
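A small sketch of the two recommended idioms side by side, on made-up data rather than the f from the question. The second half illustrates why the documentation says "approximately": the level labels are text, so a value like 1/3 may not come back bit-for-bit identical.

```r
# Made-up data for illustration
x <- c(3.5, 1.25, 3.5, 10, 1.25)
f <- factor(x)

codes      <- as.integer(f)                # level codes, not the values
via_levels <- as.numeric(levels(f))[f]     # parses nlevels(f) strings, then indexes
via_chars  <- as.numeric(as.character(f))  # parses length(f) strings

# Round trip through the text labels for a value with many digits
g    <- factor(1/3)
back <- as.numeric(levels(g))[g]
identical(back, 1/3)          # exact comparison
isTRUE(all.equal(back, 1/3))  # comparison up to a small tolerance
```

For values that print exactly, such as those in x, both idioms give back the original numbers.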

User · Answer

type.convert(f) on a factor whose levels are completely numeric is another base option.

Performance-wise it's about equivalent to as.numeric(as.character(f)) but not nearly as quick as as.numeric(levels(f))[f].

    identical(type.convert(f), as.numeric(levels(f))[f])
    [1] TRUE

That said, if the reason the vector was created as a factor in the first instance has not been addressed (i.e. it likely contained some characters that could not be coerced to numeric), then this approach won't work and it will return a factor.

    levels(f)[1] <- "some character level"
    identical(type.convert(f), as.numeric(levels(f))[f])
    [1] FALSE
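A minimal sketch of the type.convert() route on a made-up factor. It passes as.is explicitly, on the understanding that recent versions of R warn when the caller leaves it unset; the decimal labels are chosen so that the result is a double vector rather than an integer one.

```r
# Made-up factor whose levels are all decimal numbers
f <- factor(c("0.5", "2.25", "0.5"))

# as.is is given explicitly rather than left to the default
converted <- type.convert(f, as.is = TRUE)

class(converted)
all.equal(converted, as.numeric(levels(f))[f])
```

With whole-number labels type.convert() would be expected to return integers, so a comparison with identical() against an as.numeric() result can fail even when the values agree.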

User · Answer

It is possible only in the case when the factor labels match the original values. I will explain it with an example.

Assume the data is vector x:

    x <- c(20, 10, 30, 20, 10, 40, 10, 40)

Now I will create a factor with four labels:

    f <- factor(x, levels = c(10, 20, 30, 40), labels = c("A", "B", "C", "D"))

1) x has type double, f has type integer. This is the first unavoidable loss of information. Factors are always stored as integers.

    > typeof(x)
    [1] "double"
    > typeof(f)
    [1] "integer"

2) It is not possible to revert back to the original values (10, 20, 30, 40) having only f available. We can see that f holds only the integer values 1, 2, 3, 4 and two attributes: the list of labels ("A", "B", "C", "D") and the class attribute "factor". Nothing more.

    > str(f)
     Factor w/ 4 levels "A","B","C","D": 2 1 3 2 1 4 1 4
    > attributes(f)
    $levels
    [1] "A" "B" "C" "D"

    $class
    [1] "factor"

To revert back to the original values we have to know the values of the levels used in creating the factor, in this case c(10, 20, 30, 40). If we know the original levels (in the correct order), we can revert back to the original values.

    > orig_levels <- c(10, 20, 30, 40)
    > x1 <- orig_levels[f]
    > all.equal(x, x1)
    [1] TRUE

And this will work only when labels have been defined for all possible values in the original data.

So if you will need the original values, you have to keep them. Otherwise there is a high chance it will not be possible to get back to them from a factor alone.
