[r] How to convert a data frame column to numeric type?

How do you convert a data frame column to a numeric type?

This question is related to r dataframe type-conversion

The answer is


Universal way using type.convert() and rapply():

convert_types <- function(x) {
    stopifnot(is.list(x))
    x[] <- rapply(x, utils::type.convert, classes = "character",
                  how = "replace", as.is = TRUE)
    return(x)
}
d <- data.frame(char = letters[1:5], 
                fake_char = as.character(1:5), 
                fac = factor(1:5), 
                char_fac = factor(letters[1:5]), 
                num = 1:5, stringsAsFactors = FALSE)
sapply(d, class)
#>        char   fake_char         fac    char_fac         num 
#> "character" "character"    "factor"    "factor"   "integer"
sapply(convert_types(d), class)
#>        char   fake_char         fac    char_fac         num 
#> "character"   "integer"    "factor"    "factor"   "integer"

With the following code you can convert all data frame columns to numeric (X is the data frame that we want to convert it's columns):

as.data.frame(lapply(X, as.numeric))

and for converting whole matrix into numeric you have two ways: Either:

mode(X) <- "numeric"

or:

X <- apply(X, 2, as.numeric)

Alternatively you can use data.matrix function to convert everything into numeric, although be aware that the factors might not get converted correctly, so it is safer to convert everything to character first:

X <- sapply(X, as.character)
X <- data.matrix(X)

I usually use this last one if I want to convert to matrix and numeric simultaneously


If you run into problems with:

as.numeric(as.character(dat$x))

Take a look to your decimal marks. If they are "," instead of "." (e.g. "5,3") the above won't work.

A potential solution is:

as.numeric(gsub(",", ".", dat$x))

I believe this is quite common in some non English speaking countries.


I would have added a comment (cant low rating)

Just to add on user276042 and pangratz

dat$x = as.numeric(as.character(dat$x))

This will override the values of existing column x


with hablar::convert

To easily convert multiple columns to different data types you can use hablar::convert. Simple syntax: df %>% convert(num(a)) converts the column a from df to numeric.

Detailed example

Lets convert all columns of mtcars to character.

df <- mtcars %>% mutate_all(as.character) %>% as_tibble()

> df
# A tibble: 32 x 11
   mpg   cyl   disp  hp    drat  wt    qsec  vs    am    gear  carb 
   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 21    6     160   110   3.9   2.62  16.46 0     1     4     4    
 2 21    6     160   110   3.9   2.875 17.02 0     1     4     4    
 3 22.8  4     108   93    3.85  2.32  18.61 1     1     4     1    

With hablar::convert:

library(hablar)

# Convert columns to integer, numeric and factor
df %>% 
  convert(int(cyl, vs),
          num(disp:wt),
          fct(gear))

results in:

# A tibble: 32 x 11
   mpg     cyl  disp    hp  drat    wt qsec     vs am    gear  carb 
   <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr> <fct> <chr>
 1 21        6  160    110  3.9   2.62 16.46     0 1     4     4    
 2 21        6  160    110  3.9   2.88 17.02     0 1     4     4    
 3 22.8      4  108     93  3.85  2.32 18.61     1 1     4     1    
 4 21.4      6  258    110  3.08  3.22 19.44     1 0     3     1   

To convert character to numeric you have to convert it into factor by applying

BankFinal1 <- transform(BankLoan,   LoanApproval=as.factor(LoanApproval))
BankFinal1 <- transform(BankFinal1, LoanApp=as.factor(LoanApproval))

You have to make two columns with the same data, because one column cannot convert into numeric. If you do one conversion it gives the below error

transform(BankData, LoanApp=as.numeric(LoanApproval))
Warning message:
  In eval(substitute(list(...)), `_data`, parent.frame()) :
  NAs introduced by coercion

so, after doing two column of the same data apply

BankFinal1 <- transform(BankFinal1, LoanApp      = as.numeric(LoanApp), 
                                    LoanApproval = as.numeric(LoanApproval))

it will transform the character to numeric successfully


In my PC (R v.3.2.3), apply or sapply give error. lapply works well.

dt[,2:4] <- lapply(dt[,2:4], function (x) as.factor(as.numeric(x)))

To convert a data frame column to numeric you just have to do:-

factor to numeric:-

data_frame$column <- as.numeric(as.character(data_frame$column))

Tim is correct, and Shane has an omission. Here are additional examples:

R> df <- data.frame(a = as.character(10:15))
R> df <- data.frame(df, num = as.numeric(df$a), 
                        numchr = as.numeric(as.character(df$a)))
R> df
   a num numchr
1 10   1     10
2 11   2     11
3 12   3     12
4 13   4     13
5 14   5     14
6 15   6     15
R> summary(df)
  a          num           numchr    
 10:1   Min.   :1.00   Min.   :10.0  
 11:1   1st Qu.:2.25   1st Qu.:11.2  
 12:1   Median :3.50   Median :12.5  
 13:1   Mean   :3.50   Mean   :12.5  
 14:1   3rd Qu.:4.75   3rd Qu.:13.8  
 15:1   Max.   :6.00   Max.   :15.0  
R> 

Our data.frame now has a summary of the factor column (counts) and numeric summaries of the as.numeric() --- which is wrong as it got the numeric factor levels --- and the (correct) summary of the as.numeric(as.character()).


Something that has helped me: if you have ranges of variables to convert (or just more then one), you can use sapply.

A bit nonsensical but just for example:

data(cars)
cars[, 1:2] <- sapply(cars[, 1:2], as.factor)

Say columns 3, 6-15 and 37 of you dataframe need to be converted to numeric one could:

dat[, c(3,6:15,37)] <- sapply(dat[, c(3,6:15,37)], as.numeric)

While your question is strictly on numeric, there are many conversions that are difficult to understand when beginning R. I'll aim to address methods to help. This question is similar to This Question.

Type conversion can be a pain in R because (1) factors can't be converted directly to numeric, they need to be converted to character class first, (2) dates are a special case that you typically need to deal with separately, and (3) looping across data frame columns can be tricky. Fortunately, the "tidyverse" has solved most of the issues.

This solution uses mutate_each() to apply a function to all columns in a data frame. In this case, we want to apply the type.convert() function, which converts strings to numeric where it can. Because R loves factors (not sure why) character columns that should stay character get changed to factor. To fix this, the mutate_if() function is used to detect columns that are factors and change to character. Last, I wanted to show how lubridate can be used to change a timestamp in character class to date-time because this is also often a sticking block for beginners.


library(tidyverse) 
library(lubridate)

# Recreate data that needs converted to numeric, date-time, etc
data_df
#> # A tibble: 5 × 9
#>             TIMESTAMP SYMBOL    EX  PRICE  SIZE  COND   BID BIDSIZ   OFR
#>                 <chr>  <chr> <chr>  <chr> <chr> <chr> <chr>  <chr> <chr>
#> 1 2012-05-04 09:30:00    BAC     T 7.8900 38538     F  7.89    523  7.90
#> 2 2012-05-04 09:30:01    BAC     Z 7.8850   288     @  7.88  61033  7.90
#> 3 2012-05-04 09:30:03    BAC     X 7.8900  1000     @  7.88   1974  7.89
#> 4 2012-05-04 09:30:07    BAC     T 7.8900 19052     F  7.88   1058  7.89
#> 5 2012-05-04 09:30:08    BAC     Y 7.8900 85053     F  7.88 108101  7.90

# Converting columns to numeric using "tidyverse"
data_df %>%
    mutate_all(type.convert) %>%
    mutate_if(is.factor, as.character) %>%
    mutate(TIMESTAMP = as_datetime(TIMESTAMP, tz = Sys.timezone()))
#> # A tibble: 5 × 9
#>             TIMESTAMP SYMBOL    EX PRICE  SIZE  COND   BID BIDSIZ   OFR
#>                <dttm>  <chr> <chr> <dbl> <int> <chr> <dbl>  <int> <dbl>
#> 1 2012-05-04 09:30:00    BAC     T 7.890 38538     F  7.89    523  7.90
#> 2 2012-05-04 09:30:01    BAC     Z 7.885   288     @  7.88  61033  7.90
#> 3 2012-05-04 09:30:03    BAC     X 7.890  1000     @  7.88   1974  7.89
#> 4 2012-05-04 09:30:07    BAC     T 7.890 19052     F  7.88   1058  7.89
#> 5 2012-05-04 09:30:08    BAC     Y 7.890 85053     F  7.88 108101  7.90

If the dataframe has multiple types of columns, some characters, some numeric try the following to convert just the columns that contain numeric values to numeric:

for (i in 1:length(data[1,])){
  if(length(as.numeric(data[,i][!is.na(data[,i])])[!is.na(as.numeric(data[,i][!is.na(data[,i])]))])==0){}
  else {
    data[,i]<-as.numeric(data[,i])
  }
}

If you don't care about preserving the factors, and want to apply it to any column that can get converted to numeric, I used the script below. if df is your original dataframe, you can use the script below.

df[] <- lapply(df, as.character)
df <- data.frame(lapply(df, function(x) ifelse(!is.na(as.numeric(x)), as.numeric(x),  x)))

I referenced Shane's and Joran's solution btw


Considering there might exist char columns, this is based on @Abdou in Get column types of excel sheet automatically answer:

makenumcols<-function(df){
  df<-as.data.frame(df)
  df[] <- lapply(df, as.character)
  cond <- apply(df, 2, function(x) {
    x <- x[!is.na(x)]
    all(suppressWarnings(!is.na(as.numeric(x))))
  })
  numeric_cols <- names(df)[cond]
  df[,numeric_cols] <- sapply(df[,numeric_cols], as.numeric)
  return(df)
}
df<-makenumcols(df)

Though others have covered the topic pretty well, I'd like to add this additional quick thought/hint. You could use regexp to check in advance whether characters potentially consist only of numerics.

for(i in seq_along(names(df)){
     potential_numcol[i] <- all(!grepl("[a-zA-Z]",d[,i]))
}
# and now just convert only the numeric ones
d <- sapply(d[,potential_numcol],as.numeric)

For more sophisticated regular expressions and a neat why to learn/experience their power see this really nice website: http://regexr.com/


df ist your dataframe. x is a column of df you want to convert

as.numeric(factor(df$x))

if x is the column name of dataframe dat, and x is of type factor, use:

as.numeric(as.character(dat$x))

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe

Examples related to type-conversion

How can I convert a char to int in Java? pandas dataframe convert column type to string or categorical How to convert an Object {} to an Array [] of key-value pairs in JavaScript convert string to number node.js Ruby: How to convert a string to boolean Convert bytes to int? Convert dataframe column to 1 or 0 for "true"/"false" values and assign to dataframe SQL Server: Error converting data type nvarchar to numeric How do I convert a Python 3 byte-string variable into a regular string? Leading zeros for Int in Swift