Converting multiple columns from character to numeric format in R

What is the most efficient way to convert multiple columns in a data frame from character to numeric format?

I have a dataframe called DF with all character variables.

I would like to do something like

for (i in names(DF)) {
    DF$i <- as.numeric(DF$i)
}

Thank you


The answer is


I realize this is an old thread, but I wanted to post a solution similar to your request for a function (I just ran into a similar issue myself while trying to format an entire table as percentage labels).

Assume you have a df with 5 character columns you want to convert. First, I create a table containing the names of the columns I want to manipulate:

col_to_convert <- data.frame(row = 1:5,
                             col = c("col1", "col2", "col3", "col4", "col5"),
                             stringsAsFactors = FALSE)

for (i in seq_len(nrow(col_to_convert))) {
  colname <- col_to_convert$col[i]
  # Convert the whole column at once. Assigning cell by cell
  # (df[j, colnum] <- ...) would leave the column as character.
  df[[colname]] <- as.numeric(df[[colname]])
}

This loops over the listed columns one at a time. Converting cell by cell would not get the job done, because a number assigned into a character column is coerced back to character.
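
The same conversion can also be written without an explicit loop. The sketch below is only an illustration: the sample data and the names cols and id are made up, and it assumes the listed columns contain nothing but numeric strings.

```r
# Made-up sample data: two numeric-looking character columns and one label column
df <- data.frame(col1 = c("1", "2"),
                 col2 = c("3.5", "4.5"),
                 id   = c("a", "b"),
                 stringsAsFactors = FALSE)

# Names of the columns to convert (hypothetical)
cols <- c("col1", "col2")

# lapply() converts each selected column as a whole vector;
# assigning back into df[cols] keeps df a data frame
df[cols] <- lapply(df[cols], as.numeric)

str(df)
```

Columns not listed in cols (here id) are left untouched.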


type.convert()

Convert a data object to logical, integer, numeric, complex, character or factor as appropriate.

Add the as.is argument, type.convert(df, as.is = TRUE), to prevent character vectors from becoming factors when the data contain non-numeric values.

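
A small sketch of what this looks like in practice. The data frame is invented for illustration, and it assumes an R version in which type.convert() accepts a data frame directly:

```r
# Invented sample data: every column starts out as character
df <- data.frame(a = c("1", "2", "3"),
                 b = c("1.5", "2.5", "3.5"),
                 c = c("x", "y", "z"),
                 stringsAsFactors = FALSE)

# as.is = TRUE keeps non-numeric columns as character instead of factor
converted <- type.convert(df, as.is = TRUE)

str(converted)
```

Here a should come back as integer, b as double, and c should stay character.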


like this?

DF <- data.frame("a" = as.character(0:5),
             "b" = paste(0:5, ".1", sep = ""),
             "c" = paste(10:15),
             stringsAsFactors = FALSE)

# lapply() keeps DF a data frame; apply(DF, 2, as.numeric) would return a matrix
DF[] <- lapply(DF, as.numeric)

If there are "real" characters in the data frame, like 'a', 'b', 'c', I would recommend the answer from davsjob.


You could use convert from the hablar package:

library(dplyr)
library(hablar)

# Sample df (stolen from the solution by Luca Braglia)
df <- tibble("a" = as.character(0:5),
                 "b" = paste(0:5, ".1", sep = ""),
                 "c" = letters[1:6])

# insert variable names in num()
df %>% convert(num(a, b))

Which gives you:

# A tibble: 6 x 3
      a     b c    
  <dbl> <dbl> <chr>
1    0. 0.100 a    
2    1. 1.10  b    
3    2. 2.10  c    
4    3. 3.10  d    
5    4. 4.10  e    
6    5. 5.10  f   

Or if you are lazy, let retype() from hablar guess the right data type:

df %>% retype()

which gives you:

# A tibble: 6 x 3
      a     b c    
  <int> <dbl> <chr>
1     0 0.100 a    
2     1 1.10  b    
3     2 2.10  c    
4     3 3.10  d    
5     4 4.10  e    
6     5 5.10  f   

You can use column indices:

dataset[, 1:9] <- lapply(dataset[, 1:9], as.numeric)


I think I figured it out. Here's what I did (perhaps not the most elegant solution - suggestions on how to improve this are very much welcome):

#names of columns in data frame
cols <- names(DF)
# character variables
cols.char <- c("fx_code","date")
#numeric variables
cols.num <- cols[!cols %in% cols.char]

DF.char <- DF[cols.char]
DF.num <- as.data.frame(lapply(DF[cols.num],as.numeric))
DF2 <- cbind(DF.char, DF.num)

for (i in names(DF)) {
    DF[[i]] <- as.numeric(DF[[i]])
}

I solved this using double brackets [[]]
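
To spell out why the double brackets matter: DF$i looks for a column literally named "i", while DF[[i]] uses the value stored in i. A minimal sketch with made-up data (column names a and b are chosen only for illustration):

```r
# Made-up data frame of character columns
DF <- data.frame(a = c("1", "2"),
                 b = c("3.5", "4.5"),
                 stringsAsFactors = FALSE)

i <- "a"
dollar_is_null <- is.null(DF$i)   # TRUE: there is no column called "i"
bracket_value  <- DF[[i]]         # the contents of column "a"

# Loop over column names and convert each column in place
for (i in names(DF)) {
  DF[[i]] <- as.numeric(DF[[i]])
}

str(DF)
```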


I used this code to convert all columns to numeric except the first one:

    library(dplyr)
    # check structure, row and column number with: glimpse(df)
    # convert to numeric e.g. from 2nd column to 10th column
    df <- df %>% 
     mutate_at(c(2:10), as.numeric)

If you're already using the tidyverse, there are a few solutions depending on the exact situation.

The basic case, if you know every column holds numbers and there are no NAs:

library(dplyr)

# solution
dataset %>% mutate_if(is.character,as.numeric)

Test cases

df <- data.frame(
  x1 = c('1','2','3'),
  x2 = c('4','5','6'),
  x3 = c('1','a','x'), # vector with alpha characters
  x4 = c('1',NA,'6'), # numeric and NA
  x5 = c('1',NA,'x'), # alpha and NA
  stringsAsFactors = F)

# display starting structure
df %>% str()

Convert all character vectors to numeric (could fail if not numeric)

df %>%
  select(-x3) %>% # this removes the alpha column if all your character columns need converted to numeric
  mutate_if(is.character,as.numeric) %>%
  str()

Check whether each column can be converted. This can be an anonymous function. It returns FALSE if any value is neither numeric nor NA. It also checks that the column is a character vector, so factors are ignored. na.omit drops the original NAs first, so only the "bad" NAs introduced by a failed conversion are counted.

is_all_numeric <- function(x) {
  !any(is.na(suppressWarnings(as.numeric(na.omit(x))))) & is.character(x)
}
df %>% 
  mutate_if(is_all_numeric,as.numeric) %>%
  str()
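
The same check also works without dplyr. This is a minimal base R sketch that reuses the is_all_numeric predicate and the test data from this answer; the vapply()/lapply() wiring and the name num_cols are illustrative additions, not part of the original answer.

```r
# Predicate from the answer: TRUE only for character vectors whose
# non-NA values all convert cleanly to numeric
is_all_numeric <- function(x) {
  !any(is.na(suppressWarnings(as.numeric(na.omit(x))))) & is.character(x)
}

df <- data.frame(
  x1 = c('1', '2', '3'),
  x3 = c('1', 'a', 'x'),  # vector with alpha characters
  x4 = c('1', NA, '6'),   # numeric and NA
  x5 = c('1', NA, 'x'),   # alpha and NA
  stringsAsFactors = FALSE)

# Logical vector marking the columns that are safe to convert
num_cols <- vapply(df, is_all_numeric, logical(1))

# Convert only those columns; the others stay character
df[num_cols] <- lapply(df[num_cols], as.numeric)

str(df)
```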

If you want to convert specific named columns, then mutate_at is better.

df %>% mutate_at('x1', as.numeric) %>% str()
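
In dplyr 1.0.0 and later, mutate_if() and mutate_at() are superseded by across(). A hedged sketch of the equivalent calls, assuming a recent dplyr is installed; the sample data and the names by_type and by_name are made up for illustration:

```r
library(dplyr)

# Made-up data: two character columns and one integer column
df <- data.frame(x1 = c("1", "2", "3"),
                 x4 = c("1", NA, "6"),
                 n  = 1:3,
                 stringsAsFactors = FALSE)

# Equivalent of mutate_if(is.character, as.numeric)
by_type <- df %>% mutate(across(where(is.character), as.numeric))

# Equivalent of mutate_at('x1', as.numeric)
by_name <- df %>% mutate(across(x1, as.numeric))

str(by_type)
str(by_name)
```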

Slight adjustment to answers from ARobertson and Kenneth Wilson that worked for me.

Running R 3.6.0, with library(tidyverse) and library(dplyr) in my environment:

library(tidyverse)
library(dplyr)
> df %<>% mutate_if(is.character, as.numeric)
Error in df %<>% mutate_if(is.character, as.numeric) : 
  could not find function "%<>%"

I did some quick research and found this note in Hadley's "The tidyverse style guide".

The magrittr package provides the %<>% operator as a shortcut for modifying an object in place. Avoid this operator.

# Good
x <- x %>%
  abs() %>%
  sort()

# Bad
x %<>%
  abs() %>%
  sort()

Solution

Based on that style guide:

df_clean <- df %>% mutate_if(is.character, as.numeric)
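
For context on the error above: %<>% is exported by magrittr, and neither library(tidyverse) nor library(dplyr) attaches it, so it is only found after library(magrittr). A minimal sketch comparing the two forms on a made-up vector x, assuming magrittr is installed:

```r
library(magrittr)

x <- c(-3, 1, -2)

# Explicit assignment, as the style guide recommends
y <- x %>%
  abs() %>%
  sort()

# Compound assignment: pipes x through the chain and overwrites x
x %<>%
  abs() %>%
  sort()

identical(x, y)
```

Both forms end up with the same values; the explicit version just makes the overwrite visible.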

Working example

> df_clean <- df %>% mutate_if(is.character, as.numeric)
Warning messages:
1: NAs introduced by coercion 
2: NAs introduced by coercion 
3: NAs introduced by coercion 
4: NAs introduced by coercion 
5: NAs introduced by coercion 
6: NAs introduced by coercion 
7: NAs introduced by coercion 
8: NAs introduced by coercion 
9: NAs introduced by coercion 
10: NAs introduced by coercion 
> df_clean
# A tibble: 3,599 x 17
   stack datetime            volume BQT90 DBT90 DRT90 DLT90 FBT90  RT90 HTML90 RFT90 RLPP90 RAT90 SRVR90 SSL90 TCP90 group
   <dbl> <dttm>               <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
