[r] Convert data.frame column format from character to factor

I would like to change the format (class) of some columns of my data.frame object (mydf) from charactor to factor.

I don't want to do this when I'm reading the text file by read.table() function.

Any help would be appreciated.

This question is related to r dataframe character r-faq

The answer is


Unless you need to identify the columns automatically, I found this to be the simplest solution:

df$name <- as.factor(df$name)

This makes column name in dataframe df a factor.


I've doing it with a function. In this case I will only transform character variables to factor:

for (i in 1:ncol(data)){
    if(is.character(data[,i])){
        data[,i]=factor(data[,i])
    }
}

If you want to change all character variables in your data.frame to factors after you've already loaded your data, you can do it like this, to a data.frame called dat:

character_vars <- lapply(dat, class) == "character"
dat[, character_vars] <- lapply(dat[, character_vars], as.factor)

This creates a vector identifying which columns are of class character, then applies as.factor to those columns.

Sample data:

dat <- data.frame(var1 = c("a", "b"),
                  var2 = c("hi", "low"),
                  var3 = c(0, 0.1),
                  stringsAsFactors = FALSE
                  )

You could use dplyr::mutate_if() to convert all character columns or dplyr::mutate_at() for select named character columns to factors:

library(dplyr)

# all character columns to factor:
df <- mutate_if(df, is.character, as.factor)

# select character columns 'char1', 'char2', etc. to factor:
df <- mutate_at(df, vars(char1, char2), as.factor)

Another short way you could use is a pipe (%<>%) from the magrittr package. It converts the character column mycolumn to a factor.

library(magrittr)

mydf$mycolumn %<>% factor

# To do it for all names
df[] <- lapply( df, factor) # the "[]" keeps the dataframe structure
 col_names <- names(df)
# to do it for some names in a vector named 'col_names'
df[col_names] <- lapply(df[col_names] , factor)

Explanation. All dataframes are lists and the results of [ used with multiple valued arguments are likewise lists, so looping over lists is the task of lapply. The above assignment will create a set of lists that the function data.frame.[<- should successfully stick back into into the dataframe, df

Another strategy would be to convert only those columns where the number of unique items is less than some criterion, let's say fewer than the log of the number of rows as an example:

cols.to.factor <- sapply( df, function(col) length(unique(col)) < log10(length(col)) )
df[ cols.to.factor] <- lapply(df[ cols.to.factor] , factor)

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe

Examples related to character

Set the maximum character length of a UITextField in Swift Max length UITextField Remove last character from string. Swift language Get nth character of a string in Swift programming language How many characters can you store with 1 byte? How to convert integers to characters in C? Converting characters to integers in Java How to check the first character in a string in Bash or UNIX shell? Invisible characters - ASCII How to delete Certain Characters in a excel 2010 cell

Examples related to r-faq

What does "The following object is masked from 'package:xxx'" mean? What does "Error: object '<myvariable>' not found" mean? How do I deal with special characters like \^$.?*|+()[{ in my regex? What does %>% function mean in R? How to plot a function curve in R Use dynamic variable names in `dplyr` Error: unexpected symbol/input/string constant/numeric constant/SPECIAL in my code How should I deal with "package 'xxx' is not available (for R version x.y.z)" warning? How to select the row with the maximum value in each group R data formats: RData, Rda, Rds etc