Unless you need to identify the columns automatically, I found this to be the simplest solution:
df$name <- as.factor(df$name)
This makes column name
in dataframe df
a factor.
I've doing it with a function. In this case I will only transform character variables to factor:
for (i in 1:ncol(data)){
if(is.character(data[,i])){
data[,i]=factor(data[,i])
}
}
If you want to change all character variables in your data.frame to factors after you've already loaded your data, you can do it like this, to a data.frame called dat
:
character_vars <- lapply(dat, class) == "character"
dat[, character_vars] <- lapply(dat[, character_vars], as.factor)
This creates a vector identifying which columns are of class character
, then applies as.factor
to those columns.
Sample data:
dat <- data.frame(var1 = c("a", "b"),
var2 = c("hi", "low"),
var3 = c(0, 0.1),
stringsAsFactors = FALSE
)
You could use dplyr::mutate_if()
to convert all character columns or dplyr::mutate_at()
for select named character columns to factors:
library(dplyr)
# all character columns to factor:
df <- mutate_if(df, is.character, as.factor)
# select character columns 'char1', 'char2', etc. to factor:
df <- mutate_at(df, vars(char1, char2), as.factor)
Another short way you could use is a pipe (%<>%
) from the magrittr package. It converts the character column mycolumn to a factor.
library(magrittr)
mydf$mycolumn %<>% factor
# To do it for all names
df[] <- lapply( df, factor) # the "[]" keeps the dataframe structure
col_names <- names(df)
# to do it for some names in a vector named 'col_names'
df[col_names] <- lapply(df[col_names] , factor)
Explanation. All dataframes are lists and the results of [
used with multiple valued arguments are likewise lists, so looping over lists is the task of lapply
. The above assignment will create a set of lists that the function data.frame.[<-
should successfully stick back into into the dataframe, df
Another strategy would be to convert only those columns where the number of unique items is less than some criterion, let's say fewer than the log of the number of rows as an example:
cols.to.factor <- sapply( df, function(col) length(unique(col)) < log10(length(col)) )
df[ cols.to.factor] <- lapply(df[ cols.to.factor] , factor)
Source: Stackoverflow.com