[r] How do I convert certain columns of a data frame to become factors?

Possible Duplicate:
identifying or coding unique factors using R

I'm having some trouble with R.

I have a data set similar to the following, but much longer.

A B Pulse
1 2 23
2 2 24
2 2 12
2 3 25
1 1 65
1 3 45

Basically, the first two columns are coded: A takes the values 1 and 2, which represent two different weights, and B takes the values 1, 2 and 3, which represent three different times.

As they are coded numerical values, R will treat them as numerical variables. I need to use the factor function to convert these variables into factors.

Help?


The answer is


Given the following sample data

myData <- data.frame(A = rep(1:2, 3), B = rep(1:3, 2), Pulse = 20:25)

then

myData$A <- as.factor(myData$A)
myData$B <- as.factor(myData$B)

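or you could convert several columns at once:

# select the columns to convert
cols <- c("A", "B")
# lapply() applies factor() to each selected column and returns a list,
# which replaces those columns in place (apply() would not work here: it
# first coerces the columns to a matrix, so the factors are lost)
myData[cols] <- lapply(myData[cols], factor)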

levels(myData$A) <- c("long", "short")
levels(myData$B) <- c("1kg", "2kg", "3kg")

To obtain

> myData
      A   B Pulse
1  long 1kg    20
2 short 2kg    21
3  long 3kg    22
4 short 1kg    23
5  long 2kg    24
6 short 3kg    25
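As an alternative to renaming the levels afterwards, `factor()` can attach labels during the conversion through its `levels` and `labels` arguments. A minimal sketch using the same sample data; the label strings are just the placeholders from the example above, not values taken from the question:

```r
# Rebuild the sample data used above
myData <- data.frame(A = rep(1:2, 3), B = rep(1:3, 2), Pulse = 20:25)

# `levels` lists the codes, `labels` gives the display name for each code,
# matched position by position (placeholder labels, as in the example above)
myData$A <- factor(myData$A, levels = 1:2, labels = c("long", "short"))
myData$B <- factor(myData$B, levels = 1:3, labels = c("1kg", "2kg", "3kg"))

str(myData)
```

Listing `levels` explicitly ties each code to its label instead of relying on the default sort order, and any value not listed in `levels` becomes `NA`, which makes miscoded entries easy to spot.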

Here's an example:

#Create a data frame
> d<- data.frame(a=1:3, b=2:4)
> d
  a b
1 1 2
2 2 3
3 3 4

# Currently there are no levels in the `a` column, since it's numeric, as you point out.
> levels(d$a)
NULL

#Convert that column to a factor
> d$a <- factor(d$a)
> d
  a b
1 1 2
2 2 3
3 3 4

#Now it has levels.
> levels(d$a)
[1] "1" "2" "3"

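You can also handle this when reading in your data. See the colClasses argument of e.g. read.csv() or read.table(). Note that stringsAsFactors only affects character columns, so it will not convert numeric codes like these; colClasses = "factor" will.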
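For instance, a minimal sketch that reads the question's sample data with `read.table()`. The data is inlined through the `text` argument only to keep the example self-contained; with a real file you would pass its path instead, and `read.csv()` accepts the same `colClasses` argument for comma-separated files:

```r
# The question's sample data, inlined as text (a file path works the same way)
raw <- "A B Pulse
1 2 23
2 2 24
2 2 12
2 3 25
1 1 65
1 3 45"

# colClasses is matched to the columns in order:
# A and B are read directly as factors, Pulse as an integer
pulseData <- read.table(text = raw, header = TRUE,
                        colClasses = c("factor", "factor", "integer"))

str(pulseData)
```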

Note that, computationally, converting such columns to factors won't help you much and may actually slow your program down (albeit negligibly). A factor stores each value as an integer ID that points into its set of levels, so printing your data.frame requires looking each ID up in those levels, an extra step that takes time.
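To see that mapping directly, here is a small sketch (the vector `f` is just an illustration) that inspects the integer codes stored behind a factor:

```r
# A small factor built from repeated strings
f <- factor(c("short", "long", "long", "short", "long"))

levels(f)      # the lookup table; levels are sorted alphabetically by default
as.integer(f)  # the stored codes: each element is an index into levels(f)
typeof(f)      # a factor is an integer vector underneath

# Printing or as.character() maps each code back through the levels
levels(f)[as.integer(f)]
```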

Factors are most useful for string values that repeat many times: each distinct string is stored once as a level, and every element just references it by its integer ID. To fully benefit from factors, consider giving such columns more descriptive level names than the bare codes.
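A rough way to check the storage side of this is `object.size()`. A minimal sketch, assuming a typical 64-bit build of R: R already shares one copy of each distinct string internally, so the saving comes mainly from the factor's 4-byte integer codes versus the character vector's 8-byte pointers. The vector size and values are arbitrary:

```r
n <- 1e5
s <- rep(c("long", "short"), length.out = n)  # plain character vector
f <- factor(s)                                # the same data as a factor

# On a 64-bit build the character vector holds one 8-byte pointer per element;
# the factor holds one 4-byte integer code per element plus its levels
object.size(s)
object.size(f)
```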

