I have a data frame that I construct as such:
> yyz <- data.frame(a = c("1","2","n/a"), b = c(1,2,"n/a"))
> apply(yyz, 2, class)
a b
"character" "character"
I am attempting to convert the last column to numeric while still maintaining the first column as a character. I tried this:
> yyz$b <- as.numeric(as.character(yyz$b))
> yyz
a b
1 1
2 2
n/a NA
But when I run the apply class it is showing me that they are both character classes.
> apply(yyz, 2, class)
a b
"character" "character"
Am I setting up the data frame wrong? Or is it the way R is interpreting the data frame?
This question is related to
r
If we need only one column to be numeric
yyz$b <- as.numeric(as.character(yyz$b))
But, if all the columns needs to changed to numeric
, use lapply
to loop over the columns and convert to numeric
by first converting it to character
class as the columns were factor
.
yyz[] <- lapply(yyz, function(x) as.numeric(as.character(x)))
Both the columns in the OP's post are factor
because of the string "n/a"
. This could be easily avoided while reading the file using na.strings = "n/a"
in the read.table/read.csv
or if we are using data.frame
, we can have character
columns with stringsAsFactors=FALSE
(the default is stringsAsFactors=TRUE
)
Regarding the usage of apply
, it converts the dataset to matrix
and matrix
can hold only a single class. To check the class
, we need
lapply(yyz, class)
Or
sapply(yyz, class)
Or check
str(yyz)
Source: Stackoverflow.com