[string] Converting string to numeric

I've imported a test file and tried to make a histogram

pichman <- read.csv(file="picman.txt", header=TRUE, sep="/t")   
hist <- as.numeric(pichman$WS)    

However, I get different numbers from values in my dataset. Originally I thought that this because I had text, so I deleted the text:

table(pichman$WS)    
ws <- pichman$WS[pichman$WS!="Down" & pichman$WS!="NoData"]    

However, I am still getting very high numbers does anyone have an idea?

This question is related to string r

The answer is


As csgillespie said. stringsAsFactors is default on TRUE, which converts any text to a factor. So even after deleting the text, you still have a factor in your dataframe.

Now regarding the conversion, there's a more optimal way to do so. So I put it here as a reference :

> x <- factor(sample(4:8,10,replace=T))
> x
 [1] 6 4 8 6 7 6 8 5 8 4
Levels: 4 5 6 7 8
> as.numeric(levels(x))[x]
 [1] 6 4 8 6 7 6 8 5 8 4

To show it works.

The timings :

> x <- factor(sample(4:8,500000,replace=T))
> system.time(as.numeric(as.character(x)))
   user  system elapsed 
   0.11    0.00    0.11 
> system.time(as.numeric(levels(x))[x])
   user  system elapsed 
      0       0       0 

It's a big improvement, but not always a bottleneck. It gets important however if you have a big dataframe and a lot of columns to convert.