[r] Specifying colClasses in the read.csv

I am trying to specify the colClasses options in the read.csv function in R. In my data, the first column "time" is basically a character vector while the rest of the columns are numeric.

data <- read.csv("test.csv", comment.char="" , 
                 colClasses=c(time="character", "numeric"), 
                 strip.white=FALSE)

In the above command, I would want R to read in the "time" column as "character" and the rest as numeric. Although, the "data" variable did have the correct result after the command completed, R returned the following warnings. I am wondering how I could fix these warnings?

Warning messages:
 1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
    not all columns named in 'colClasses' exist
 2: In tmp[i[i > 0L]] <- colClasses :
    number of items to replace is not a multiple of replacement length

Derek

This question is related to r csv read.csv

The answer is


The colClasses vector must have length equal to the number of imported columns. Supposing the rest of your dataset columns are 5:

colClasses=c("character",rep("numeric",5))

For multiple datetime columns with no header, and a lot of columns, say my datetime fields are in columns 36 and 38, and I want them read in as character fields:

data<-read.csv("test.csv", head=FALSE,   colClasses=c("V36"="character","V38"="character"))                        

I know OP asked about the utils::read.csv function, but let me provide an answer for these that come here searching how to do it using readr::read_csv from the tidyverse.

read_csv ("test.csv", col_names=FALSE, col_types = cols (.default = "c", time = "i"))

This should set the default type for all columns as character, while time would be parsed as integer.


You can specify the colClasse for only one columns.

So in your example you should use:

data <- read.csv('test.csv', colClasses=c("time"="character"))

Assuming your 'time' column has at least one observation with a non-numeric character and all your other columns only have numbers, then 'read.csv's default will be to read in 'time' as a 'factor' and all the rest of the columns as 'numeric'. Therefore setting 'stringsAsFactors=F' will have the same result as setting the 'colClasses' manually i.e.,

data <- read.csv('test.csv', stringsAsFactors=F)

If you want to refer to names from the header rather than column numbers, you can use something like this:

fname <- "test.csv"
headset <- read.csv(fname, header = TRUE, nrows = 10)
classes <- sapply(headset, class)
classes[names(classes) %in% c("time")] <- "character"
dataset <- read.csv(fname, header = TRUE, colClasses = classes)

If we combine what @Hendy and @Oddysseus Ithaca contributed, we get cleaner and a more general (i.e., adaptable?) chunk of code.

    data <- read.csv("test.csv", head = F, colClasses = c(V36 = "character", V38 = "character"))