[r] Error in data frame undefined columns selected

I've been working on an assignment where I have to read in some csv files from a directory "specdata". There are 332 files, named 001.csv through 332.csv. They have consistent columns and headers, if that matters.

I believe I'm close, but I'm tripping up on this error message:

"Error in `[.data.frame`(data1, good) : undefined columns selected"

I had expected a data frame to load with all the data from the subset of files specified by the id parameter.

pollutantmean <- function(directory, pollutant, id = 1:332) {

              files <- list.files(directory)

              subsetFiles <- files[id]

              for (i in subsetFiles) {

                  filepaths <- paste(directory,"/",i, sep='')

                  data1 <- read.csv(filepaths)
                }

              data1

             good <- complete.cases(data1)

             data2 <- data1[good]

             data2
}

# test it out and ignore middle parameter for now
pollutantmean("specdata", "pass", 1:3)

This question is related to r

The answer is


Did you mean this?

data2 <- data1[good,]

With

data1[good]

you're selecting columns, not rows: with a single index, a data frame is indexed by column, but good is a logical vector with one entry per row (TRUE for each complete row).
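To make the difference concrete, here is a minimal sketch on a made-up data frame (the values are invented for illustration; only the column names mimic what the assignment files might contain, which is an assumption). It shows the single-index form failing and the row-index form working.

```r
# Toy data frame (made up for illustration): 3 columns, 4 rows, some NAs
df <- data.frame(
  sulfate = c(1.2, NA, 3.4, 0.5),
  nitrate = c(0.3, 0.8, NA, 0.9),
  ID      = c(1L, 1L, 1L, 1L)
)

good <- complete.cases(df)   # one TRUE/FALSE per ROW, so length 4

# A single index selects COLUMNS; a length-4 logical index asks for a
# 4th column that does not exist, hence "undefined columns selected"
res <- try(df[good], silent = TRUE)
inherits(res, "try-error")   # TRUE

# With a comma, the first index selects ROWS
clean <- df[good, ]
nrow(clean)                  # 2 (rows 1 and 4 are complete)
```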

Note also that the pollutant parameter is never used. Is it the name of a column you want to extract? If so, it should be something like

data2 <- data1[good, pollutant]

Furthermore, you have to rbind the data frames inside the for loop; otherwise you only get the last data frame (and its complete.cases).
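The two points above (accumulate with rbind, then select rows and a column) could be sketched roughly as follows. The temporary directory, the two tiny CSV files, and the column names sulfate, nitrate and ID are hypothetical stand-ins so the example runs on its own; they are not taken from the real specdata files.

```r
# Hypothetical stand-in for "specdata": a temp directory with two tiny files
directory <- file.path(tempdir(), "specdata_demo")
dir.create(directory, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, NA), nitrate = c(2, 3), ID = 1L),
          file.path(directory, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, 7), nitrate = c(NA, 4), ID = 2L),
          file.path(directory, "002.csv"), row.names = FALSE)

files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)

data1 <- data.frame()                 # start empty, append each file in turn
for (f in files) {
  data1 <- rbind(data1, read.csv(f))
}
nrow(data1)                           # 4: rows from both files are kept

good  <- complete.cases(data1)        # TRUE where a row has no NA at all
data2 <- data1[good, "sulfate"]       # rows by `good`, one column by name
mean(data2)                           # 4 (mean of 1 and 7)
```

For many files, do.call(rbind, lapply(files, read.csv)) does the same thing without growing the data frame one step at a time.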

And last but not least, I'd prefer generating the file names directly, e.g. with

id <- 1:332
paste0( directory, "/", gsub(" ", "0", sprintf("%3d",id)), ".csv")
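As a possible simplification (a sketch, not the only way to do it), the 0 flag in the format string pads with zeros directly, so the gsub() step can be dropped. The directory name is taken from the question; the three id values are arbitrary examples.

```r
directory <- "specdata"   # directory name from the question
id <- c(1, 42, 332)       # arbitrary example ids

# "%03d" pads with zeros to width 3; file.path() joins with "/"
filepaths <- file.path(directory, sprintf("%03d.csv", id))
filepaths
# [1] "specdata/001.csv" "specdata/042.csv" "specdata/332.csv"
```

Building the names from id, rather than indexing into list.files(), also keeps working if the directory happens to contain other files.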

A slightly modified excerpt from ?sprintf:

The string fmt (in our case "%3d") contains normal characters, which are passed through to the output string, and also conversion specifications which operate on the arguments provided through .... The allowed conversion specifications start with a % and end with one of the letters in the set aAdifeEgGosxX%. These letters denote the following types:

  • d: integer

E.g., a more general example:

    sprintf("I am %10d years old", 25)
[1] "I am         25 years old"
          ^^^^^^^^^^
          |        |
          1       10
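A few more calls may help show how the field width behaves. This is a small sketch using only base R's sprintf(); the variable names are just for illustration.

```r
padded  <- sprintf("%3d", 7L)       # right-justified, padded with spaces
zeroed  <- sprintf("%03d", 7L)      # same width, padded with zeros
left    <- sprintf("%-3d|", 7L)     # "-" flag left-justifies in the field
wide    <- sprintf("%3d", 12345L)   # width is a minimum, nothing is cut off

padded   # "  7"
zeroed   # "007"
left     # "7  |"
wide     # "12345"
```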