Only read selected columns

Question

Can anyone please tell me how to read only the first 6 months (7 columns) for each year of the data below, for example by using read.table()?

    Year   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
    2009   -41  -27  -25  -31  -31  -39  -25  -15  -30  -27  -21  -25
    2010   -41  -27  -25  -31  -31  -39  -25  -15  -30  -27  -21  -25
    2011   -21  -27   -2   -6  -10  -32  -13  -12  -27  -30  -38  -29

User · Answer

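You do it like this: read a single row first to learn the column names and classes, mark the columns you do not need as "NULL", then read the whole file with that colClasses vector.

    # Pass 1: read one row only, to discover column names and classes
    df <- read.table("file.txt", nrows = 1, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
    # sapply() inspects each column separately; apply() would first coerce the
    # data frame to a matrix and report a single class for every column
    colClasses <- sapply(df, class)
    needCols <- c("Year", "Jan", "Feb", "Mar", "Apr", "May", "Jun")
    # Columns whose class is "NULL" are skipped by read.table()
    colClasses[!names(colClasses) %in% needCols] <- "NULL"
    # Pass 2: read the full file, keeping only the needed columns
    df <- read.table("file.txt", header = TRUE, colClasses = colClasses, sep = "\t", stringsAsFactors = FALSE)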
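The two-pass approach above can be tried without any existing file. The following is a minimal sketch, assuming a small tab-separated temporary file; the file contents and the names `tmp`, `probe`, `need_cols`, `col_classes` and `subset_df` are made up for illustration and do not come from the answer.

```r
# Hypothetical tab-separated sample file, created only for this sketch
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Year\tJan\tFeb\tJul",
  "2009\t-41\t-27\t-25",
  "2010\t-41\t-27\t-25"
), tmp)

need_cols <- c("Year", "Jan", "Feb")

# Pass 1: read a single row to learn the column names and classes
probe <- read.table(tmp, nrows = 1, header = TRUE, sep = "\t",
                    stringsAsFactors = FALSE)
col_classes <- sapply(probe, class)

# Mark every column that is not needed as "NULL" so read.table() skips it
col_classes[!names(col_classes) %in% need_cols] <- "NULL"

# Pass 2: read the whole file with the adjusted classes
subset_df <- read.table(tmp, header = TRUE, sep = "\t",
                        colClasses = col_classes, stringsAsFactors = FALSE)
print(names(subset_df))
```

Selecting by name this way keeps working if the column order in the file changes, at the cost of opening the file twice.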

User · Answer

Say the data are in file data.txt, you can use the colClasses argument of read.table() to skip columns. Here the data in the first 7 columns are "integer" and we set the remaining 6 columns to "NULL", indicating they should be skipped.

    > read.table("data.txt", colClasses = c(rep("integer", 7), rep("NULL", 6)),
    +            header = TRUE)
      Year Jan Feb Mar Apr May Jun
    1 2009 -41 -27 -25 -31 -31 -39
    2 2010 -41 -27 -25 -31 -31 -39
    3 2011 -21 -27  -2  -6 -10 -32

Change "integer" to one of the accepted types as detailed in ?read.table, depending on the real type of data.

data.txt looks like this:

    $ cat data.txt
    "Year" "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
    2009 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25
    2010 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25
    2011 -21 -27 -2 -6 -10 -32 -13 -12 -27 -30 -38 -29

and was created by using

    write.table(dat, file = "data.txt", row.names = FALSE)

where dat is

    dat <- structure(list(Year = 2009:2011, Jan = c(-41L, -41L, -21L), Feb = c(-27L,
    -27L, -27L), Mar = c(-25L, -25L, -2L), Apr = c(-31L, -31L, -6L),
        May = c(-31L, -31L, -10L), Jun = c(-39L, -39L, -32L), Jul = c(-25L,
        -25L, -13L), Aug = c(-15L, -15L, -12L), Sep = c(-30L, -30L, -27L),
        Oct = c(-27L, -27L, -30L), Nov = c(-21L, -21L, -38L), Dec = c(-25L,
        -25L, -29L)), .Names = c("Year", "Jan", "Feb", "Mar", "Apr",
    "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class = "data.frame", row.names = c(NA,
    -3L))

If the number of columns is not known beforehand, the utility function count.fields will read through the file and count the number of fields in each line. Pass the separator your file actually uses: sep = "\t" is for tab-delimited files, while the data.txt created above is space-separated, so the default sep applies there.

    ## returns a vector equal to the number of lines in the file
    count.fields("data.txt", sep = "\t")
    ## returns the maximum to set colClasses
    max(count.fields("data.txt", sep = "\t"))
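The two ideas in this answer combine naturally: count.fields can supply the total number of columns, and colClasses then keeps the first few and skips the rest. This is only a sketch under stated assumptions: the data is written to a temporary whitespace-separated file, and the names `tmp`, `n_keep`, `n_total`, `col_classes` and `first_half` are illustrative.

```r
# Write the example data to a temporary, space-separated file
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec",
  "2009 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25",
  "2010 -41 -27 -25 -31 -31 -39 -25 -15 -30 -27 -21 -25",
  "2011 -21 -27 -2 -6 -10 -32 -13 -12 -27 -30 -38 -29"
), tmp)

n_keep  <- 7L                        # Year plus Jan..Jun
n_total <- max(count.fields(tmp))    # default sep = "" splits on whitespace

# Keep the first n_keep columns as integer, skip all remaining ones
col_classes <- c(rep("integer", n_keep), rep("NULL", n_total - n_keep))

first_half <- read.table(tmp, header = TRUE, colClasses = col_classes)
print(first_half)
```

Because the number of skipped columns is computed from the file, the same code works if more trailing columns are added later.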

User · Answer

You could also use JDBC to achieve this. Let's create a sample csv file.

    write.table(x=mtcars, file="mtcars.csv", sep=",", row.names=F, col.names=T) # create example csv file

Download and save the CSV JDBC driver from this link: http://sourceforge.net/projects/csvjdbc/files/latest/download

    > library(RJDBC)

    > path.to.jdbc.driver <- "jdbc//csvjdbc-1.0-18.jar"
    > drv <- JDBC("org.relique.jdbc.csv.CsvDriver", path.to.jdbc.driver)
    > conn <- dbConnect(drv, sprintf("jdbc:relique:csv:%s", getwd()))

    > head(dbGetQuery(conn, "select * from mtcars"), 3)
       mpg cyl disp  hp drat    wt  qsec vs am gear carb
    1   21   6  160 110  3.9  2.62 16.46  0  1    4    4
    2   21   6  160 110  3.9 2.875 17.02  0  1    4    4
    3 22.8   4  108  93 3.85  2.32 18.61  1  1    4    1

    > head(dbGetQuery(conn, "select mpg, gear from mtcars"), 3)
       MPG GEAR
    1   21    4
    2   21    4
    3 22.8    4

User · Answer

To read a specific set of columns from a dataset, there are several other options:

1) With fread from the data.table-package

You can specify the desired columns with the select parameter of fread from the data.table package, using either a vector of column names or a vector of column numbers.

For the example dataset:

    library(data.table)
    dat <- fread("data.txt", select = c("Year","Jan","Feb","Mar","Apr","May","Jun"))
    dat <- fread("data.txt", select = c(1:7))

Alternatively, you can use the drop parameter to indicate which columns should not be read:

    dat <- fread("data.txt", drop = c("Jul","Aug","Sep","Oct","Nov","Dec"))
    dat <- fread("data.txt", drop = c(8:13))

All result in:

    > dat
      Year Jan Feb Mar Apr May Jun
    1 2009 -41 -27 -25 -31 -31 -39
    2 2010 -41 -27 -25 -31 -31 -39
    3 2011 -21 -27  -2  -6 -10 -32

UPDATE: When you don't want fread to return a data.table, use the data.table = FALSE parameter, e.g.: fread("data.txt", select = c(1:7), data.table = FALSE)

2) With read.csv.sql from the sqldf-package

Another alternative is the read.csv.sql function from the sqldf package:

    library(sqldf)
    dat <- read.csv.sql("data.txt",
                        sql = "select Year,Jan,Feb,Mar,Apr,May,Jun from file",
                        sep = "\t")

3) With the read_*-functions from the readr-package

    library(readr)
    dat <- read_table("data.txt",
                      col_types = cols_only(Year = 'i', Jan = 'i', Feb = 'i', Mar = 'i',
                                            Apr = 'i', May = 'i', Jun = 'i'))
    dat <- read_table("data.txt",
                      col_types = list(Jul = col_skip(), Aug = col_skip(), Sep = col_skip(),
                                       Oct = col_skip(), Nov = col_skip(), Dec = col_skip()))
    dat <- read_table("data.txt", col_types = 'iiiiiii______')

From the documentation, an explanation of the characters used with col_types:

    each character represents one column: c = character, i = integer, n = number, d = double, l = logical, D = date, T = date time, t = time, ? = guess, or _/- to skip the column
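Typing a compact col_types string by hand gets error-prone for wide files. A small sketch of building it in base R instead, assuming every kept column is an integer and the kept columns come first; `n_keep`, `n_skip` and `col_spec` are illustrative names, not part of readr.

```r
n_keep <- 7L   # Year plus Jan..Jun, all read as integer ("i")
n_skip <- 6L   # Jul..Dec, all skipped ("_")

# strrep() repeats a string a given number of times (base R, 3.3.0 or later)
col_spec <- paste0(strrep("i", n_keep), strrep("_", n_skip))
print(col_spec)

# The resulting string can then be passed as the col_types argument, e.g.
# dat <- readr::read_table("data.txt", col_types = col_spec)
```

The string must contain exactly one character per column in the file, so `n_keep + n_skip` has to equal the total column count.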

User · Answer

The vroom package provides a 'tidy' method of selecting / dropping columns by name during import. Docs: https://www.tidyverse.org/blog/2019/05/vroom-1-0-0/#column-selection

Column selection (col_select)

The vroom argument 'col_select' makes selecting columns to keep (or omit) more straightforward. The interface for col_select is the same as dplyr::select().

Select columns by name

    data <- vroom("flights.tsv", col_select = c(year, flight, tailnum))
    #> Observations: 336,776
    #> Variables: 3
    #> chr [1]: tailnum
    #> dbl [2]: year, flight
    #>
    #> Call `spec()` for a copy-pastable column specification
    #> Specify the column types with `col_types` to quiet this message

Drop columns by name

    data <- vroom("flights.tsv", col_select = c(-dep_time, -air_time:-time_hour))
    #> Observations: 336,776
    #> Variables: 13
    #> chr [4]: carrier, tailnum, origin, dest
    #> dbl [9]: year, month, day, sched_dep_time, dep_delay, arr_time, sched_arr_time, arr...
    #>
    #> Call `spec()` for a copy-pastable column specification
    #> Specify the column types with `col_types` to quiet this message

Use the selection helpers

    data <- vroom("flights.tsv", col_select = ends_with("time"))
    #> Observations: 336,776
    #> Variables: 5
    #> dbl [5]: dep_time, sched_dep_time, arr_time, sched_arr_time, air_time
    #>
    #> Call `spec()` for a copy-pastable column specification
    #> Specify the column types with `col_types` to quiet this message

Or rename columns by name

    data <- vroom("flights.tsv", col_select = list(plane = tailnum, everything()))
    #> Observations: 336,776
    #> Variables: 19
    #> chr  [ 4]: carrier, tailnum, origin, dest
    #> dbl  [14]: year, month, day, dep_time, sched_dep_time, dep_delay, arr_time, sched_arr...
    #> dttm [ 1]: time_hour
    #>
    #> Call `spec()` for a copy-pastable column specification
    #> Specify the column types with `col_types` to quiet this message
    data
    #> # A tibble: 336,776 x 19
    #>    plane  year month   day dep_time sched_dep_time dep_delay arr_time
    #>    <chr> <dbl> <dbl> <dbl>    <dbl>          <dbl>     <dbl>    <dbl>
    #>  1 N142…  2013     1     1      517            515         2      830
    #>  2 N242…  2013     1     1      533            529         4      850
    #>  3 N619…  2013     1     1      542            540         2      923
    #>  4 N804…  2013     1     1      544            545        -1     1004
    #>  5 N668…  2013     1     1      554            600        -6      812
    #>  6 N394…  2013     1     1      554            558        -4      740
    #>  7 N516…  2013     1     1      555            600        -5      913
    #>  8 N829…  2013     1     1      557            600        -3      709
    #>  9 N593…  2013     1     1      557            600        -3      838
    #> 10 N3AL…  2013     1     1      558            600        -2      753
    #> # … with 336,766 more rows, and 11 more variables: sched_arr_time <dbl>,
    #> #   arr_delay <dbl>, carrier <chr>, flight <dbl>, origin <chr>,
    #> #   dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
    #> #   time_hour <dttm>
