Create an empty data.frame

Question

I'm trying to initialize a data.frame without any rows. Basically, I want to specify the data types for each column and name them, but not have any rows created as a result.

The best I've been able to do so far is something like:

    df <- data.frame(Date = as.Date("01/01/2000", format = "%m/%d/%Y"),
                     File = "", User = "", stringsAsFactors = FALSE)
    df <- df[-1, ]

This creates a data.frame with a single row containing all of the data types and column names I wanted, but it also creates a useless row which then needs to be removed.

Is there a better way to do this?

User · Answer

If you want to create an empty data.frame with dynamic names (column names held in a variable), this can help:

    names <- c("v", "u", "w")
    df <- data.frame()
    for (k in names) df[[k]] <- as.numeric()

You can change the types as well if you need to, like:

    names <- c("u", "v")
    df <- data.frame()
    df[[names[1]]] <- as.numeric()
    df[[names[2]]] <- as.character()
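The same idea can be wrapped in a small helper that takes the names and the types as two parallel vectors. This is only a sketch: `make_empty_df` is a hypothetical name, and it assumes every entry of `col_types` is a mode that base R's `vector()` accepts (such as "numeric", "character" or "logical"), so classes like Date would need separate handling.

```r
# Hypothetical helper: build a zero-row data frame from parallel
# vectors of column names and basic vector modes.
make_empty_df <- function(col_names, col_types) {
  stopifnot(length(col_names) == length(col_types))
  # vector(mode, 0) returns an empty vector of the requested mode
  cols <- lapply(col_types, function(type) vector(mode = type, length = 0))
  names(cols) <- col_names
  as.data.frame(cols, stringsAsFactors = FALSE)
}

df <- make_empty_df(c("u", "v", "w"), c("numeric", "character", "logical"))
str(df)
```

Keeping names and types in two vectors makes it easy to generate them programmatically, for example with rep().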

User · Answer

You could use read.table with an empty string for the input text as follows:

    colClasses <- c("Date", "character", "character")
    col.names <- c("Date", "File", "User")

    df <- read.table(text = "",
                     colClasses = colClasses,
                     col.names = col.names)

Alternatively, specify the column names as a string:

    df <- read.csv(text = "Date,File,User", colClasses = colClasses)

Thanks to Richard Scriven for the improvement.
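To check that the read.table approach really yields typed, zero-row columns, something like the following can be run. The variable names `col_classes` and `col_names` are illustrative only.

```r
# Column classes and names for the empty data frame
col_classes <- c("Date", "character", "character")
col_names <- c("Date", "File", "User")

# Reading an empty string gives zero rows but keeps the declared classes
df <- read.table(text = "", colClasses = col_classes, col.names = col_names)

print(nrow(df))
print(sapply(df, class))
```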

User · Answer

I keep this function handy for whenever I need it, and change the column names and classes to suit the use case:

    make_df <- function() {
      data.frame(name = character(),
                 profile = character(),
                 sector = character(),
                 type = character(),
                 year_range = character(),
                 link = character(),
                 stringsAsFactors = FALSE)
    }

    make_df()
    [1] name       profile    sector     type       year_range link
    <0 rows> (or 0-length row names)

User · Answer

To create an empty data frame, pass in the number of rows and columns needed into the following function:

    create_empty_table <- function(num_rows, num_cols) {
      frame <- data.frame(matrix(NA, nrow = num_rows, ncol = num_cols))
      return(frame)
    }

To create an empty frame while specifying the class of each column, simply pass a vector of the desired data types into the following function:

    create_empty_table <- function(num_rows, num_cols, type_vec) {
      frame <- data.frame(matrix(NA, nrow = num_rows, ncol = num_cols))
      for (i in 1:ncol(frame)) {
        print(type_vec[i])
        if (type_vec[i] == 'numeric') {frame[, i] <- as.numeric(frame[, i])}
        if (type_vec[i] == 'character') {frame[, i] <- as.character(frame[, i])}
        if (type_vec[i] == 'logical') {frame[, i] <- as.logical(frame[, i])}
        if (type_vec[i] == 'factor') {frame[, i] <- as.factor(frame[, i])}
      }
      return(frame)
    }

Use as follows:

    df <- create_empty_table(3, 3, c('character', 'logical', 'numeric'))

Which gives:

        X1 X2 X3
    1 <NA> NA NA
    2 <NA> NA NA
    3 <NA> NA NA

To confirm your choices, run the following:

    lapply(df, class)

    # output
    $X1
    [1] "character"

    $X2
    [1] "logical"

    $X3
    [1] "numeric"

User · Answer

If you already have an existing data frame, let's say df, that has the columns you want, then you can just create an empty data frame by removing all the rows:

    empty.df = df[FALSE, ]

Notice that df still contains the data, but empty.df doesn't.

I found this question looking for how to create a new instance with empty rows, so I think it might be helpful for some people.
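As an illustration, here is the same trick applied to the built-in iris dataset (chosen only because it ships with R; any existing data frame would do). The column names, classes and factor levels carry over, while the original object is left untouched.

```r
# Drop every row but keep the column structure of an existing data frame
empty_iris <- iris[FALSE, ]

print(nrow(empty_iris))            # no rows remain
print(sapply(empty_iris, class))   # column classes are preserved
print(levels(empty_iris$Species))  # factor levels are preserved too
```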

User · Answer

Just declare

    table = data.frame()

When you try to rbind the first line, it will create the columns.
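A minimal sketch of that behaviour, assuming the first row arrives as a one-row data frame (the names `tbl`, `first_row`, `id` and `name` are made up for the example):

```r
# Start with a data frame that has no rows and no columns
tbl <- data.frame()

# The first row that is bound supplies the column names and types
first_row <- data.frame(id = 1L, name = "a", stringsAsFactors = FALSE)
tbl <- rbind(tbl, first_row)

str(tbl)
```

The trade-off is that the column types are only known once the first row has been added, so code that inspects types beforehand cannot rely on them.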

User · Answer

If you want to declare such a data.frame with many columns, it'll probably be a pain to type all the column classes out by hand. Especially if you can make use of rep, this approach is easy and fast (about 15% faster than the other solution that can be generalized like this).

If your desired column classes are in a vector colClasses, you can do the following:

    library(data.table)
    setnames(setDF(lapply(colClasses, function(x) eval(call(x)))), col.names)

lapply will result in a list of desired length, each element of which is simply an empty typed vector like numeric() or integer().

setDF converts this list by reference to a data.frame.

setnames adds the desired names by reference.

Speed comparison:

    classes <- c("character", "numeric", "factor",
                 "integer", "logical", "raw", "complex")

    NN <- 300
    colClasses <- sample(classes, NN, replace = TRUE)
    col.names <- paste0("V", 1:NN)

    setDF(lapply(colClasses, function(x) eval(call(x))))

    library(microbenchmark)
    microbenchmark(times = 1000,
                   read = read.table(text = "", colClasses = colClasses,
                                     col.names = col.names),
                   DT = setnames(setDF(lapply(colClasses, function(x)
                     eval(call(x)))), col.names))
    # Unit: milliseconds
    #  expr      min       lq     mean   median       uq      max neval cld
    #  read 2.598226 2.707445 3.247340 2.747835 2.800134 22.46545  1000   b
    #    DT 2.257448 2.357754 2.895453 2.401408 2.453778 17.20883  1000  a

It's also faster than using structure in a similar way:

    microbenchmark(times = 1000,
                   DT = setnames(setDF(lapply(colClasses, function(x)
                     eval(call(x)))), col.names),
                   struct = eval(parse(text = paste0(
                     "structure(list(",
                     paste(paste0(col.names, "=",
                                  colClasses, "()"), collapse = ","),
                     "), class = \"data.frame\")"))))
    # Unit: milliseconds
    #    expr      min       lq     mean   median       uq       max neval cld
    #      DT 2.068121 2.167180 2.821868 2.211214 2.268569 143.70901  1000  a
    #  struct 2.613944 2.723053 3.177748 2.767746 2.831422  21.44862  1000   b

User · Answer

This question didn't specifically address my concerns (outlined here), but in case anyone wants to do this with a parameterized number of columns and no coercion:

    > require(dplyr)
    > dbNames <- c('a', 'b', 'c', 'd')
    > emptyTableOut <-
        data.frame(
            character(),
            matrix(integer(), ncol = 3, nrow = 0), stringsAsFactors = FALSE
        ) %>%
        setNames(nm = c(dbNames))
    > glimpse(emptyTableOut)
    Observations: 0
    Variables: 4
    $ a <chr>
    $ b <int>
    $ c <int>
    $ d <int>

As divibisan states on the linked question:

    "...the reason [coercion] occurs [when cbinding matrices and their constituent types] is that a matrix can only have a single data type. When you cbind 2 matrices, the result is still a matrix and so the variables are all coerced into a single type before converting to a data.frame"

User · Answer

If you don't mind not specifying data types explicitly, you can do it this way:

    headers <- c("Date", "File", "User")
    df <- as.data.frame(matrix(, ncol = 3, nrow = 0))
    names(df) <- headers

    # then bind an incoming data frame that has column types to set the data types
    df <- rbind(df, new_df)

User · Answer

If you already have a data frame, you can extract its metadata (column names and types), e.g. if you are tracking down a bug which is only triggered with certain inputs and need an empty dummy data frame:

    columns_and_types <- sapply(df, class)

    # prints: c("col1", "col2")
    dput(as.character(names(columns_and_types)))

    # prints: c("integer", "factor")
    dput(as.character(as.vector(columns_and_types)))

And then use read.table to create the empty data frame:

    read.table(text = "",
               colClasses = c('integer', 'factor'),
               col.names = c('col1', 'col2'))

User · Answer

I created an empty data frame using the following code:

    df = data.frame(id = numeric(0), jobs = numeric(0))

and tried to bind some rows to populate it as follows:

    newrow = c(3, 4)
    df <- rbind(df, newrow)

but it started giving incorrect column names:

      X3 X4
    1  3  4

The solution is to convert newrow to a data frame as follows:

    newrow = data.frame(id = 3, jobs = 4)
    df <- rbind(df, newrow)

which now gives the correct data frame, with the expected column names:

      id jobs
    1  3    4
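When several rows arrive at once, the same fix can be applied in one step. This is a sketch under the assumption that each incoming row is already a one-row data frame with matching column names; `rows` is a made-up name for the example.

```r
# Typed, zero-row starting point
df <- data.frame(id = numeric(0), jobs = numeric(0))

# Incoming rows, each a one-row data frame with the same column names
rows <- list(data.frame(id = 3, jobs = 4),
             data.frame(id = 5, jobs = 6))

# Bind the empty frame and all rows in a single rbind call
df <- do.call(rbind, c(list(df), rows))
print(df)
```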

User · Answer

Say your column names are dynamic. You can create an empty row-named matrix and transform it to a data frame:

    nms <- sample(LETTERS, sample(1:10, 1))
    as.data.frame(t(matrix(nrow = length(nms), ncol = 0, dimnames = list(nms))))

User · Answer

You can do it without specifying column types:

    df = data.frame(matrix(vector(), 0, 3,
                           dimnames = list(c(), c("Date", "File", "User"))),
                    stringsAsFactors = FALSE)

User · Answer

Just initialize it with empty vectors:

    df <- data.frame(Date = as.Date(character()),
                     File = character(),
                     User = character(),
                     stringsAsFactors = FALSE)

Here's another example with different column types:

    df <- data.frame(Doubles = double(),
                     Ints = integer(),
                     Factors = factor(),
                     Logicals = logical(),
                     Characters = character(),
                     stringsAsFactors = FALSE)

    > str(df)
    'data.frame':   0 obs. of  5 variables:
     $ Doubles   : num
     $ Ints      : int
     $ Factors   : Factor w/ 0 levels:
     $ Logicals  : logi
     $ Characters: chr

N.B.:

Initializing a data.frame with an empty column of the wrong type does not prevent further additions of rows having columns of different types. This method is just a bit safer in the sense that you'll have the correct column types from the beginning, hence if your code relies on some column type checking, it will work even with a data.frame with zero rows.
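To illustrate the point about type checking, here is a small sketch of a check that passes on the zero-row data frame. The `expected` vector and the use of the first class name of each column are choices made for this example, not part of the answer above.

```r
df <- data.frame(Date = as.Date(character()),
                 File = character(),
                 User = character(),
                 stringsAsFactors = FALSE)

# Expected first class of each column, keyed by column name
expected <- c(Date = "Date", File = "character", User = "character")

# Collect the first class of every column; this works with zero rows
actual <- vapply(df, function(col) class(col)[1], character(1))

stopifnot(identical(actual, expected))
print(actual)
```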

User · Answer

By using data.table we can specify data types for each column:

    library(data.table)
    data = data.table(a = numeric(), b = numeric(), c = numeric())

User · Answer

If you are looking for shortness:

    read.csv(text = "col1,col2")

so you don't need to specify the column names separately. You get the default column type logical until you fill the data frame.
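A quick way to see the default column type is to inspect the result directly; the column names below are the placeholders used above.

```r
# Header-only input: column names come from the text, no rows are read
df <- read.csv(text = "col1,col2")

print(nrow(df))
print(sapply(df, class))  # both columns default to logical
```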

User · Answer

The most efficient way to do this is to use structure to create a list that has the class "data.frame":

    structure(list(Date = as.Date(character()), File = character(), User = character()),
              class = "data.frame")
    # [1] Date File User
    # <0 rows> (or 0-length row names)

To put this into perspective compared to the presently accepted answer, here's a simple benchmark:

    s <- function() structure(list(Date = as.Date(character()),
                                   File = character(),
                                   User = character()),
                              class = "data.frame")
    d <- function() data.frame(Date = as.Date(character()),
                               File = character(),
                               User = character(),
                               stringsAsFactors = FALSE)
    library("microbenchmark")
    microbenchmark(s(), d())
    # Unit: microseconds
    #  expr     min       lq     mean   median      uq      max neval
    #   s()  58.503  66.5860  90.7682  82.1735 101.803  469.560   100
    #   d() 370.644 382.5755 523.3397 420.1025 604.654 1565.711   100
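One thing to keep in mind is that objects built by data.frame() also carry a row.names attribute, which the bare structure() call does not set. If downstream code might depend on it (an assumption, not something the benchmark above tests), a variant that sets it explicitly could look like this:

```r
# Same idea as above, with an explicit zero-length row.names attribute
s <- structure(list(Date = as.Date(character()),
                    File = character(),
                    User = character()),
               class = "data.frame",
               row.names = integer(0))

print(is.data.frame(s))
print(dim(s))
```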
