How to Correctly Use Lists in R

Question

Brief background: Many (most?) contemporary programming languages in widespread use have at least a handful of ADTs (abstract data types) in common, in particular:

- string (a sequence comprised of characters)
- list (an ordered collection of values), and
- map-based type (an unordered array that maps keys to values)

In the R programming language, the first two are implemented as character and vector, respectively.

When I began learning R, two things were obvious almost from the start: list is the most important data type in R (because it is the parent class for the R data.frame), and second, I just couldn't understand how they worked, at least not well enough to use them correctly in my code.

For one thing, it seemed to me that R's list data type was a straightforward implementation of the map ADT (dictionary in Python, NSMutableDictionary in Objective C, hash in Perl and Ruby, object literal in JavaScript, and so forth).

For instance, you create them just like you would a Python dictionary, by passing key-value pairs to a constructor (which in Python is dict, not list):

    x = list("ev1" = 10, "ev2" = 15, "rv" = "Group 1")

And you access the items of an R list just like you would those of a Python dictionary, e.g., x["ev1"]. Likewise, you can retrieve just the 'keys' or just the 'values' by:

    names(x)    # fetch just the 'keys' of an R list
    # [1] "ev1" "ev2" "rv"

    unlist(x)   # fetch just the 'values' of an R list
    #       ev1       ev2        rv
    #      "10"      "15" "Group 1"

    x = list("a" = 6, "b" = 9, "c" = 3)
    sum(unlist(x))
    # [1] 18

But R lists are also unlike other map-type ADTs (from among the languages I've learned, anyway). My guess is that this is a consequence of the initial spec for S, i.e., an intention to design a data/statistics DSL (domain-specific language) from the ground up.

Three significant differences between R lists and mapping types in other languages in widespread use (e.g., Python, Perl, JavaScript):

First, lists in R are an ordered collection, just like vectors, even though the values are keyed (i.e., the keys can be any hashable value, not just sequential integers). Nearly always, the mapping data type in other languages is unordered.

Second, lists can be returned from functions even though you never passed in a list when you called the function, and even though the function that returned the list doesn't contain an (explicit) list constructor. (Of course, you can deal with this in practice by wrapping the returned result in a call to unlist.)

    x = strsplit(LETTERS[1:10], "")   # passing in an object of type 'character'
    class(x)                          # returns 'list', not a character vector
    # [1] "list"

A third peculiar feature of R's lists: it doesn't seem that they can be members of another ADT, and if you try to do that, then the primary container is coerced to a list. E.g.,

    x = c(0.5, 0.8, 0.23, list(0.5, 0.2, 0.9))
    class(x)
    # [1] "list"

My intention here is not to criticize the language or how it is documented; likewise, I'm not suggesting there is anything wrong with the list data structure or how it behaves. All I'm after is to correct my understanding of how they work so I can use them correctly in my code.

Here are the sorts of things I'd like to better understand:

- What are the rules which determine when a function call will return a list (e.g., the strsplit expression recited above)?
- If I don't explicitly assign names to a list (e.g., list(10, 20, 30, 40)), are the default names just sequential integers beginning with 1? (I assume, but I am far from certain, that the answer is yes; otherwise we wouldn't be able to coerce this type of list to a vector with a call to unlist.)
- Why do these two different operators, [ ] and [[ ]], return the same result?

      x = list(1, 2, 3, 4)
      x[1]      # both expressions return 1
      x[[1]]

- Why do these two expressions not return the same result?

      x = list(1, 2, 3, 4)
      x2 = list(1:4)

Please don't point me to the R documentation (?list, R-intro); I have read it carefully and it does not help me answer the type of questions I recited just above.

(Lastly, I recently learned of and began using an R package, available on CRAN, called hash, which implements conventional map-type behavior via an S4 class. I can certainly recommend this package.)

User · Answer

> Why do these two different operators, [ ] and [[ ]], return the same result?
>
>     x = list(1, 2, 3, 4)

[ ] provides a subsetting operation. In general, a subset of any object will have the same type as the original object. Therefore, x[1] returns a list. Similarly, x[1:2] is a subset of the original list, therefore it is a list. Ex.

    > x[1:2]
    [[1]]
    [1] 1

    [[2]]
    [1] 2

[[ ]] is for extracting an element from the list. x[[1]] is valid and extracts the first element from the list. x[[1:2]] does not extract two elements, as [[ ]] does not provide subsetting like [ ]; given a vector index it indexes recursively (x[[1:2]] means x[[1]][[2]]), which fails here because x[[1]] has only one element.

    > x[[2]]
    [1] 2

    > x[[2:3]]
    Error in x[[2:3]] : subscript out of bounds
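A minimal sketch of the difference described above, using the same x = list(1, 2, 3, 4); the variable names sub, elt, two and msg are only for illustration. The error is captured with tryCatch so the script keeps running.

```r
x <- list(1, 2, 3, 4)

sub <- x[1]      # [ keeps the container: a list of length 1
elt <- x[[1]]    # [[ extracts the element itself: a double of length 1
print(class(sub))   # "list"
print(class(elt))   # "numeric"

two <- x[1:2]    # [ with several indices: still a list, now of length 2
print(length(two))  # 2

# [[ with a vector index is recursive indexing: x[[2:3]] means x[[2]][[3]],
# and x[[2]] is a length-1 double, so there is no element 3 to extract.
msg <- tryCatch(x[[2:3]], error = function(e) conditionMessage(e))
print(msg)       # an error message saying the subscript is out of bounds
```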

User · Answer

One reason lists work as they do (ordered) is to address the need for an ordered container that can contain any type at any node, which vectors do not do. Lists are re-used for a variety of purposes in R, including forming the base of a data.frame, which is a list of vectors of arbitrary type (but the same length).

> Why do these two expressions not return the same result?
>
>     x = list(1, 2, 3, 4)
>     x2 = list(1:4)

To add to @Shane's answer, if you wanted to get the same result, try:

    x3 = as.list(1:4)

which coerces the vector 1:4 into a list.
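A small check of the as.list() suggestion, as a sketch only: it compares the three lists by length, with all.equal() and with identical(). The point it shows is that as.list(1:4) has the same shape as list(1, 2, 3, 4) but holds integers rather than doubles, so the two are equal in value but not identical.

```r
x  <- list(1, 2, 3, 4)   # four doubles, one per element
x2 <- list(1:4)          # one integer vector of length 4
x3 <- as.list(1:4)       # four integers, one per element

print(c(length(x), length(x2), length(x3)))   # 4 1 4

# Same shape, and all.equal() tolerates double vs integer ...
print(isTRUE(all.equal(x, x3)))               # TRUE
# ... but identical() does not: 1 is a double, while 1:4 is an integer vector
print(identical(x, x3))                       # FALSE
print(identical(x, as.list(c(1, 2, 3, 4))))   # TRUE
```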

User · Answer

Just to address the last part of your question, since that really points out the difference between a list and a vector in R:

> Why do these two expressions not return the same result?
>
>     x = list(1, 2, 3, 4)
>     x2 = list(1:4)

A list can contain any other class as each element. So you can have a list where the first element is a character vector, the second is a data frame, etc. In this case, you have created two different lists. x has four vectors, each of length 1. x2 has 1 vector of length 4:

    > length(x[[1]])
    [1] 1
    > length(x2[[1]])
    [1] 4

So these are completely different lists.

R lists are very much like a hash map data structure in that each index value can be associated with any object. Here's a simple example of a list that contains 3 different classes (including a function):

    > complicated.list <- list("a" = 1:4, "b" = 1:3, "c" = matrix(1:4, nrow = 2), "d" = search)
    > lapply(complicated.list, class)
    $a
    [1] "integer"

    $b
    [1] "integer"

    $c
    [1] "matrix" "array"

    $d
    [1] "function"

(In R versions before 4.0.0, class() of a matrix returns just "matrix".)

Given that the last element is the search function, I can call it like so:

    > complicated.list[["d"]]()
    [1] ".GlobalEnv" ...

As a final comment on this: it should be noted that a data.frame is really a list. From the data.frame documentation:

    A data frame is a list of variables of the same number of rows with unique row names, given class "data.frame"

That's why columns in a data.frame can have different data types, while columns in a matrix cannot. As an example, here I try to create a matrix with numbers and characters:

    > a <- 1:4
    > class(a)
    [1] "integer"
    > b <- c("a", "b", "c", "d")
    > d <- cbind(a, b)
    > d
         a   b
    [1,] "1" "a"
    [2,] "2" "b"
    [3,] "3" "c"
    [4,] "4" "d"
    > class(d[, 1])
    [1] "character"

Note how I cannot change the data type in the first column to numeric because the second column has characters:

    > d[, 1] <- as.numeric(d[, 1])
    > class(d[, 1])
    [1] "character"
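To make the data.frame point concrete, here is a short sketch with the same a and b values placed side by side in a data frame and in a matrix. stringsAsFactors = FALSE is passed explicitly so the result does not depend on the R version's default.

```r
df <- data.frame(a = 1:4, b = c("a", "b", "c", "d"), stringsAsFactors = FALSE)

print(is.list(df))        # TRUE: a data frame is a list of columns
print(typeof(df))         # "list"
print(length(df))         # 2: one element per column
print(sapply(df, class))  # a: "integer", b: "character"; each column keeps its own type

m <- cbind(a = 1:4, b = c("a", "b", "c", "d"))
print(typeof(m))          # "character": a matrix holds one type, so 1:4 was coerced
```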

User · Answer

Just to take a subset of your questions:

This article on indexing addresses the question of the difference between [] and [[]].

In short, [[]] selects a single item from a list and [] returns a list of the selected items. In your example, x = list(1, 2, 3, 4), item 1 is a single number, but x[[1]] returns that number itself while x[1] returns a list with only one value.

    > x = list(1, 2, 3, 4)
    > x[1]
    [[1]]
    [1] 1

    > x[[1]]
    [1] 1
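The same [ versus [[ distinction applies when selecting by name, and R also offers $. A minimal sketch with a hypothetical named list z; it relies on the documented behavior that $ does partial matching on lists while [[ matches names exactly by default (exact = TRUE).

```r
z <- list(alpha = 1, beta = "two")   # hypothetical example list

by_name  <- z[["alpha"]]   # 1: extract by exact name
sublist  <- z["alpha"]     # a list of length 1, still named "alpha"
dollar   <- z$alpha        # 1: $ is shorthand for [[ with a literal name ...
partial  <- z$al           # 1: ... but $ also matches a unique name prefix
exact    <- z[["al"]]      # NULL: [[ does not partially match by default
missing  <- z[["gamma"]]   # NULL: an absent name is not an error for a list

print(sublist)
print(list(partial = partial, exact = exact, missing = missing))
```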

User · Answer

Although this is a pretty old question, I must say it touches exactly the knowledge I was missing during my first steps in R, i.e. how to express the data in my hand as an object in R, or how to select from existing objects. It is not easy for an R novice to think "in an R box" from the very beginning.

So I myself started to use the crutches below, which helped me a lot to find out what object to use for what data, and basically to imagine real-world usage.

Though I am not giving exact answers to the question, the short text below might help a reader who has just started with R and is asking similar questions.

- Atomic vector ... I called that "sequence" for myself: no direction, just a sequence of the same type. [ subsets.
- Vector ... the sequence with one direction in 2D. [ subsets.
- Matrix ... a bunch of vectors of the same length forming rows or columns. [ subsets by rows and columns, or by sequence.
- Arrays ... layered matrices forming 3D.
- Data frame ... a 2D table like in Excel, where I can sort, add or remove rows or columns, or do arithmetic operations with them. Only after some time did I truly recognize that a data frame is a clever implementation of a list, where I can subset using [ by rows and columns, but even using [[.
- List ... to help myself, I thought about the list as a tree structure, where [i] selects and returns whole branches and [[i]] returns an item from the branch. And because it is a tree-like structure, you can even use an index sequence to address every single leaf of a very complex list using its [[index vector]]. Lists can be simple or very complex and can mix various types of objects together into one.

So for lists you can end up with more ways to select a leaf depending on the situation, as in the following example:

    l <- list("aaa", 5, list(1:3), LETTERS[1:4], matrix(1:9, 3, 3))
    l[[c(5, 4)]]   # selects 4 from the matrix using an [[index vector]] in the list
    l[[5]][4]      # selects 4 from the matrix using a sequential index in the matrix
    l[[5]][1, 2]   # selects 4 from the matrix using row and column in the matrix

This way of thinking helped me a lot.
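The tree picture also works with names instead of positions. A minimal sketch using a hypothetical nested list cfg: [ returns a whole branch still wrapped in a list, [[ returns the branch itself, and a character index vector walks down the tree one level per element.

```r
# Hypothetical nested list viewed as a tree
cfg <- list(
  db  = list(host = "localhost", port = 5432),
  log = list(level = "info")
)

branch_wrapped <- cfg["db"]               # [ : a list of length 1 holding the branch
branch         <- cfg[["db"]]             # [[ : the branch itself (host and port)
leaf_by_name   <- cfg[[c("db", "port")]]  # same as cfg[["db"]][["port"]]
leaf_by_pos    <- cfg[[c(1, 2)]]          # the same leaf addressed by position

print(leaf_by_name)   # 5432
```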

User · Answer

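    x = list(1, 2, 3, 4)
    x2 = list(1:4)
    all.equal(x, x2)

is not TRUE, because x2 wraps the whole vector 1:4 (which is the same as c(1, 2, 3, 4)) in a single element, while x has four separate elements. If you want them to be the same, then:

    x = list(c(1, 2, 3, 4))
    x2 = list(1:4)
    all.equal(x, x2)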
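Another quick way to see why the two lists differ is lengths() (available since R 3.2.0), which reports the length of every element at once. A short sketch:

```r
x  <- list(1, 2, 3, 4)
x2 <- list(1:4)

print(lengths(x))    # 1 1 1 1: four elements, each of length 1
print(lengths(x2))   # 4: a single element of length 4
str(x2)              # a list of 1 holding an integer vector of length 4

# Wrapping the whole vector makes the two sides structurally comparable
same <- isTRUE(all.equal(list(c(1, 2, 3, 4)), x2))
print(same)          # TRUE
```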

User · Answer

Just to add one more point to this:

R does have a data structure equivalent to the Python dict in the hash package. You can read about it in this blog post from the Open Data Group. Here's a simple example:

    > library(hash)
    > h <- hash(keys = c('foo', 'bar', 'baz'), values = 1:3)
    > h[c('foo', 'bar')]
    <hash> containing 2 key-value pairs.
      bar : 2
      foo : 1

In terms of usability, the hash class is very similar to a list. But the performance is better for large datasets.
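If installing a package is not an option, base R environments can serve as a hashed key-value store. This is a swapped-in technique, not the hash package's API: the sketch below uses only base functions (new.env, assign, get, exists, mget). Note that environments have reference semantics, unlike lists, so modifying h inside a function changes the caller's object.

```r
# Base-R alternative to the hash package: an environment used as a key-value store
h <- new.env(hash = TRUE)
assign("foo", 1, envir = h)
assign("bar", 2, envir = h)
h[["baz"]] <- 3                        # [[<- also works on environments

has_foo <- exists("foo", envir = h, inherits = FALSE)
bar_val <- get("bar", envir = h)       # 2
keys    <- sort(ls(h))                 # "bar" "baz" "foo"

# Unlike list names, keys are unique: assigning again overwrites
h[["foo"]] <- 10
pair <- mget(c("foo", "bar"), envir = h)   # a named list: foo = 10, bar = 2
print(pair)
```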

User · Answer

This is a very old question, but I think that a new answer might add some value since, in my opinion, no one directly addressed some of the concerns in the OP.

Despite what the accepted answer suggests, list objects in R are not hash maps. If you want to make a parallel with Python, lists are more like, you guessed it, Python lists (or tuples, actually).

It's better to describe how most R objects are stored internally (the C type of an R object is SEXP). They are made basically of three parts:

- a header, which declares the R type of the object, the length and some other metadata;
- the data part, which is a standard C heap-allocated array (a contiguous block of memory);
- the attributes, which are a named linked list of pointers to other R objects (or NULL if the object doesn't have attributes).

From an internal point of view, there is little difference between a list and a numeric vector, for instance. The values they store are just different. Let's break two objects into the paradigm we described before:

    x <- runif(10)
    y <- list(runif(10), runif(3))

For x:

- The header will say that the type is numeric (REALSXP on the C side), the length is 10, and other stuff.
- The data part will be an array containing 10 double values.
- The attributes are NULL, since the object doesn't have any.

For y:

- The header will say that the type is list (VECSXP on the C side), the length is 2, and other stuff.
- The data part will be an array containing 2 pointers to two SEXP types, pointing to the values obtained by runif(10) and runif(3) respectively.
- The attributes are NULL, as for x.

So the only difference between a numeric vector and a list is that the numeric data part is made of double values, while for the list the data part is an array of pointers to other R objects.

What happens with names? Well, names are just some of the attributes you can assign to an object. Let's look at the object below:

    z <- list(a = 1:3, b = LETTERS)

- The header will say that the type is list (VECSXP on the C side), the length is 2, and other stuff.
- The data part will be an array containing 2 pointers to two SEXP types, pointing to the values obtained by 1:3 and LETTERS respectively.
- The attributes are now present and are a names component, which is a character R object with value c("a", "b").

From the R level, you can retrieve the attributes of an object with the attributes function.

The key-value pairing typical of a hash map is, in R, just an illusion. When you say:

    z[["a"]]

this is what happens:

- the [[ subset function is called;
- the argument of the function ("a") is of type character, so the method is instructed to search for that value in the names attribute (if present) of the object z;
- if the names attribute isn't there, NULL is returned;
- if it is present, the "a" value is searched for in it. If "a" is not a name of the object, NULL is returned;
- if it is found, the position of the first occurrence is determined (1 in the example). So the first element of the list is returned, i.e. the equivalent of z[[1]].

The key-value search is rather indirect and is always positional. Also useful to keep in mind:

- in hash maps, the only requirement on a key is that it must be hashable. Names in R must be strings (character vectors);
- in hash maps you cannot have two identical keys. In R, you can assign names to an object with repeated values. For instance:

      names(y) <- c("same", "same")

  is perfectly valid in R. When you try y[["same"]], the first value is retrieved. You should know why at this point.

In conclusion, the ability to give arbitrary attributes to an object gives you the appearance of something different from an external point of view. But R lists are not hash maps in any way.
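The internals described above can be observed from the R level with typeof(), is.vector() and attributes(). A minimal sketch reusing the same x, y and z; the variable first is only for illustration.

```r
x <- runif(10)
y <- list(runif(10), runif(3))
z <- list(a = 1:3, b = LETTERS)

print(typeof(x))       # "double" (REALSXP on the C side)
print(typeof(y))       # "list"   (VECSXP on the C side)
print(is.vector(y))    # TRUE: a list is a generic vector
print(attributes(x))   # NULL: no attributes at all
print(attributes(z))   # $names "a" "b": names are just an attribute

# Duplicate names are allowed; lookup by name returns the first match
names(y) <- c("same", "same")
first <- y[["same"]]
print(identical(first, y[[1]]))   # TRUE
```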

User · Answer

You say:

> For another, lists can be returned from functions even though you never passed in a List when you called the function, and even though the function doesn't contain a List constructor, e.g.,
>
>     x = strsplit(LETTERS[1:10], "")   # passing in an object of type 'character'
>     class(x)
>     # => 'list'

And I guess you suggest that this is a problem(?). I'm here to tell you why it's not a problem :-)

Your example is a bit simple, in that when you do the string-split, you have a list with elements that are 1 element long, so you know that x[[1]] is the same as unlist(x)[1]. But what if strsplit returned results of different lengths in each bin? Simply returning a vector (vs. a list) won't do at all.

For instance:

    stuff <- c("You, me, and dupree", "You me, and dupree",
               "He ran away, but not very far, and not very fast")
    x <- strsplit(stuff, ",")
    xx <- unlist(strsplit(stuff, ","))

In the first case (x, which is a list), you can tell what the 2nd "part" of the 3rd string was, e.g. x[[3]][2]. How could you do the same using xx, now that the results have been "unraveled" (unlist-ed)?
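Running the example makes the point explicit: the split produces a different number of parts per string, which only a list can hold without losing track of where each part came from. The parts data frame at the end is a hypothetical workaround, showing that after unlist() the origin has to be recorded separately.

```r
stuff <- c("You, me, and dupree", "You me, and dupree",
           "He ran away, but not very far, and not very fast")
x  <- strsplit(stuff, ",")
xx <- unlist(x)

print(lengths(x))   # 3 2 3: each string split into a different number of parts
print(x[[3]][2])    # " but not very far"

# After flattening, the origin of each piece must be stored explicitly
parts <- data.frame(string = rep(seq_along(x), lengths(x)), part = xx,
                    stringsAsFactors = FALSE)
print(parts$part[parts$string == 3][2])   # " but not very far" again
```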

User · Answer

Regarding your questions, let me address them in order and give some examples:

1) A list is returned if and when the return statement adds one. Consider

    R> retList <- function() return(list(1, 2, 3, 4)); class(retList())
    [1] "list"
    R> notList <- function() return(c(1, 2, 3, 4)); class(notList())
    [1] "numeric"
    R>

2) Names are simply not set:

    R> retList <- function() return(list(1, 2, 3, 4)); names(retList())
    NULL
    R>

3) They do not return the same thing. Your example gives

    R> x <- list(1, 2, 3, 4)
    R> x[1]
    [[1]]
    [1] 1
    R> x[[1]]
    [1] 1

where x[1] returns a list containing just the first element of x, i.e. it is still a list, in the same way that every scalar is a vector of length one. On the other hand, x[[1]] returns the first element of the list itself.

4) Lastly, the two are different because they create, respectively, a list containing four scalars and a list with a single element (that happens to be a vector of four elements).
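Point 2 also answers the question about default names: there are none, and unlist() does not need them because it works positionally. A short sketch; setNames() is base R (stats) and is used here only to show that names carry over when they do exist.

```r
x <- list(10, 20, 30, 40)

print(names(x))    # NULL: no default names are created
flat <- unlist(x)  # 10 20 30 40: unlist() works positionally
print(x[[2]])      # 20: elements are reached by position

named <- setNames(x, c("a", "b", "c", "d"))
print(unlist(named))   # a b c d / 10 20 30 40: names carry over when present
```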

User · Answer

If it helps, I tend to conceive of "lists" in R as "records" in other pre-OO languages:

- they do not make any assumptions about an overarching type (or rather, the type of all possible records of any arity and field names is available);
- their fields can be anonymous (then you access them by strict definition order).

The name "record" would clash with the standard meaning of "records" (aka rows) in database parlance, and maybe this is why their name suggested itself: as lists (of fields).
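A minimal sketch of the record view, with hypothetical field values: anonymous fields are reached by definition order, and naming them later adds name-based access without changing that order.

```r
# A list used as a record with anonymous fields (hypothetical data)
rec <- list("Ada", 36, TRUE)
print(rec[[1]])          # "Ada": unnamed fields are reached by position

# Naming the fields keeps their order; both access styles now work
rec <- setNames(rec, c("name", "age", "active"))
print(rec$age)           # 36
print(rec[[2]])          # 36
print(names(rec))        # "name" "age" "active"
```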

User · Answer

Regarding vectors and the hash/array concept from other languages:

1. Vectors are the atoms of R. E.g., rpois(1e4, 5) (10,000 Poisson-distributed random numbers with mean 5), numeric(55) (a length-55 zero vector of doubles), and character(12) (12 empty strings) are all "basic".

2. Either lists or vectors can have names.

       > n = numeric(10)
       > n
        [1] 0 0 0 0 0 0 0 0 0 0
       > names(n)
       NULL
       > names(n) = LETTERS[1:10]
       > n
       A B C D E F G H I J
       0 0 0 0 0 0 0 0 0 0

3. Vectors require everything to be the same data type. Watch this:

       > i = integer(5)
       > v = c(n, i)
       > v
       A B C D E F G H I J
       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
       > class(v)
       [1] "numeric"
       > i = complex(5)
       > v = c(n, i)
       > class(v)
       [1] "complex"
       > v
          A    B    C    D    E    F    G    H    I    J
       0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i

4. Lists can contain varying data types, as seen in other answers and the OP's question itself.

I've seen languages (Ruby, JavaScript) in which "arrays" may contain variable data types, but, for example, in C++ "arrays" must all be of the same data type. I believe this is a speed/efficiency thing: if you have a numeric(1e6), you know its size and the location of every element a priori; if the thing might contain "Flying Purple People Eaters" in some unknown slice, then you have to actually parse stuff to know basic facts about it.

Certain standard R operations also make more sense when the type is guaranteed. For example, cumsum(1:9) makes sense, whereas cumsum(list(1, 2, 3, 4, 5, 'a', 6, 7, 8, 9)) does not, without the type being guaranteed to be double.

As to your second question:

> Lists can be returned from functions even though you never passed in a List when you called the function

Functions return different data types than their input all the time. plot produces a plot even though it doesn't take a plot as an input. Arg returns a numeric even though it accepted a complex. Etc.

(And as for strsplit: the source code is here.)
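The coercion rule behind point 3 can be checked directly with typeof(). A short sketch: c() promotes its arguments to the most general type present, and a list argument makes the result a list, which is the behavior the question ran into with c(0.5, 0.8, 0.23, list(...)).

```r
t_int  <- typeof(c(1L, 2L))        # "integer"
t_dbl  <- typeof(c(1L, 2.5))       # "double"
t_cpl  <- typeof(c(1, 1i))         # "complex"
t_chr  <- typeof(c(1, "a"))        # "character"
t_list <- typeof(c(1, list(2)))    # "list": a list argument makes the result a list
print(c(t_int, t_dbl, t_cpl, t_chr, t_list))

mixed <- list(1, 2, "a")
print(typeof(unlist(mixed)))       # "character": flattening must pick one common type
sums <- cumsum(unlist(list(1, 2, 3)))
print(sums)                        # 1 3 6: an all-numeric list flattens to doubles
```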
