Data structure for maintaining tabular data in memory?

Question

My scenario is as follows: I have a table of data (a handful of fields, fewer than a hundred rows) that I use extensively in my program. I also need this data to be persistent, so I save it as a CSV and load it on start-up. I choose not to use a database because every option (even SQLite) is overkill for my humble requirement (also, I would like to be able to edit the values offline in a simple way, and nothing is simpler than Notepad).

Assume my data looks as follows (in the file it's comma separated without titles; this is just an illustration):

    Row | Name     | Year | Priority
    ----+----------+------+---------
    1   | Cat      | 1998 | 1
    2   | Fish     | 1998 | 2
    3   | Dog      | 1999 | 1
    4   | Aardvark | 2000 | 1
    5   | Wallaby  | 2000 | 1
    6   | Zebra    | 2001 | 3

Notes:

1. Row may be a "real" value written to the file or just an auto-generated value that represents the row number. Either way it exists in memory.
2. Names are unique.

Things I do with the data:

1. Look up a row based on either ID (iteration) or name (direct access).
2. Display the table in different orders based on multiple fields: I need to sort it e.g. by Priority and then Year, or Year and then Priority, etc.
3. Count instances based on sets of parameters, e.g. how many rows have their year between 1997 and 2002, or how many rows are in 1998 and have priority > 2, etc.

I know this "cries" for SQL...

I'm trying to figure out the best choice of data structure. Following are several choices I see:

List of row lists:

    a = []
    a.append([1, "Cat", 1998, 1])
    a.append([2, "Fish", 1998, 2])
    a.append([3, "Dog", 1999, 1])
    ...

List of column lists (there will obviously be an API for add_row etc.):

    a = []
    a.append([1, 2, 3, 4, 5, 6])
    a.append(["Cat", "Fish", "Dog", "Aardvark", "Wallaby", "Zebra"])
    a.append([1998, 1998, 1999, 2000, 2000, 2001])
    a.append([1, 2, 1, 1, 1, 3])

Dictionary of column lists (constants can be created to replace the string keys):

    a = {}
    a['ID'] = [1, 2, 3, 4, 5, 6]
    a['Name'] = ["Cat", "Fish", "Dog", "Aardvark", "Wallaby", "Zebra"]
    a['Year'] = [1998, 1998, 1999, 2000, 2000, 2001]
    a['Priority'] = [1, 2, 1, 1, 1, 3]

Dictionary with keys being tuples of (Row, Field):

    # Create constants to avoid string searching
    NAME = 1
    YEAR = 2
    PRIORITY = 3

    a = {}
    a[(1, NAME)] = "Cat"
    a[(1, YEAR)] = 1998
    a[(1, PRIORITY)] = 1
    a[(2, NAME)] = "Fish"
    a[(2, YEAR)] = 1998
    a[(2, PRIORITY)] = 2
    ...

And I'm sure there are other ways... However, each way has disadvantages when it comes to my requirements (complex ordering and counting).

What's the recommended approach?

EDIT: To clarify, performance is not a major issue for me. Because the table is so small, I believe almost every operation will be in the range of milliseconds, which is not a concern for my application.

User · Accepted Answer

Having a "table" in memory that needs lookups, sorting, and arbitrary aggregation really does call out for SQL. You said you tried SQLite, but did you realize that SQLite can use an in-memory-only database?

    connection = sqlite3.connect(':memory:')

Then you can create/drop/query/update tables in memory with all the functionality of SQLite and no files left over when you're done. And as of Python 2.5, sqlite3 is in the standard library, so it's not really "overkill" IMO.

Here is a sample of how one might create and populate the database:

    import csv
    import sqlite3

    db = sqlite3.connect(':memory:')

    def init_db(cur):
        cur.execute('''CREATE TABLE foo (
            Row INTEGER,
            Name TEXT,
            Year INTEGER,
            Priority INTEGER)''')

    def populate_db(cur, csv_fp):
        # Each CSV record becomes one parameter tuple for the INSERT.
        rdr = csv.reader(csv_fp)
        cur.executemany('''
            INSERT INTO foo (Row, Name, Year, Priority)
            VALUES (?,?,?,?)''', rdr)

    cur = db.cursor()
    init_db(cur)
    populate_db(cur, open('my_csv_input_file.csv'))
    db.commit()

If you'd really prefer not to use SQL, you should probably use a list of dictionaries:

    lod = []  # "list of dicts"

    def populate_lod(lod, csv_fp):
        rdr = csv.DictReader(csv_fp, ['Row', 'Name', 'Year', 'Priority'])
        lod.extend(rdr)

    def query_lod(lod, filter=None, sort_keys=None):
        if filter is not None:
            lod = (r for r in lod if filter(r))
        if sort_keys is not None:
            lod = sorted(lod, key=lambda r: [r[k] for k in sort_keys])
        else:
            lod = list(lod)
        return lod

    def lookup_lod(lod, **kw):
        for row in lod:
            # items() works on both Python 2 and 3 (the original used iteritems()).
            for k, v in kw.items():
                # DictReader yields strings, so compare against str(v).
                if row[k] != str(v):
                    break
            else:
                return row
        return None

Testing then yields:

    >>> lod = []
    >>> populate_lod(lod, csv_fp)
    >>> pprint(lookup_lod(lod, Row=1))
    {'Name': 'Cat', 'Priority': '1', 'Row': '1', 'Year': '1998'}
    >>> pprint(lookup_lod(lod, Name='Aardvark'))
    {'Name': 'Aardvark', 'Priority': '1', 'Row': '4', 'Year': '2000'}
    >>> pprint(query_lod(lod, sort_keys=('Priority', 'Year')))
    [{'Name': 'Cat', 'Priority': '1', 'Row': '1', 'Year': '1998'},
     {'Name': 'Dog', 'Priority': '1', 'Row': '3', 'Year': '1999'},
     {'Name': 'Aardvark', 'Priority': '1', 'Row': '4', 'Year': '2000'},
     {'Name': 'Wallaby', 'Priority': '1', 'Row': '5', 'Year': '2000'},
     {'Name': 'Fish', 'Priority': '2', 'Row': '2', 'Year': '1998'},
     {'Name': 'Zebra', 'Priority': '3', 'Row': '6', 'Year': '2001'}]
    >>> pprint(query_lod(lod, sort_keys=('Year', 'Priority')))
    [{'Name': 'Cat', 'Priority': '1', 'Row': '1', 'Year': '1998'},
     {'Name': 'Fish', 'Priority': '2', 'Row': '2', 'Year': '1998'},
     {'Name': 'Dog', 'Priority': '1', 'Row': '3', 'Year': '1999'},
     {'Name': 'Aardvark', 'Priority': '1', 'Row': '4', 'Year': '2000'},
     {'Name': 'Wallaby', 'Priority': '1', 'Row': '5', 'Year': '2000'},
     {'Name': 'Zebra', 'Priority': '3', 'Row': '6', 'Year': '2001'}]
    >>> print(len(query_lod(lod, lambda r: 1997 <= int(r['Year']) <= 2002)))
    6
    >>> print(len(query_lod(lod, lambda r: int(r['Year']) == 1998 and int(r['Priority']) > 2)))
    0

Personally I like the SQLite version better since it preserves your types better (without extra conversion code in Python) and easily grows to accommodate future requirements. But then again, I'm quite comfortable with SQL, so YMMV.
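The SQLite sample in this answer creates and populates the table but does not show the three operations the question asks about. A minimal sketch of those, assuming the same `foo` schema and using the question's six rows inline instead of a CSV file (the `ROWS` list and the result variable names are illustrative, not part of the original answer):

```python
import sqlite3

# Sample data copied from the question: (Row, Name, Year, Priority).
ROWS = [
    (1, "Cat", 1998, 1),
    (2, "Fish", 1998, 2),
    (3, "Dog", 1999, 1),
    (4, "Aardvark", 2000, 1),
    (5, "Wallaby", 2000, 1),
    (6, "Zebra", 2001, 3),
]

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE foo (Row INTEGER, Name TEXT, Year INTEGER, Priority INTEGER)"
)
db.executemany(
    "INSERT INTO foo (Row, Name, Year, Priority) VALUES (?, ?, ?, ?)", ROWS
)
db.commit()

# 1. Look up a single row by its unique name.
by_name = db.execute(
    "SELECT Row, Name, Year, Priority FROM foo WHERE Name = ?", ("Aardvark",)
).fetchone()

# 2. Sort by Priority, then Year (Row breaks ties deterministically).
names_by_priority_year = [
    name
    for (name,) in db.execute("SELECT Name FROM foo ORDER BY Priority, Year, Row")
]

# 3. Count rows matching a set of conditions.
(count_in_range,) = db.execute(
    "SELECT COUNT(*) FROM foo WHERE Year BETWEEN ? AND ?", (1997, 2002)
).fetchone()
(count_1998_high,) = db.execute(
    "SELECT COUNT(*) FROM foo WHERE Year = ? AND Priority > ?", (1998, 2)
).fetchone()
```

Because the columns are declared INTEGER, the comparisons work on numbers directly, which is the type-preservation point made in the answer.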

User · Answer

I personally wrote a lib for pretty much that quite recently. It is called BD_XML, as its most fundamental reason for existence is to serve as a way to send data back and forth between XML files and SQL databases.

It is written in Spanish (if that matters in a programming language), but it is very simple.

    from BD_XML import Tabla

It defines an object called Tabla (Table). It can be created with a name for identification and a pre-created connection object of a PEP 249 compatible database interface.

    Table = Tabla('Animals')

Then you need to add columns with the agregar_columna (add column) method, which can take various keyword arguments:

- campo (field): the name of the field
- tipo (type): the type of data stored; it can be things like 'varchar' and 'double', or names of Python objects if you aren't interested in exporting to a database later
- defecto (default): sets a default value for the column if there is none when you add a row
- there are 3 others, but they are only there for database things and are not actually functional

Like:

    Table.agregar_columna(campo='Name', tipo='str')
    Table.agregar_columna(campo='Year', tipo='date')
    # Declaring it date, time, datetime or timestamp is important for being able
    # to store it as a time object and not only as a number. But you can always
    # put it as an int if you don't care for dates.
    Table.agregar_columna(campo='Priority', tipo='int')

Then you add the rows with the += operator (or + if you want to create a copy with an extra row):

    Table += ('Cat', date(1998, 1, 1), 1)
    Table += {'Year': date(1998, 1, 1), 'Priority': 2, 'Name': 'Fish'}
    # ...

The condition for adding is that the row is a container accessible with either the column name or the position of the column in the table.

Then you can generate XML and write it to a file with exportar_xml (export XML) and escribir_xml (write XML):

    file = os.path.abspath(os.path.join(os.path.dirname(__file__), 'Animals.xml'))
    Table.exportar_xml()
    Table.escribir_xml(file)

And then import it back with importar_xml (import XML), passing the file name and an indication that you are using a file and not a string literal:

    Table.importar_xml(file, tipo='archivo')
    # archivo means file

Advanced

These are ways you can use a Tabla object in a SQL manner:

    # UPDATE <Table> SET Name = CONCAT(Name, ' ', Priority), Priority = NULL WHERE id = 2
    for row in Table:
        if row['id'] == 2:
            row['Name'] += ' ' + row['Priority']
            row['Priority'] = None
    print(Table)

    # DELETE FROM <Table> WHERE MOD(id, 2) = 0 LIMIT 1
    n = 0
    nmax = 1
    for row in Table:
        if row['id'] % 2 == 0:
            del Table[row]
            n += 1
            if n >= nmax: break
    print(Table)

These examples assume a column named 'id', but it can be replaced with row.pos for your example:

    if row.pos == 2:

The file can be downloaded from:

https://bitbucket.org/WolfangT/librerias

User · Answer

A very old question, I know, but...

A pandas DataFrame seems to be the ideal option here.

http://pandas.pydata.org/pandas-docs/version/0.13.1/generated/pandas.DataFrame.html

From the blurb:

    Two-dimensional size-mutable, potentially heterogeneous tabular data
    structure with labeled axes (rows and columns). Arithmetic operations
    align on both row and column labels. Can be thought of as a dict-like
    container for Series objects. The primary pandas data structure.

http://pandas.pydata.org

User · Answer

I personally would use the list of row lists. Because the data for each row is always in the same order, you can easily sort by any of the columns by simply accessing that element in each of the lists. You can also easily count based on a particular column in each list, and make searches as well. It's basically as close as it gets to a 2-d array.

Really the only disadvantage here is that you have to know what order the data is in, and if you change that ordering, you'll have to change your search/sorting routines to match.

Another thing you can do is have a list of dictionaries:

    rows = []
    rows.append({"ID": 1, "name": "Cat", "year": 1998, "priority": 1})

This would avoid needing to know the order of the parameters, so you can look through each "year" field in the list.
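As a rough illustration of the list-of-row-lists approach, the sketch below sorts and counts using only the standard library. The column-index constants and variable names are assumptions made for the example; the data is the question's sample table.

```python
from operator import itemgetter

# Column positions: this is the ordering you "have to know" with plain lists.
ROW, NAME, YEAR, PRIORITY = range(4)

rows = [
    [1, "Cat", 1998, 1],
    [2, "Fish", 1998, 2],
    [3, "Dog", 1999, 1],
    [4, "Aardvark", 2000, 1],
    [5, "Wallaby", 2000, 1],
    [6, "Zebra", 2001, 3],
]

# Multi-field sorts: itemgetter with several indexes returns a tuple key.
# sorted() is stable, so rows with equal keys keep their original order.
by_priority_year = sorted(rows, key=itemgetter(PRIORITY, YEAR))
by_year_priority = sorted(rows, key=itemgetter(YEAR, PRIORITY))

# Counting with arbitrary conditions.
count_in_range = sum(1 for r in rows if 1997 <= r[YEAR] <= 2002)
count_1998_high = sum(1 for r in rows if r[YEAR] == 1998 and r[PRIORITY] > 2)

# Linear search by name; fine for a table of fewer than a hundred rows.
dog = next((r for r in rows if r[NAME] == "Dog"), None)
```

If the column order ever changes, only the constants on the first line need updating, which softens the disadvantage mentioned above.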

User · Answer

Have a Table class whose rows attribute is a list of dicts or, better, row objects.

In the table, do not add rows directly; instead, have a method that also updates a few lookup maps, e.g. one for name. If you are not adding rows in order, or the ids are not consecutive, you can have an idMap too, e.g.:

    class Table(object):
        def __init__(self):
            self.rows = []  # list of row objects, we assume if order of id
            self.nameMap = {}  # for faster direct lookup for row by name

        def addRow(self, row):
            self.rows.append(row)
            self.nameMap[row['name']] = row

        def getRow(self, name):
            return self.nameMap[name]


    table = Table()
    table.addRow({'ID': 1, 'name': 'a'})
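The idMap mentioned in this answer is not shown, so here is one possible sketch of it. `IndexedTable`, its method names, and the duplicate check are hypothetical additions for illustration, not the answerer's code.

```python
class IndexedTable(object):
    """Hypothetical variant of the Table idea: rows plus two lookup maps."""

    def __init__(self):
        self.rows = []      # row dicts, in insertion order
        self.name_map = {}  # name -> row, for direct lookup by name
        self.id_map = {}    # ID -> row, for non-consecutive or unordered IDs

    def add_row(self, row):
        # Reject duplicates so both maps stay consistent with self.rows.
        if row["name"] in self.name_map or row["ID"] in self.id_map:
            raise ValueError("duplicate ID or name: %r" % (row,))
        self.rows.append(row)
        self.name_map[row["name"]] = row
        self.id_map[row["ID"]] = row

    def by_name(self, name):
        # Returns None when the name is unknown.
        return self.name_map.get(name)

    def by_id(self, row_id):
        # Returns None when the ID is unknown.
        return self.id_map.get(row_id)


table = IndexedTable()
table.add_row({"ID": 10, "name": "Cat", "year": 1998, "priority": 1})
table.add_row({"ID": 30, "name": "Dog", "year": 1999, "priority": 1})
```

Both maps point at the same row dicts that live in `rows`, so editing a row through one access path is visible through the others.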

User · Answer

First, given that you have a complex data retrieval scenario, are you sure even SQLite is overkill?

You'll end up having an ad hoc, informally-specified, bug-ridden, slow implementation of half of SQLite, paraphrasing Greenspun's Tenth Rule.

That said, you are very right in saying that choosing a single data structure will impact one or more of searching, sorting or counting, so if performance is paramount and your data is constant, you could consider having more than one structure for different purposes.

Above all, measure which operations will be more common and decide which structure will end up costing less.
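One way to act on the "measure" advice is the standard library's timeit module. The sketch below keeps two structures over the same rows, a plain list and a name-keyed dict, and times a lookup against each. The function names, the sample data layout, and the iteration count are assumptions for the example; timings will vary by machine, so none are quoted here.

```python
import timeit

rows = [
    {"Row": 1, "Name": "Cat", "Year": 1998, "Priority": 1},
    {"Row": 2, "Name": "Fish", "Year": 1998, "Priority": 2},
    {"Row": 3, "Name": "Dog", "Year": 1999, "Priority": 1},
    {"Row": 4, "Name": "Aardvark", "Year": 2000, "Priority": 1},
    {"Row": 5, "Name": "Wallaby", "Year": 2000, "Priority": 1},
    {"Row": 6, "Name": "Zebra", "Year": 2001, "Priority": 3},
]

# Second structure built for one purpose only: direct lookup by unique name.
name_index = {r["Name"]: r for r in rows}


def scan_lookup(name):
    """Linear scan over the list of rows."""
    for r in rows:
        if r["Name"] == name:
            return r
    return None


def index_lookup(name):
    """Dictionary lookup through the extra index."""
    return name_index.get(name)


# Time each strategy on the worst case for the scan (the last row).
scan_seconds = timeit.timeit(lambda: scan_lookup("Zebra"), number=10000)
index_seconds = timeit.timeit(lambda: index_lookup("Zebra"), number=10000)
print("scan: %.6fs  index: %.6fs" % (scan_seconds, index_seconds))
```

The extra index only pays off if the data is constant or every update goes through code that keeps both structures in sync, which is the trade-off this answer points at.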
