[sql] How to store a list in a column of a database table

So, per Mehrdad's answer to a related question, I get it that a "proper" database table column doesn't store a list. Rather, you should create another table that effectively holds the elements of said list and then link to it directly or through a junction table. However, the type of list I want to create will be composed of unique items (unlike the linked question's fruit example). Furthermore, the items in my list are explicitly sorted - which means that if I stored the elements in another table, I'd have to sort them every time I accessed them. Finally, the list is basically atomic in that any time I wish to access the list, I will want to access the entire list rather than just a piece of it - so it seems silly to have to issue a database query to gather together pieces of the list.

AKX's solution (linked above) is to serialize the list and store it in a binary column. But this also seems inconvenient because it means that I have to worry about serialization and deserialization.

Is there any better solution? If there is no better solution, then why? It seems that this problem should come up from time to time.

... just a little more info to let you know where I'm coming from. As soon as I had just begun understanding SQL and databases in general, I was turned on to LINQ to SQL, and so now I'm a little spoiled because I expect to deal with my programming object model without having to think about how the objects are queried or stored in the database.

Thanks All!

John

UPDATE: So in the first flurry of answers I'm getting, I see "you can go the CSV/XML route... but DON'T!". So now I'm looking for explanations of why. Point me to some good references.

Also, to give you a better idea of what I'm up to: In my database I have a Function table that will have a list of (x,y) pairs. (The table will also have other information that is of no consequence for our discussion.) I will never need to see part of the list of (x,y) pairs. Rather, I will take all of them and plot them on the screen. I will allow the user to drag the nodes around to change the values occasionally or add more values to the plot.

This question is related to sql linq linq-to-sql database-design linq-to-entities

The answer is


If you really wanted to store it in a column and have it queryable a lot of databases support XML now. If not querying you can store them as comma separated values and parse them out with a function when you need them separated. I agree with everyone else though if you are looking to use a relational database a big part of normalization is the separating of data like that. I am not saying that all data fits a relational database though. You could always look into other types of databases if a lot of your data doesn't fit the model.


you can store it as text that looks like a list and create a function that can return its data as an actual list. example:

database:

 _____________________
|  word  | letters    |
|   me   | '[m, e]'   |
|  you   |'[y, o, u]' |  note that the letters column is of type 'TEXT'
|  for   |'[f, o, r]' |
|___in___|_'[i, n]'___|

And the list compiler function (written in python, but it should be easily translatable to most other programming languages). TEXT represents the text loaded from the sql table. returns list of strings from string containing list. if you want it to return ints instead of strings, make mode equal to 'int'. Likewise with 'string', 'bool', or 'float'.

def string_to_list(string, mode):
    items = []
    item = ""
    itemExpected = True
    for char in string[1:]:
        if itemExpected and char not in [']', ',', '[']:
            item += char
        elif char in [',', '[', ']']:
            itemExpected = True
            items.append(item)
            item = ""
    newItems = []
    if mode == "int":
        for i in items:
            newItems.append(int(i))

    elif mode == "float":
        for i in items:
            newItems.append(float(i))

    elif mode == "boolean":
        for i in items:
            if i in ["true", "True"]:
                newItems.append(True)
            elif i in ["false", "False"]:
                newItems.append(False)
            else:
                newItems.append(None)
    elif mode == "string":
        return items
    else:
        raise Exception("the 'mode'/second parameter of string_to_list() must be one of: 'int', 'string', 'bool', or 'float'")
    return newItems

Also here is a list-to-string function in case you need it.

def list_to_string(lst):
    string = "["
    for i in lst:
        string += str(i) + ","
    if string[-1] == ',':
        string = string[:-1] + "]"
    else:
        string += "]"
    return string

I'd just store it as CSV, if it's simple values then it should be all you need (XML is very verbose and serializing to/from it would probably be overkill but that would be an option as well).

Here's a good answer for how to pull out CSVs with LINQ.


In addition to what everyone else has said, I would suggest you analyze your approach in longer terms than just now. It is currently the case that items are unique. It is currently the case that resorting the items would require a new list. It is almost required that the list are currently short. Even though I don't have the domain specifics, it is not much of a stretch to think those requirements could change. If you serialize your list, you are baking in an inflexibility that is not necessary in a more-normalized design. Btw, that does not necessarily mean a full Many:Many relationship. You could just have a single child table with a foreign key to the parent and a character column for the item.

If you still want to go down this road of serializing the list, you might consider storing the list in XML. Some databases such as SQL Server even have an XML data type. The only reason I'd suggest XML is that almost by definition, this list needs to be short. If the list is long, then serializing it in general is an awful approach. If you go the CSV route, you need to account for the values containing the delimiter which means you are compelled to use quoted identifiers. Persuming that the lists are short, it probably will not make much difference whether you use CSV or XML.


I was very reluctant to choose the path I finally decide to take because of many answers. While they add more understanding to what is SQL and its principles, I decided to become an outlaw. I was also hesitant to post my findings as for some it's more important to vent frustration to someone breaking the rules rather than understanding that there are very few universal truthes.

I have tested it extensively and, in my specific case, it was way more efficient than both using array type (generously offered by PostgreSQL) or querying another table.

Here is my answer: I have successfully implemented a list into a single field in PostgreSQL, by making use of the fixed length of each item of the list. Let say each item is a color as an ARGB hex value, it means 8 char. So you can create your array of max 10 items by multiplying by the length of each item:

ALTER product ADD color varchar(80)

In case your list items length differ you can always fill the padding with \0

NB: Obviously this is not necessarily the best approach for hex number since a list of integers would consume less storage but this is just for the purpose of illustrating this idea of array by making use of a fixed length allocated to each item.

The reason why: 1/ Very convenient: retrieve item i at substring i*n, (i +1)*n. 2/ No overhead of cross tables queries. 3/ More efficient and cost-saving on the server side. The list is like a mini blob that the client will have to split.

While I respect people following rules, many explanations are very theoretical and often fail to acknowledge that, in some specific cases, especially when aiming for cost optimal with low-latency solutions, some minor tweaks are more than welcome.

"God forbid that it is violating some holy sacred principle of SQL": Adopting a more open-minded and pragmatic approach before reciting the rules is always the way to go. Else you might end up like a candid fanatic reciting the Three Laws of Robotics before being obliterated by Skynet

I don't pretend that this solution is a breakthrough, nor that it is ideal in term of readability and database flexibility, but it can certainly give you an edge when it comes to latency.


Simple answer: If, and only if, you're certain that the list will always be used as a list, then join the list together on your end with a character (such as '\0') that will not be used in the text ever, and store that. Then when you retrieve it, you can split by '\0'. There are of course other ways of going about this stuff, but those are dependent on your specific database vendor.

As an example, you can store JSON in a Postgres database. If your list is text, and you just want the list without further hassle, that's a reasonable compromise.

Others have ventured suggestions of serializing, but I don't really think that serializing is a good idea: Part of the neat thing about databases is that several programs written in different languages can talk to one another. And programs serialized using Java's format would not do all that well if a Lisp program wanted to load it.

If you want a good way to do this sort of thing there are usually array-or-similar types available. Postgres for instance, offers array as a type, and lets you store an array of text, if that's what you want, and there are similar tricks for MySql and MS SQL using JSON, and IBM's DB2 offer an array type as well (in their own helpful documentation). This would not be so common if there wasn't a need for this.

What you do lose by going that road is the notion of the list as a bunch of things in sequence. At least nominally, databases treat fields as single values. But if that's all you want, then you should go for it. It's a value judgement you have to make for yourself.


You can just forget SQL all together and go with a "NoSQL" approach. RavenDB, MongoDB and CouchDB jump to mind as possible solutions. With a NoSQL approach, you are not using the relational model..you aren't even constrained to schemas.


What I have seen many people do is this (it may not be the best approach, correct me if I am wrong):

The table which I am using in the example is given below(the table includes nicknames that you have given to your specific girlfriends. Each girlfriend has a unique id):

nicknames(id,seq_no,names)

Suppose, you want to store many nicknames under an id. This is why we have included a seq_no field.

Now, fill these values to your table:

(1,1,'sweetheart'), (1,2,'pumpkin'), (2,1,'cutie'), (2,2,'cherry pie')

If you want to find all the names that you have given to your girl friend id 1 then you can use:

select names from nicknames where id = 1;

Only one option doesn't mentioned in the answers. You can de-normalize your DB design. So you need two tables. One table contains proper list, one item per row, another table contains whole list in one column (coma-separated, for example).

Here it is 'traditional' DB design:

List(ListID, ListName) 
Item(ItemID,ItemName) 
List_Item(ListID, ItemID, SortOrder)

Here it is de-normalized table:

Lists(ListID, ListContent)

The idea here - you maintain Lists table using triggers or application code. Every time you modify List_Item content, appropriate rows in Lists get updated automatically. If you mostly read lists it could work quite fine. Pros - you can read lists in one statement. Cons - updates take more time and efforts.


If you need to query on the list, then store it in a table.

If you always want the list, you could store it as a delimited list in a column. Even in this case, unless you have VERY specific reasons not to, store it in a lookup table.


I think in certain cases, you can create a FAKE "list" of items in the database, for example, the merchandise has a few pictures to show its details, you can concatenate all the IDs of pictures split by comma and store the string into the DB, then you just need to parse the string when you need it. I am working on a website now and I am planning to use this way.


Many SQL databases allow a table to contain a subtable as a component. The usual method is to allow the domain of one of the columns to be a table. This is in addition to using some convention like CSV to encode the substructure in ways unknown to the DBMS.

When Ed Codd was developing the relational model in 1969-1970, he specifically defined a normal form that would disallow this kind of nesting of tables. Normal form was later called First Normal Form. He then went on to show that for every database, there is a database in first normal form that expresses the same information.

Why bother with this? Well, databases in first normal form permit keyed access to all data. If you provide a table name, a key value into that table, and a column name, the database will contain at most one cell containing one item of data.

If you allow a cell to contain a list or a table or any other collection, now you can't provide keyed access to the sub items, without completely reworking the idea of a key.

Keyed access to all data is fundamental to the relational model. Without this concept, the model isn't relational. As to why the relational model is a good idea, and what might be the limitations of that good idea, you have to look at the 50 years worth of accumulated experience with the relational model.


Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to linq

Async await in linq select How to resolve Value cannot be null. Parameter name: source in linq? What does Include() do in LINQ? Selecting multiple columns with linq query and lambda expression System.Collections.Generic.List does not contain a definition for 'Select' lambda expression join multiple tables with select and where clause LINQ select one field from list of DTO objects to array The model backing the 'ApplicationDbContext' context has changed since the database was created Check if two lists are equal Why is this error, 'Sequence contains no elements', happening?

Examples related to linq-to-sql

Understanding SQL Server LOCKS on SELECT queries how to update the multiple rows at a time using linq to sql? LINQ: combining join and group by LINQ to SQL: Multiple joins ON multiple Columns. Is this possible? How to Select Min and Max date values in Linq Query How to do a LIKE query with linq? How to do a join in linq to sql with method syntax? How to store a list in a column of a database table Returning IEnumerable<T> vs. IQueryable<T> Entity Framework VS LINQ to SQL VS ADO.NET with stored procedures?

Examples related to database-design

What are OLTP and OLAP. What is the difference between them? How to create a new schema/new user in Oracle Database 11g? What are the lengths of Location Coordinates, latitude and longitude? cannot connect to pc-name\SQLEXPRESS SQL ON DELETE CASCADE, Which Way Does the Deletion Occur? What are the best practices for using a GUID as a primary key, specifically regarding performance? "Prevent saving changes that require the table to be re-created" negative effects Difference between scaling horizontally and vertically for databases Using SQL LOADER in Oracle to import CSV file What is cardinality in Databases?

Examples related to linq-to-entities

How to add a default "Select" option to this ASP.NET DropDownList control? LEFT JOIN in LINQ to entities? How to get first record in each group using Linq Linq to Entities join vs groupjoin The specified type member 'Date' is not supported in LINQ to Entities. Only initializers, entity members, and entity navigation properties Linq where clause compare only date value without time value The specified type member is not supported in LINQ to Entities. Only initializers, entity members, and entity navigation properties are supported SQL to Entity Framework Count Group-By LINQ to Entities does not recognize the method The cast to value type 'Int32' failed because the materialized value is null