[sql] Recommended SQL database design for tags or tagging

I've heard of a few ways to implement tagging; using a mapping table between TagID and ItemID (makes sense to me, but does it scale?), adding a fixed number of possible TagID columns to ItemID (seems like a bad idea), Keeping tags in a text column that's comma separated (sounds crazy but could work). I've even heard someone recommend a sparse matrix, but then how do the tag names grow gracefully?

Am I missing a best practice for tags?

This question is related to sql database-design tags data-modeling tagging

The answer is


I've always kept the tags in a separate table and then had a mapping table. Of course I've never done anything on a really large scale either.

Having a "tags" table and a map table makes it pretty trivial to generate tag clouds & such since you can easily put together SQL to get a list of tags with counts of how often each tag is used.


Use a single formatted text column[1] for storing the tags and use a capable full text search engine to index this. Else you will run into scaling problems when trying to implement boolean queries.

If you need details about the tags you have, you can either keep track of it in a incrementally maintained table or run a batch job to extract the information.

[1] Some RDBMS even provide a native array type which might be even better suited for storage by not needing a parsing step, but might cause problems with the full text search.


Normally I would agree with Yaakov Ellis but in this special case there is another viable solution:

Use two tables:

Table: Item
Columns: ItemID, Title, Content
Indexes: ItemID

Table: Tag
Columns: ItemID, Title
Indexes: ItemId, Title

This has some major advantages:

First it makes development much simpler: in the three-table solution for insert and update of item you have to lookup the Tag table to see if there are already entries. Then you have to join them with new ones. This is no trivial task.

Then it makes queries simpler (and perhaps faster). There are three major database queries which you will do: Output all Tags for one Item, draw a Tag-Cloud and select all items for one Tag Title.

All Tags for one Item:

3-Table:

SELECT Tag.Title 
  FROM Tag 
  JOIN ItemTag ON Tag.TagID = ItemTag.TagID
 WHERE ItemTag.ItemID = :id

2-Table:

SELECT Tag.Title
FROM Tag
WHERE Tag.ItemID = :id

Tag-Cloud:

3-Table:

SELECT Tag.Title, count(*)
  FROM Tag
  JOIN ItemTag ON Tag.TagID = ItemTag.TagID
 GROUP BY Tag.Title

2-Table:

SELECT Tag.Title, count(*)
  FROM Tag
 GROUP BY Tag.Title

Items for one Tag:

3-Table:

SELECT Item.*
  FROM Item
  JOIN ItemTag ON Item.ItemID = ItemTag.ItemID
  JOIN Tag ON ItemTag.TagID = Tag.TagID
 WHERE Tag.Title = :title

2-Table:

SELECT Item.*
  FROM Item
  JOIN Tag ON Item.ItemID = Tag.ItemID
 WHERE Tag.Title = :title

But there are some drawbacks, too: It could take more space in the database (which could lead to more disk operations which is slower) and it's not normalized which could lead to inconsistencies.

The size argument is not that strong because the very nature of tags is that they are normally pretty small so the size increase is not a large one. One could argue that the query for the tag title is much faster in a small table which contains each tag only once and this certainly is true. But taking in regard the savings for not having to join and the fact that you can build a good index on them could easily compensate for this. This of course depends heavily on the size of the database you are using.

The inconsistency argument is a little moot too. Tags are free text fields and there is no expected operation like 'rename all tags "foo" to "bar"'.

So tldr: I would go for the two-table solution. (In fact I'm going to. I found this article to see if there are valid arguments against it.)


I would suggest following design : Item Table: Itemid, taglist1, taglist2
this will be fast and make easy saving and retrieving the data at item level.

In parallel build another table: Tags tag do not make tag unique identifier and if you run out of space in 2nd column which contains lets say 100 items create another row.

Now while searching for items for a tag it will be super fast.


If you are using a database that supports map-reduce, like couchdb, storing tags in a plain text field or list field is indeed the best way. Example:

tagcloud: {
  map: function(doc){ 
    for(tag in doc.tags){ 
      emit(doc.tags[tag],1) 
    }
  }
  reduce: function(keys,values){
    return values.length
  }
}

Running this with group=true will group the results by tag name, and even return a count of the number of times that tag was encountered. It's very similar to counting the occurrences of a word in text.


Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to database-design

What are OLTP and OLAP. What is the difference between them? How to create a new schema/new user in Oracle Database 11g? What are the lengths of Location Coordinates, latitude and longitude? cannot connect to pc-name\SQLEXPRESS SQL ON DELETE CASCADE, Which Way Does the Deletion Occur? What are the best practices for using a GUID as a primary key, specifically regarding performance? "Prevent saving changes that require the table to be re-created" negative effects Difference between scaling horizontally and vertically for databases Using SQL LOADER in Oracle to import CSV file What is cardinality in Databases?

Examples related to tags

How to set image name in Dockerfile? What is initial scale, user-scalable, minimum-scale, maximum-scale attribute in meta tag? How to create named and latest tag in Docker? Excel data validation with suggestions/autocomplete How do you revert to a specific tag in Git? HTML tag <a> want to add both href and onclick working Instagram API to fetch pictures with specific hashtags How to apply Hovering on html area tag? How to get current formatted date dd/mm/yyyy in Javascript and append it to an input How to leave space in HTML

Examples related to data-modeling

What's the difference between identifying and non-identifying relationships? What's the longest possible worldwide phone number I should consider in SQL varchar(length) for phone Recommended SQL database design for tags or tagging

Examples related to tagging

How do you revert to a specific tag in Git? What is the most efficient way to store tags in a database? Recommended SQL database design for tags or tagging