All of the answers (at the time of this writing) assume that Redis, MongoDB, and perhaps an SQL-based relational database are essentially the same tool: "store data". They don't consider data models at all.
MongoDB is a document store. To compare with an SQL-driven relational database: relational databases simplify to indexed CSV files, each file being a table; document stores simplify to indexed JSON files, each file being a document, with multiple files grouped together.
JSON files are similar in structure to XML and YAML files, and to dictionaries as in Python, so think of your data in that sort of hierarchy. When indexing, the structure is the key: A document contains named keys, which contain either further documents, arrays, or scalar values. Consider the below document.
    {
        _id: 0x194f38dc491a,
        Name: "John Smith",
        PhoneNumber: {
            Home: "555 999-1234",
            Work: "555 999-9876",
            Mobile: "555 634-5789"
        },
        Accounts: [
            "379-1111",
            "379-2574",
            "414-6731"
        ]
    }
The above document has a key, PhoneNumber.Mobile, which has the value 555 634-5789. You can search through a collection of documents for those where the key PhoneNumber.Mobile has some value; nested keys like this can be indexed.
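As a rough illustration of what looking up a dotted key means, here is a minimal Python sketch that walks a nested dictionary. The helper names get_path and find_by_path are made up for this example, and the linear scan only stands in for what a real document store would do with an index.

```python
from typing import Any

# Sample document mirroring the one above.
person = {
    "_id": 0x194F38DC491A,
    "Name": "John Smith",
    "PhoneNumber": {
        "Home": "555 999-1234",
        "Work": "555 999-9876",
        "Mobile": "555 634-5789",
    },
    "Accounts": ["379-1111", "379-2574", "414-6731"],
}

_MISSING = object()  # sentinel so a missing key never equals a real value


def get_path(document: dict, dotted_key: str, default: Any = None) -> Any:
    """Walk a dotted key such as 'PhoneNumber.Mobile' through nested dicts."""
    current: Any = document
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def find_by_path(collection: list, dotted_key: str, value: Any) -> list:
    """Return every document whose dotted key holds the given value.

    This is a plain linear scan (an assumption for illustration); a document
    store would normally answer the same question from an index.
    """
    return [doc for doc in collection
            if get_path(doc, dotted_key, _MISSING) == value]
```

Calling find_by_path([person], "PhoneNumber.Mobile", "555 634-5789") returns a list containing the sample document, while a key that does not exist simply matches nothing.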
It also has an Accounts array, which holds multiple values, each of which can be indexed. It is possible to query for a document where Accounts contains exactly some set of values, all of some set of values, or any of some set of values. That means you can search for Accounts = ["379-1111", "379-2574"] and not find the above; you can search for Accounts includes ["379-1111"] and find the above document; and you can search for Accounts includes any of ["974-3785", "414-6731"] and find the above and whatever document includes account "974-3785", if any.
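The three kinds of array match can be sketched in plain Python. The function names below are invented for this example; in MongoDB the corresponding queries would roughly be an exact array equality match, the $all operator, and the $in operator.

```python
def matches_exactly(accounts: list, wanted: list) -> bool:
    """True only if the array equals the wanted list, element for element."""
    return list(accounts) == list(wanted)


def contains_all(accounts: list, wanted: list) -> bool:
    """True if every wanted value appears somewhere in the array."""
    return set(wanted).issubset(accounts)


def contains_any(accounts: list, wanted: list) -> bool:
    """True if at least one wanted value appears in the array."""
    return not set(wanted).isdisjoint(accounts)


accounts = ["379-1111", "379-2574", "414-6731"]

print(matches_exactly(accounts, ["379-1111", "379-2574"]))  # False: one account is missing
print(contains_all(accounts, ["379-1111"]))                 # True
print(contains_any(accounts, ["974-3785", "414-6731"]))     # True: "414-6731" is present
```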
Documents go as deep as you want. PhoneNumber.Mobile could hold an array, or even a sub-document (PhoneNumber.Mobile.Work and PhoneNumber.Mobile.Personal). If your data is deeply nested or hierarchical, documents are a large step up from relational databases.
If your data is mostly flat, relational, and rigidly structured, you're better off with a relational database. Again, the big sign is whether your data is best modeled as a collection of interrelated CSV files or as a collection of XML/JSON/YAML files.
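To make the CSV-versus-JSON comparison concrete, here is a small Python sketch that flattens the sample document from earlier into the rows a relational schema might hold. The table names (people, phone_numbers, accounts) and the to_tables helper are assumptions for illustration only.

```python
# The same sample document as above, written as a Python dictionary.
person = {
    "_id": 0x194F38DC491A,
    "Name": "John Smith",
    "PhoneNumber": {
        "Home": "555 999-1234",
        "Work": "555 999-9876",
        "Mobile": "555 634-5789",
    },
    "Accounts": ["379-1111", "379-2574", "414-6731"],
}


def to_tables(document: dict) -> dict:
    """Split one nested document into flat rows, one list per hypothetical table.

    Each child row repeats the person's id, the way a foreign key would.
    """
    person_id = document["_id"]
    return {
        "people": [(person_id, document["Name"])],
        "phone_numbers": [
            (person_id, kind, number)
            for kind, number in document["PhoneNumber"].items()
        ],
        "accounts": [(person_id, account) for account in document["Accounts"]],
    }


for table, rows in to_tables(person).items():
    print(table, rows)
```

One document becomes seven rows spread over three tables, and reading it back means joining them on the id. When most of your data looks like this, the document model saves work; when it is already flat, the relational form costs nothing extra.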
For most projects, you'll have to compromise, accepting a minor work-around in some small areas where either SQL or a document store doesn't fit; for some large, complex projects storing a broad spread of data (many columns; rows are irrelevant), it will make sense to store some data in one model and other data in another model. Facebook uses both SQL and a graph database (where data is put into nodes, and nodes are connected to other nodes); Craigslist used to use MySQL and MongoDB, but had been looking into moving entirely onto MongoDB. These are places where the span and relationships of the data face significant handicaps if put under one model.
Redis is, most basically, a key-value store. Redis lets you give it a key and look up a single value. Redis itself can store strings, lists, hashes, and a few other things; however, it only looks up by name.
That makes Redis a natural fit as a cache in front of a slower back-end. Cache invalidation is one of computer science's hard problems; the other is naming things. That means you'll use Redis when you want to avoid hundreds of excess look-ups to a back-end, but you'll have to figure out when you need a new look-up.
The most obvious case of invalidation is update on write: if you read user:Simon:lingots = NOTFOUND, you might SELECT Lingots FROM Store s INNER JOIN UserProfile u ON s.UserID = u.UserID WHERE u.Username = 'Simon' and store the result, 100, as SET user:Simon:lingots = 100. Then when you award Simon 5 lingots, you read user:Simon:lingots = 100, SET user:Simon:lingots = 105, and UPDATE Store s INNER JOIN UserProfile u ON s.UserID = u.UserID SET s.Lingots = 105 WHERE u.Username = 'Simon'. Now you have 105 in your database and in Redis, and can get user:Simon:lingots without querying the database.
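The update-on-write flow can be sketched with two dictionaries, one standing in for the SQL database and one for Redis. The LingotStore class and its method names are hypothetical, and nothing here talks to a real server; it only shows the order of operations.

```python
class LingotStore:
    """Toy stand-in: `database` plays the SQL back-end, `cache` plays Redis."""

    def __init__(self, database: dict):
        self.database = database   # username -> lingots (the source of truth)
        self.cache = {}            # "user:<name>:lingots" -> lingots
        self.database_reads = 0    # counts how often we had to hit the back-end

    @staticmethod
    def _key(username: str) -> str:
        return f"user:{username}:lingots"

    def get_lingots(self, username: str) -> int:
        key = self._key(username)
        if key in self.cache:              # cache hit: no database query
            return self.cache[key]
        self.database_reads += 1           # cache miss: query, then remember
        value = self.database[username]
        self.cache[key] = value
        return value

    def award_lingots(self, username: str, amount: int) -> int:
        new_total = self.get_lingots(username) + amount
        self.database[username] = new_total          # write the source of truth
        self.cache[self._key(username)] = new_total  # keep the cache in step
        return new_total


store = LingotStore({"Simon": 100})
store.get_lingots("Simon")       # miss: reads the database, caches 100
store.award_lingots("Simon", 5)  # database and cache both become 105
print(store.get_lingots("Simon"), store.database_reads)  # 105 1
```

This sketch writes the database before the cache, so a failed database write never leaves a value in the cache that the database does not have; that ordering is a choice made for this example, not a requirement of the approach.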
The second case is updating dependent information. Let's say you generate chunks of a page and cache their output. The header shows the player's experience, level, and amount of money; the player's Profile page has a block that shows their statistics; and so forth. The player gains some experience. Well, now you have several keys, templates:Header:Simon, templates:StatsBox:Simon, templates:GrowthGraph:Simon, and so forth, where you've cached the output of a half-dozen database queries run through a template engine. Normally, when you display these pages, you say:
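    // Build the key once so the read and the write cannot drift apart.
    $key = "templates:StatsBox:" . $playerName;
    $t = GetStringFromRedis($key);
    if ($t === null) {
        // Cache miss: query the database, render the template, cache the output.
        $t = BuildTemplate("StatsBox.tmpl",
                           GetStatsFromDatabase($playerName));
        SetStringInRedis($key, $t);
    }
    print $t;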
Because you just updated the results of GetStatsFromDatabase("Simon"), you have to drop templates:*:Simon out of your key-value cache. The next time you try to render any of these templates, your application will churn away fetching data from your database (PostgreSQL, MongoDB) and inserting it into your template; then it will store the result in Redis and, hopefully, not bother making database queries and rendering templates the next time it displays that block of output.
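Dropping templates:*:Simon amounts to deleting every key that matches a glob pattern. Below is a minimal sketch against a plain dictionary standing in for the cache; the invalidate helper is hypothetical. Against a real Redis server, one common approach is to iterate over keys with SCAN and a MATCH pattern and delete what it returns.

```python
from fnmatch import fnmatchcase


def invalidate(cache: dict, pattern: str) -> list:
    """Delete every cached key matching a glob pattern; return the removed keys."""
    # Collect first, then delete, so the dict is not mutated while iterating.
    doomed = [key for key in cache if fnmatchcase(key, pattern)]
    for key in doomed:
        del cache[key]
    return sorted(doomed)


cache = {
    "templates:Header:Simon": "<header>...</header>",
    "templates:StatsBox:Simon": "<div>...</div>",
    "templates:Header:Anna": "<header>...</header>",
    "user:Simon:lingots": 105,
}

removed = invalidate(cache, "templates:*:Simon")
print(removed)        # ['templates:Header:Simon', 'templates:StatsBox:Simon']
print(sorted(cache))  # ['templates:Header:Anna', 'user:Simon:lingots']
```

Only Simon's rendered blocks are dropped; other players' templates and Simon's unrelated keys stay cached.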
Redis also lets you do publish-subscribe messaging and such. That's another topic entirely. The point here is that Redis is a key-value cache, which differs from a relational database or a document store.
Pick your tools based on your needs. The largest need is usually the data model, as that determines how complex and error-prone your code is. Specialized applications will lean on performance, the kind of places where you write everything in a mixture of C and assembly; most applications will just handle the generalized case and put a caching system such as Redis or Memcached in front of the database, since a cache hit is a lot faster than a query against either a high-performance SQL database or a document store.