Redis strings vs Redis hashes to represent JSON: efficiency?

Question

I want to store a JSON payload into Redis. There are really 2 ways I can do this:

1. One using simple string keys and values. key: user, value: payload (the entire JSON blob, which can be 100-200 KB).

    SET user:1 payload

2. Using hashes.

    HSET user:1 username "someone"
    HSET user:1 location "NY"
    HSET user:1 bio "STRING WITH OVER 100 lines"

Keep in mind that if I use a hash, the value length isn't predictable. They're not all short such as the bio example above.

Which is more memory efficient? Using string keys and values, or using a hash?

User · Answer

Some additions to the existing answers:

First of all, if you are going to use Redis hashes efficiently, you must know the maximum number of fields and the maximum value size. Otherwise, as soon as they exceed hash-max-ziplist-value or hash-max-ziplist-entries, Redis will convert the hash to practically ordinary key/value pairs under the hood (see hash-max-ziplist-value, hash-max-ziplist-entries). And falling out of the compact hash encoding IS REALLY BAD, because each ordinary key/value pair inside Redis costs about +90 bytes per pair.

It means that if you start with option two and accidentally exceed hash-max-ziplist-value, you will pay +90 bytes for EACH ATTRIBUTE you have inside the user model! (Actually not +90 but +70; see the console output below.)
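To make the two limits concrete, here is a minimal sketch of a pre-flight check that could be run before writing a hash. The helper name `fits_compact_encoding?` is hypothetical, the defaults (512 entries, 64 bytes) are simply the values used in the console session in this answer, and the check is an approximation: it assumes every field name and value is compared by its byte length and ignores any special handling Redis may apply to integer-like values.

```ruby
# Hypothetical helper (not part of any gem): predicts whether a hash would stay
# within the limits set by hash-max-ziplist-entries and hash-max-ziplist-value.
def fits_compact_encoding?(fields, max_entries: 512, max_value: 64)
  # Too many fields is enough to leave the compact encoding.
  return false if fields.size > max_entries

  # A single over-long field name or value is enough as well.
  fields.all? do |name, value|
    name.to_s.bytesize <= max_value && value.to_s.bytesize <= max_value
  end
end

user = { "username" => "someone", "location" => "NY" }
puts fits_compact_encoding?(user)            # prints true

user_with_bio = user.merge("bio" => "x" * 200)
puts fits_compact_encoding?(user_with_bio)   # prints false
```

The second call fails because of one long value, which is exactly the situation the question describes with the bio attribute.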

 # you need me-redis and awesome-print gems to run exact code
 redis = Redis.include(MeRedis).configure( hash_max_ziplist_value: 64, hash_max_ziplist_entries: 512 ).new 
  => #<Redis client v4.0.1 for redis://127.0.0.1:6379/0> 
 > redis.flushdb
  => "OK" 
 > ap redis.info(:memory)
    {
                "used_memory" => "529512",
          "used_memory_human" => "517.10K",
            ....
    }
  => nil 
 # me_set( 't:i' ... ) same as hset( 't:i/512', i % 512 ... )    
 # txt is the text of an English fiction book, around 56K characters long,
 # so we just take a random 63-character string from it
 > redis.pipelined{ 10000.times{ |i| redis.me_set( "t:#{i}", txt[rand(50000), 63] ) } }; :done
 => :done 
 > ap redis.info(:memory)
  {
               "used_memory" => "1251944",
         "used_memory_human" => "1.19M", # ~72 bytes per key/value pair
            .....
  }
  > redis.flushdb
  => "OK" 
  # making only one value per hash of 512 values 1 byte too long (65 > 64) costs as much as making them all too long
  > redis.pipelined{ 10000.times{ |i| redis.me_set( "t:#{i}", txt[rand(50000), i % 512 == 0 ? 65 : 63] ) } }; :done 
  > ap redis.info(:memory)
   {
               "used_memory" => "1876064",
         "used_memory_human" => "1.79M",   # ~ 134 bytes per pair  
          ....
   }
    redis.pipelined{ 10000.times{ |i| redis.set( "t:#{i}", txt[rand(50000), 65] ) } };
    ap redis.info(:memory)
    {
             "used_memory" => "2262312",
          "used_memory_human" => "2.16M", #~155 byte per pair i.e. +90 bytes    
           ....
    }
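The comment in the transcript says that me_set( 't:i' ... ) behaves like hset( 't:i/512', i % 512 ... ). The sketch below spells out that mapping in plain Ruby, without talking to a server. The helper name `bucket_for` and the constant `BUCKET_SIZE` are made up for illustration and are not the me-redis API.

```ruby
# Hypothetical helper mirroring the comment in the transcript:
# a logical key "t:<i>" becomes field (i % 512) of the hash "t:<i / 512>".
BUCKET_SIZE = 512

def bucket_for(prefix, id, bucket_size = BUCKET_SIZE)
  hash_key = "#{prefix}:#{id / bucket_size}"  # integer division picks the bucket
  field = id % bucket_size                    # position inside the bucket
  [hash_key, field]
end

p bucket_for("t", 0)      # prints ["t:0", 0]
p bucket_for("t", 511)    # prints ["t:0", 511]
p bucket_for("t", 512)    # prints ["t:1", 0]
p bucket_for("t", 9999)   # prints ["t:19", 271]
```

With this mapping the 10000 logical keys of the test end up in 20 hashes of at most 512 fields each, which is why a single over-long value per 512 ids is enough to push every hash out of the compact encoding.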

Regarding TheHippo's answer, the comments on Option one are misleading:

hgetall/hmset/hmget come to the rescue if you need all fields or multiple get/set operations.

Regarding BMiner's answer:

The third option is actually really fun: for a dataset with max(id) < hash-max-ziplist-entries this solution has O(N) complexity because, surprise, Redis stores small hashes as an array-like container of length/key/value objects!

"But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1): the hash will be converted into a real hash table as soon as the number of elements it contains will grow too much"

But you should not worry: you'll exceed hash-max-ziplist-entries very fast, and at that point you are effectively at solution number 1.

The second option will most likely turn into the fourth solution under the hood because, as the question states:

"Keep in mind that if I use a hash, the value length isn't predictable. They're not all short such as the bio example above."

And as you already said, the fourth solution is the most expensive: +70 bytes per attribute for sure.
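As a rough cross-check of that figure, the per-pair cost can be recomputed from the used_memory values in the console transcript above. The sketch only uses the two runs that start right after a flushdb; the helper name `bytes_per_pair` is hypothetical, and the result is an estimate for that particular test data, not a general constant.

```ruby
# Numbers copied from the console transcript in this answer.
BASELINE = 529_512   # used_memory right after flushdb
PAIRS = 10_000       # number of values written in each run

# Hypothetical helper: average memory growth per stored pair.
def bytes_per_pair(used_memory, baseline = BASELINE, pairs = PAIRS)
  (used_memory - baseline).to_f / pairs
end

compact = bytes_per_pair(1_251_944)  # every value is 63 bytes long
broken  = bytes_per_pair(1_876_064)  # one 65-byte value per hash of 512

puts compact.round(1)                # prints 72.2
puts broken.round(1)                 # prints 134.7
puts((broken - compact).round(1))    # prints 62.4
```

For these two runs the penalty comes out at a little over 60 bytes per pair, the same order of magnitude as the roughly +70 bytes quoted in this answer.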

My suggestion for optimizing such a dataset:

You've got two options:

1. If you cannot guarantee the max size of some user attributes, then go for the first solution, and if memory is crucial, compress the user JSON before storing it in Redis.
2. If you can enforce a max size for all attributes, then you can set hash-max-ziplist-entries/value and use hashes, either as one hash per user OR with the hash memory optimization from this topic of the Redis guide: https://redis.io/topics/memory-optimization, storing the user as a JSON string. Either way, you may also compress long user attributes.
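The compression step mentioned in both options can be sketched with Ruby's standard library alone. This is only an illustration under assumptions: the helper names `pack_user` and `unpack_user` are hypothetical, the sample record is invented, and zlib is just one possible codec. The packed bytes are what would be handed to SET (or stored as a hash value) in place of the raw JSON.

```ruby
require "json"
require "zlib"

# Hypothetical helpers: serialize to JSON, then deflate; and the reverse.
def pack_user(user)
  Zlib::Deflate.deflate(JSON.generate(user))
end

def unpack_user(blob)
  JSON.parse(Zlib::Inflate.inflate(blob))
end

# Invented sample record; the repeated text stands in for a long bio.
user = {
  "username" => "someone",
  "location" => "NY",
  "bio" => "STRING WITH OVER 100 lines\n" * 100
}

raw = JSON.generate(user)
packed = pack_user(user)
puts "raw: #{raw.bytesize} bytes, packed: #{packed.bytesize} bytes"
puts unpack_user(packed) == user   # prints true
```

How much is saved depends entirely on the data; repetitive text like the sample bio compresses far better than short or already-compressed values, so it is worth measuring on real payloads.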

User · Answer

It depends on how you access the data.

Go for Option 1:
- If you use most of the fields on most of your accesses.
- If there is variance in the possible keys.

Go for Option 2:
- If you use just single fields on most of your accesses.
- If you always know which fields are available.

P.S.: As a rule of thumb, go for the option which requires fewer queries in most of your use cases.

User · Answer

This article can provide a lot of insight here: http://redis.io/topics/memory-optimization

There are many ways to store an array of Objects in Redis (spoiler: I like option 1 for most use cases):

1. Store the entire object as a JSON-encoded string in a single key and keep track of all Objects using a set (or list, if more appropriate). For example:

    INCR id:users
    SET user:{id} '{"name":"Fred","age":25}'
    SADD users {id}

Generally speaking, this is probably the best method in most cases. If there are a lot of fields in the Object, your Objects are not nested with other Objects, and you tend to only access a small subset of fields at a time, it might be better to go with option 2.

Advantages: considered a "good practice." Each Object is a full-blown Redis key. JSON parsing is fast, especially when you need to access many fields for this Object at once.
Disadvantages: slower when you only need to access a single field.

2. Store each Object's properties in a Redis hash.

    INCR id:users
    HMSET user:{id} name "Fred" age 25
    SADD users {id}

Advantages: considered a "good practice." Each Object is a full-blown Redis key. No need to parse JSON strings.
Disadvantages: possibly slower when you need to access all/most of the fields in an Object. Also, nested Objects (Objects within Objects) cannot be easily stored.

3. Store each Object as a JSON string in a Redis hash.

    INCR id:users
    HMSET users {id} '{"name":"Fred","age":25}'

This allows you to consolidate a bit and only use two keys instead of lots of keys. The obvious disadvantage is that you can't set the TTL (and other stuff) on each user Object, since it is merely a field in the Redis hash and not a full-blown Redis key.

Advantages: JSON parsing is fast, especially when you need to access many fields for this Object at once. Less "polluting" of the main key namespace.
Disadvantages: About same memory usage as #1 when you have a lot of Objects. Slower than #2 when you only need to access a single field. Probably not considered a "good practice."

4. Store each property of each Object in a dedicated key.

    INCR id:users
    SET user:{id}:name "Fred"
    SET user:{id}:age 25
    SADD users {id}

According to the article above, this option is almost never preferred (unless the property of the Object needs to have specific TTL or something).

Advantages: Object properties are full-blown Redis keys, which might not be overkill for your app.
Disadvantages: slow, uses more memory, and not considered "best practice." Lots of polluting of the main key namespace.

Overall Summary

Option 4 is generally not preferred. Options 1 and 2 are very similar, and they are both pretty common. I prefer option 1 (generally speaking) because it allows you to store more complicated Objects (with multiple layers of nesting, etc.). Option 3 is used when you really care about not polluting the main key namespace (i.e. you don't want there to be a lot of keys in your database and you don't care about things like TTL, key sharding, or whatever).

If I got something wrong here, please consider leaving a comment and allowing me to revise the answer before downvoting. Thanks!
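To compare the four layouts side by side, the sketch below lists the commands each option would issue for one user. It sends nothing to a server; the helper name `commands_for` is hypothetical, and the key names (user:1, users, user:1:name) are illustrative choices in the spirit of the examples in this answer.

```ruby
require "json"

# Hypothetical helper: returns the commands for one user as arrays of arguments.
def commands_for(option, id, user)
  case option
  when 1 # whole object as one JSON string
    [["SET", "user:#{id}", JSON.generate(user)], ["SADD", "users", id]]
  when 2 # one hash per object, one field per property
    [["HMSET", "user:#{id}", *user.flatten], ["SADD", "users", id]]
  when 3 # one shared hash, one JSON string per object
    [["HMSET", "users", id, JSON.generate(user)]]
  when 4 # one key per property
    sets = user.map { |field, value| ["SET", "user:#{id}:#{field}", value] }
    sets + [["SADD", "users", id]]
  else
    raise ArgumentError, "unknown option #{option}"
  end
end

user = { "name" => "Fred", "age" => 25 }
(1..4).each do |option|
  puts "Option #{option}:"
  commands_for(option, 1, user).each { |cmd| puts "  " + cmd.join(" ") }
end
```

Counting the keys each option touches makes the trade-offs visible: options 1 and 2 create one key per user plus the index set, option 3 keeps everything in a single hash, and option 4 creates one key per property.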
