Thursday, May 3, 2018

Two-level hashes in Redis using Lua and MsgPack

Redis hashes are a very powerful data structure that lets you store key-value properties associated with a given Redis key. For example, you could store, for each user, all of their devices and the last time each one was online:

UserId1
   DeviceId1=1525038228
   DeviceId2=1525038128

But at some point you may need to store more than the last time each device was online: maybe a name, the last time it went offline, or a status (online/offline/away/busy...).

So basically we want to store nested hashes with a structure similar to this one:

UserId1
   DeviceId1
     LastOnline=1525038228
     LastOffline=1525028228
     Status=away
   DeviceId2
     LastOnline=1525038128
     LastOffline=1525028128
     Status=busy

Unfortunately, this structure is not supported out of the box in Redis, so we need to flatten it a little and use something like JSON values to store all that information:

UserId1
   DeviceId1={"LastOnline": 1525038228, "LastOffline": 1525028228, "Status": "away"}
   DeviceId2={"LastOnline": 1525038128, "LastOffline": 1525028128, "Status": "busy"}
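
A small sketch of that flattening in Python, using only the standard `json` module (the variable names are illustrative, not from the original post). The commented-out write assumes the redis-py client, whose `hset` accepts a `mapping` argument in recent versions.

```python
import json

# Nested structure from the example above (hash key "UserId1").
devices = {
    "DeviceId1": {"LastOnline": 1525038228, "LastOffline": 1525028228, "Status": "away"},
    "DeviceId2": {"LastOnline": 1525038128, "LastOffline": 1525028128, "Status": "busy"},
}

# Flatten one level: each device becomes a hash field whose value is a JSON string.
flat = {device_id: json.dumps(props) for device_id, props in devices.items()}

# Reading one property back means decoding the whole field value.
status = json.loads(flat["DeviceId2"])["Status"]
print(status)  # busy

# With redis-py (and a running server) the hash could be written in one call:
#   redis.Redis().hset("UserId1", mapping=flat)
```
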

With this approach the problem is solved, but when we need to update one of those values (e.g. LastOnline of DeviceId2) we need an HGET followed by an HSET. This is problematic because it is:
  • Slower, as it requires two round trips to complete the operation
  • More complex, as you need the WATCH command (plus MULTI/EXEC) so that both commands behave like a transaction and avoid race conditions
  • Less efficient, because you need to receive and send the whole JSON value over the network
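
For comparison, a sketch of that HGET + HSET approach guarded by WATCH, assuming the redis-py client's pipeline API (`pipeline()`, `watch`, `multi`, `execute`) and a hypothetical helper name `update_json_subfield`. It shows the retry loop that optimistic locking requires, and that every attempt moves the whole JSON value in both directions.

```python
import json

try:
    # Raised by redis-py when EXEC aborts because a watched key changed.
    from redis.exceptions import WatchError
except ImportError:
    # Fallback only so this sketch can be loaded without redis-py installed.
    class WatchError(Exception):
        pass


def update_json_subfield(client, key, field, subfield, value):
    """Hypothetical helper: update one property inside a JSON hash value."""
    with client.pipeline() as pipe:
        while True:
            try:
                pipe.watch(key)                      # executed immediately
                raw = pipe.hget(key, field)          # whole JSON value comes back
                obj = json.loads(raw) if raw else {}
                obj[subfield] = value
                pipe.multi()                         # start queueing commands
                pipe.hset(key, field, json.dumps(obj))  # whole value goes back
                pipe.execute()                       # MULTI/HSET/EXEC
                return obj
            except WatchError:
                continue                             # another client won; retry
```

With Lua scripts, both the retry loop and the extra round trip go away, because Redis runs a script atomically.
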
Fortunately, there are two Redis features that, combined, can give us something very similar to what we need.

The first is the ability to execute Lua scripts as part of a Redis command (EVAL), and the second is the set of standard Lua libraries bundled with Redis (cjson and cmsgpack), which can serialize data as JSON or MessagePack.

In this Python example you can see the Lua scripts to write and read any property in these nested hashes:
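
A minimal sketch of what such scripts could look like, assuming the redis-py client's `Redis.eval(script, numkeys, *keys_and_args)`. The Lua follows the description of the two scripts below, but the helper names `set_nested`/`get_nested`, the argument order, and the `tonumber` conversion are illustrative assumptions, not necessarily the author's exact code.

```python
# KEYS[1] = hash key (e.g. a user id), ARGV[1] = hash field (e.g. a device id),
# ARGV[2] = nested property name, ARGV[3] = new value.
SET_NESTED_LUA = """
local raw = redis.call('HGET', KEYS[1], ARGV[1])
local obj = {}
if raw then
  obj = cmsgpack.unpack(raw)
end
-- Assumption: numeric strings are stored as numbers, not text.
obj[ARGV[2]] = tonumber(ARGV[3]) or ARGV[3]
redis.call('HSET', KEYS[1], ARGV[1], cmsgpack.pack(obj))
return 1
"""

# Returns [name1, value1, name2, value2, ...] because Redis drops the
# non-array part of a Lua table when converting it to a reply.
GET_NESTED_LUA = """
local raw = redis.call('HGET', KEYS[1], ARGV[1])
if not raw then
  return {}
end
local flat = {}
for k, v in pairs(cmsgpack.unpack(raw)) do
  flat[#flat + 1] = k
  flat[#flat + 1] = tostring(v)
end
return flat
"""


def set_nested(client, key, field, prop, value):
    """Hypothetical helper: update one nested property in one round trip."""
    return client.eval(SET_NESTED_LUA, 1, key, field, prop, value)


def get_nested(client, key, field):
    """Hypothetical helper: read all nested properties of a field as a dict."""
    flat = client.eval(GET_NESTED_LUA, 1, key, field)
    return dict(zip(flat[0::2], flat[1::2]))


# Usage, assuming redis-py is installed and a server runs locally:
#   import redis
#   r = redis.Redis()
#   set_nested(r, "user_id", "device1", "last_online", 1525038228)
#   get_nested(r, "user_id", "device1")
```

In a real application `client.register_script(SET_NESTED_LUA)` could replace `eval`, so that redis-py sends the script by its SHA (EVALSHA) instead of the full source on every call.
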


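The first Lua script updates a nested field. To do that, it gets the field value with HGET, deserializes it with 'cmsgpack.unpack', updates the nested field, serializes it again with 'cmsgpack.pack', and stores it back with HSET. Since Redis runs the whole script atomically, no WATCH is needed and the update takes a single round trip.
The second Lua script returns all the nested fields. To do that, it gets the value with HGET, deserializes it with 'cmsgpack.unpack', and converts it into a list so that it can be sent in a Redis response (Redis only converts the array part of a Lua table into a reply, so a key/value table cannot be returned directly).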

Disclaimer: I haven't used Lua much in the last 10 years, so these scripts can probably be made simpler/cleaner.

Size

MessagePack serialization is more compact than JSON. If you check the value stored in Redis after running the previous script, you get this:

127.0.0.1:6379> hgetall user_id
1) "device1"
2) "\x81\xablast_online\xceZ\xeb\x0f\x13"
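
To see where the compactness comes from, here is a hand-written decoder for just the MessagePack type tags that appear in this value (fixmap 0x80-0x8f, fixstr 0xa0-0xbf, uint32 0xce), compared with the same map as compact JSON. The function name `decode_small_map` is hypothetical; a real application would use a full MessagePack library.

```python
import json
import struct

# Bytes printed by redis-cli above ('Z' is how redis-cli shows byte 0x5a).
stored = b"\x81\xablast_online\xceZ\xeb\x0f\x13"


def decode_small_map(buf: bytes) -> dict:
    """Decode a MessagePack map of short string keys to uint32 values.

    Illustration only: supports fixmap, fixstr and uint32, nothing else.
    """
    if (buf[0] & 0xF0) != 0x80:
        raise ValueError("expected a fixmap")
    count = buf[0] & 0x0F            # fixmap: low 4 bits hold the entry count
    pos = 1
    result = {}
    for _ in range(count):
        if (buf[pos] & 0xE0) != 0xA0:
            raise ValueError("expected a fixstr key")
        length = buf[pos] & 0x1F     # fixstr: low 5 bits hold the byte length
        key = buf[pos + 1:pos + 1 + length].decode("utf-8")
        pos += 1 + length
        if buf[pos] != 0xCE:
            raise ValueError("expected a uint32 value")
        (value,) = struct.unpack(">I", buf[pos + 1:pos + 5])  # big-endian
        pos += 5
        result[key] = value
    return result


decoded = decode_small_map(stored)
as_json = json.dumps(decoded, separators=(",", ":")).encode("utf-8")
print(decoded)                    # {'last_online': 1525354259}
print(len(stored), len(as_json))  # 18 26
```

The type and length live in a single header byte and the timestamp takes 4 raw bytes, instead of quotes, a colon and 10 ASCII digits.
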

We could make it even smaller by using shorter key names (e.g. "on" instead of "LastOnline").
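
A quick way to estimate the saving is a hand-written encoder for the same MessagePack subset (fixmap, fixstr, uint32). The helper name `encode_small_map` is hypothetical, and the comparison uses the "last_online" key that appears in the stored value above.

```python
import struct


def encode_small_map(d: dict) -> bytes:
    """Encode {short str key: uint32 int} as a MessagePack map.

    Illustration only: at most 15 entries, keys up to 31 bytes,
    values must fit in 32 bits.
    """
    if len(d) > 15:
        raise ValueError("a fixmap holds at most 15 entries")
    out = bytearray([0x80 | len(d)])       # fixmap header
    for key, value in d.items():
        raw = key.encode("utf-8")
        if len(raw) > 31:
            raise ValueError("a fixstr holds at most 31 bytes")
        out.append(0xA0 | len(raw))        # fixstr header
        out += raw
        out.append(0xCE)                   # uint32 marker
        out += struct.pack(">I", value)    # 4 bytes, big-endian
    return bytes(out)


long_keys = encode_small_map({"last_online": 1525354259})
short_keys = encode_small_map({"on": 1525354259})
print(len(long_keys), len(short_keys))  # 18 9
```

Every character removed from a property name saves one byte per property, per device, per user.
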

Performance

I didn't have time to do a detailed performance test, but just to check that nothing was terribly wrong I set and read one of those nested values 10,000 times in a loop against a local server and measured the total time:
  Option 1: Lua/MessagePack: 2.15 secs
  Option 2: raw Redis commands, storing the nested hash as JSON and using a transaction (GET + SET) to update a subfield: 2.42 secs


The same idea can be implemented with custom Redis modules instead of Lua scripts. For example, that is what the ReJSON module does. That is probably a little faster than the Lua script approach, but there are many cases where you cannot install custom Redis modules (e.g. when using managed Redis instances in AWS).