11 March 2016

Exploring Russian Doll Caching

This technique was developed in the Ruby community and is a great way to approach caching partial views. In Ruby rendering views is more expensive than PHP, but this technique is worth understanding as it could be applied to data models and not just views.

In the Ruby world Russian Doll caching is synonymous with key-based expiration caching.  I think it's useful to rather view the approach as being the blend of two ideas.  That's why I introduce key-based expiration separately.

Personally I think Russian Dolls are a bit of a counter-intuitive analogy.  Real life Russian Dolls each contain one additional doll, but the power of this technique rests on the fact that "dolls" can contain many other "dolls".  I find the easiest way to think about it is to say that if a child node is invalidated then its siblings and their children are not affected.  When the parent is regenerated those sibling nodes do not need to be rendered again.

Cache Invalidation

I use Laravel which luckily allows the use of tagging cache entries as a way of grouping them.  I started the habit of tagging my cache keys with the name of the model, then whenever I update the model I invalidate the tag, which clears out all the related keys for that model.

In the absence of the ability to tag keys the next best approach to managing cache invalidation is to use key-based expiration.

The idea behind key-based expiration is to change your key name by adding a timestamp.  You store the timestamp separately and fetch it whenever you want to fetch the key.

If you change the value in the key then the timestamp changes.  This means the key name changes and so does the stored timestamp.  You'll always be able to get the most recent value, and Memcached or Redis will handle expiring the old key names.

The practical effect of this strategy is that you must change your model to update the stored timestamp whenever you change the cache.  You also have to retrieve the current timestamp whenever you want to get something out of the cache.

Nested view fragments, nested cache structure

Typically a page is rendered as a template which is filled out with a view.  Blocks are inserted into the view as partial views, and these can be nested.

The idea behind Russian Doll caching is to cache each nested part of the page.  We use cache keys that mimic the frontend nesting.

If a view fragment cache key is invalidated then all of the wrapping items keys are also invalidated.  We'll look at how to implement this in a moment.

The wrapping items are invalidated, so will need to be rendered, but the *other* nested fragments that have not changed still remain in the cache and can be reused.  This means that only part of the page will need to be rendered from scratch.

I find the easiest way to think about it is to say that if a child node is invalidated then its siblings and their children are not affected.  When the parent is regenerated those sibling nodes do not need to be rendered again.

Implementing automatically busting containing layers

We can see that the magic of Russian Doll caching lies in the ability to bust the caches of the wrapping layers.  We'll use key-based expiration together with another refinement to implement this.

The actual implementation is non-trivial and you'll be needing to write your own helper class.  There are Github projects like Corollarium which implement Russian Doll caching for you.

In any case lets outline the requirements.

Lets have a two level cache, for simplicity, that looks like this:

Parent (version 1)
- Child (version 1)
- Child (version 1)
- Child (version 1)

I've created a basic two tier cache where every item is at version 1, freshly generated.  Expanding this to multiple tiers requires being able to let children nodes act as parents, but while I'm busy talking through this example lets constrain ourselves to just having one parent and multiple children.

Additional cache storage needs

First lets define our storage requirements.

We want keys to be automatically invalidated when they are updated and key-based expiration is the most convenient way to accomplish this.

This means that we'll have a value stored for each of them that holds the most recent value.  Currently all of these values are "version 1".

In addition to storing the current version of each key we will also need to store and maintain a list of dependencies for the key.  These are cache items which the key is built from.

We need to be certain that the dependencies have not changed since our current item was cached.  This means that our dependency list must store the version that each dependency was at when the current key was generated.

The parent node will need to store its list of dependencies and the version that they were when it was cached.  When we retrieve the parent key we need to check its list of dependencies and make sure that none of them have changed.

Putting it together

Now that we've stored all the information we need to manage our structure, lets see how it works.

Lets say that one of the children changes and is now version 2.  We update the key storing its most current value as part of the update to the value, using our key based expiration implementation.

On the next page render our class will try to pull the parent node from cache.  It first inspects the dependency list and it realises that one of the children is currently on version 2 and not the same version it was when the parent was cached.

We invalidate the parent cache object when we discover a dependency has changed.  This means we need to regenerate the parent.  We may want to implement a dogpile lock for this, if you're expecting concurrency on the page.

Only the child that has changed needs to be regenerated, and not the other two.  So the parent node can be rebuilt by generating one child node and reading the other two from cache.  This obviously results in a much less expensive operation.