Network partitions and nosql

Many distributed noSql datastores claim support for consistency or availability.

From the CAP theorem we know that a distributed system under partition (P) can only be Consistent or Available but not both at the same time


 Partitions: Why care about them?

Because they can happen at any time, with any distributed deployment.

The Internet is not just a big LAN and some regions can at time get disconnected randomly

Network partitions can also happen within the same datacentre as well. Hardware can always fail within the same LAN (or Availability Zone) and not caring about network topology changes is simply a bad practice. Full stop.

Ideally you would want a datastore to never lose writes (CP) under any circumstance. No business is keen on loosing data after all.

It’s puzzling to see how hard this is to achieve when our datastore uses some kind distributed architecture.

Most nosql systems tend to privilege availability over consistency where possible.

Once a partition has occurred things tend to get messy: a datastore can:

  1. refuse writes by switching into read only mode in the minority partition.
  2. accept writes on both sides and then choose which side to keep and which writes to throw away (Last-Writer-Wins)
  3. accept writes on both sides and try to reconcile them all afterwards.

Many distributed datastores claim never to lose data and yet un-openly implement the LWW approach (#2) or a sort of hybrid between #1 and #2 trying to reduce the loss windows

This is because High Availability is usually a design goal and a selling point for distributed datastores. Compromising on HA to preserve consistency usually has serious speed implications which are deemed unacceptable.

Still: losing writes may or may not be an option for a specific application

Approach #3 is very appealing but only works for a specific class of applications like real time collaboration tools (i.e. Google Docs): if all updates are commutative they can be merged without conflicts. Roughly speaking this means they are CRDT

 So where do we stand?

Aphyr has done an outstanding job in verifying the claims of many well regarded distributed datastores in regards to consitency or availability behaviour under partitions. Have a look at his blog. It’s time well spent.

 Systems that claim to be CP but are only AP (at best)

This is a discomforting list :(

 AP systems that don’t claim to be CP

Storing financial transactions in Redis is probably a bad idea.

 Verified CP systems

 What can we do

Most blog posts about partitioning finish up with something like

partitions can happen so be prepared

Yes but how?

Reading through Twillio’s experience one lesson I can bring home is that if you need a CP system and all you have is an AP system (Redis) replicate your writes to the AP system to a backup CP system. This won’t give you an automatic fix if the AP system gets partitioned but you can at least manually rebuild the data when the storm ends.

Not pretty but workable.

 What about reads?

If stale reads are not a problem, we have no problem.: different datastores behave differently in regards to read consistency and you should follow their documentation.

It is worth noting that strong read consistency is needed more often than not: distributed counters or a distributed locks simply don’t work with stale reads.

A wonderful answer on Quora on this topic


Now read this

Email deliverability

The Simple Mail Transfer Protocol permits any computer to send email claiming to be from any source address. It’s been conceived for a very trusting environment, not exactly what’s the internet today… Welcome spam Over time the amount of... Continue →