Evolving CRUD (part 2)
Serving content at scale without complexity
Command Querying Responsibility Segregation #
Martin Fowler and Udi Dahan have in the past described this approach at length. This is obviously not a new concept and what follows is only a possible implementation of it.
A fancy name that hides a simple concept: read traffic (GET) has different requirements compared to the write traffic (POST/PUT/DELETE) and should be handled possibly by different systems.
This is by far not a new concept. Already in production with big players but never standardised.
This design splits read and write requests according to the following philosophy:
Read requests (GET) #
- Redis based for high 100k tps
- Uses advanced data structures to model data and indexes
- A proper noSQL schema design from the grounds up
Write requests (POST/UPDATE/DELETE) #
- Data ingestion is happens in a reliable queue
- Suggested queuing technologies include Redis itself or Apache Kafka or Amazon Kinesis or SQS
- Ingestion and syndication tasks are handled in the same fashion
- Task idempotency is a desirable feature
- Updates are always eventually consistent (more on this below)
The concept of message queues is not new and is at the very heart of the Enterprise Integration Patterns concepts, minus all the bloatware.
In a CQRS system eventual consistency for upsert operations is a fact.
This may not always be desirable: not everything can be done later, something needs to be done upfront in order to return necessary information to the user (ids, errors if any, etc)
Writing to the serving layer synchronously is a small variation that can work well with Redis given its high read and write throughput.
Advantages of this approach #
No more Cron jobs #
Cron jobs and scheduled tasks are no longer necessary as background tasks are run on demand when needed rather than at a time based schedule.
Automatic Error Handling #
Messages are continuously reprocessed until no error is encountered.
This introduces resiliency in they way data is syndicated to 3rd party platforms, which can be down or not always available.
Task idempotency is crucial to this end
Push approach to caching #
Instead of calculating the views on demand we now precompute them at ingestion time. This allows us to scale out writes by beefing up our queue processing cluster.
External system integration #
Syndication and data ingestion are handled transparently in the same way
- search engine data ingestion
- sending emails
- processing transactions
- integration with 3rd party systems
- storing data in long term storage facilities
Scale out more easily #
By dividing internet facing read and write requests from queue task we can now scale them independently.
In the most common scenarios we can simply resort to single threaded task processing depending on the ingestion traffic and this can greatly simplify our code.
Also as tasks are processed in groups smart optimization techniques can be easily applied: for example when multiple updates to a resource are sent in a series, only the last one can be applied and the others discarded.
Disavantages of this approach #
Data loss is possible #
- Redis is AP (Available under partition) and not CP (Consistent under partition)
- Things don’t change if we use Redis Cluster or Redis Sentinel so we may still lose data under partition
- Partitions are not common but never totally avoidable.
- We need a long term storage solution for Disaster Recovery
Limited by the Redis memory #
- In the serving layer we are limited by the amount of memory available in Redis
- Memory is getting cheaper and easy to use sharding solutions are already available (Redis Cluster)
- It’s a good approach to use Redis to store indexes and short term caching while relying on a highly durable key/value datastore like S3 for bulky data items.
Non trivial data modelling in the serving layer #
- Most of the performance gain of the system stems from the performance optimization work done upfront when modelling the Redis data model
- This can be puzzling at first for most developer as they are forced to think in terms of Big O notation again since Uni.
Eventual consistency #
- IDs and errors must be reported to the user synchronously
- Everything else is eventually consistent by definition, although hybrid solutions are possible and sometimes (rarely) desirable.
Integral error #
- NoSQL schemas are easy to extend as long as only unindexed fields are added. Adding a new indexed field requires us to reprocess the whole data set using ad hoc scripts.
- Sometimes bug in our sw layer will introduce data inconsistencies that require re-mediation work
- This is an eventuality we always need to cater for
Bottom line #
This approach is suitable when
- it’s already clear that some sorts of in memory caching will be needed and we don’t want to resort to use Memcache as an afterthought
- the system must be designed to handle significant traffic loads since day one (> 100 req/sec)
- integration with 3rd party systems is needed for search, notification and data warehousing.