System Design primer


System design interviews are becoming more and more common among top tier software companies inside and outside Silicon Valley.

Given that highly scalable systems are one of my areas of interest, in the last few months I had the opportunity to review what preparation material is available online - videos, tutorials and blog posts.

Not surprisingly, a lot of free and paid material is indeed available.

After watching a considerable number of lengthy YouTube videos (> 1.5hrs each) I started to realise how most of these guides can feel overwhelming and even give the wrong impression that solution architecture for scalable systems is an incredibly complicated field requiring an enormous amount of knowledge.

As I have been in this space in one way or another for the last 10+ years, I can’t disagree more:

the techniques to design efficient and modern software architectures are really only a handful and mastering them is definitely not an insurmountable task.

So I have decided to put together this non-definitive guide as a curated reference to help anyone getting started.

For every topic I will provide a brief explanation and focus on the tradeoffs and pitfalls that are often not covered by other material online and thus overlooked by more junior developers.

I will also refrain from introducing the usual plethora of diagrams, lines and boxes that make every concept easier to understand but also - in my opinion - easier to forget. Instead I will try and offer conversational descriptions that are simple enough to follow without diagrams.

asking the right questions

Solution architecture is about providing solutions that meet business requirements.

Yet the most common mistake is to jump into solution mode without having enough knowledge of the functional and non functional requirements of the system you are trying to design.

Needless to say, knowledge comes from asking the right questions, and at the start what you should be asking always goes along the lines of the following list:

  1. How is the system going to work?
  2. Who are the users?
  3. How often do they use the system?
  4. Is the traffic considered constant or intermittent/spiky over time?
  5. What’s the size of the data we need to store?
  6. What’s the required performance?
  7. What compromises are acceptable to the user?
  8. What are the availability/SLA requirements?
  9. Do we really need high availability?

These questions are used to determine the high level functional requirements - how the system works - as well as the typical non functional ones - scalability, performance and availability.

They are not optional: you will immediately fail if you don’t ask these kinds of questions as a first step.

system APIs

After getting a general idea of how the system should operate, the next logical step is to sketch out the system APIs.

By system APIs I mean a pseudo-code simplified view of how to interact with the system you are designing. In other words, by defining inputs, methods and outputs you define the contract between the system and the rest of the world.

You can think of the system APIs as a high level abstraction from which it’s possible to derive any RESTful or GraphQL HTTP APIs - or the library interface, if you - like me - are old enough to remember how to define an embeddable library.

For example, if we were designing a file storage system we would - at a minimum - define the following APIs.
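For instance, a minimal sketch of such a contract might look like this - the method names and fields are my own illustration, not any real product’s API, and a dict stands in for the actual storage backend:

```python
from dataclasses import dataclass

@dataclass
class FileMetadata:
    path: str
    size_bytes: int
    version: int

class FileStorage:
    """Pseudo-contract between the system and the rest of the world."""

    def __init__(self):
        self._files = {}  # naive in-memory stand-in for the real backend

    def put(self, path: str, data: bytes) -> FileMetadata:
        # overwriting an existing path bumps its version
        version = self._files.get(path, (None, 0))[1] + 1
        self._files[path] = (data, version)
        return FileMetadata(path, len(data), version)

    def get(self, path: str) -> bytes:
        return self._files[path][0]

    def delete(self, path: str) -> None:
        del self._files[path]

    def list(self, prefix: str = "") -> list:
        return [p for p in self._files if p.startswith(prefix)]
```

The point of the exercise is not the implementation but the shape of the contract: inputs, methods and outputs, from which a REST or GraphQL API can later be derived.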

This step is relatively simple and there are no common gotchas nor pitfalls: all you need to know is how the system works at its core.

the simple option first

The best designs start very simple first and add complexity only as needed, as the discussion progresses and the requirements become less vague.

The most common design mistake is to start with a complex design upfront.

There are a lot of IT professionals - more or less senior - who tend to design by-the-book, over-engineered architectures to cater for prematurely defined infinite scalability requirements.

While these kinds of designs are technically correct, they are also complicated to implement and frankly not needed.

This is an insidious type of premature optimisation: it’s simply impossible to optimize a system with too many unknown unknowns towards an ideal of performance or availability that is not known yet.

In the vast majority of cases, the simple option is to design a system as one single API / backend server plus one single database server - if a database is needed - and nothing else.

design considerations

Once the functional requirements are covered and a basic design is in place you will be usually asked to start iterating over your design as the interviewer provides more and more information.

This is the time to start investigating the system’s unique characteristics:

  1. Is it read heavy?
  2. Is it write heavy?
  3. Is it computational heavy (aggregations/stats)?

This is where multiple options become available and there isn’t a one size fits all solution.

At this stage most online courses present and discuss various typical architectures in detail - like designing a URL shortener or rebuilding Slack or Twitter etc - which is a nice way to tackle system design if you have an infinite amount of hours to dedicate to this.

In this post I would like instead to cover a few cardinal concepts that are used extensively across any highly scalable system.

What follows is therefore a glossary of the most significant approaches you should be familiar with. I won’t delve into too many details - you can find lots of details online - but rather focus on the drawbacks and limitations of each approach because this is the area you will be tested upon.

1) data caching & replication

Caching is one of the most commonly used patterns to support read heavy workloads and most developers are in one way or another familiar with this concept.

The origins of this approach trace back to how relational database systems were originally designed: with a strong focus on durability and on writing all data to disk - which is notoriously slow - and therefore offering limited to no built-in transparent in-memory caching out of the box.

This created the need to cache the data in memory at an application level.

Given that reading from memory is still 2 to 3 orders of magnitude faster than reading from the fastest SSDs available and that the cost of memory has been steadily dropping for the last 20+ years, this approach is still very relevant nowadays.

In-memory storage and computation is a key element to achieve scalability for both reads and writes: no modern system with millions of users would exist without in-memory technologies.

Multiple caching strategies are available and you should definitely familiarize yourself with the most common ones: cache aside, read through, write around, write through and write behind.
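As an illustration, cache aside - probably the most common of the five - can be sketched like this, with plain dicts standing in for a real database and a real cache such as Redis:

```python
# Cache-aside sketch: the application, not the cache, is responsible
# for populating and invalidating entries.
db = {"user:1": {"name": "Ada"}}  # stand-in for the database
cache = {}                        # stand-in for e.g. Redis

def get_user(key):
    # 1. try the cache first
    if key in cache:
        return cache[key]
    # 2. on a miss, read from the database...
    value = db[key]
    # 3. ...and populate the cache for subsequent reads
    cache[key] = value
    return value

def update_user(key, value):
    db[key] = value
    # invalidate rather than update: the next read repopulates the cache
    cache.pop(key, None)
```

The invalidate-on-write choice is deliberate: updating the cache in place risks racing with concurrent reads, while dropping the entry lets the next read rebuild it from the source of truth.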

caching drawbacks

Caching seems like a relatively simple solution but there are at least a couple of significant drawbacks to consider - and point out during the interview.

#1 single source of truth

To start the existence of cached data violates the single-source-of-truth principle.

Having multiple versions of mutable data introduces the possibility of some of them becoming out-of-date - leading to inconsistent reads. In most cases this is not a huge issue but sometimes it is - depending on the specific use-cases.

#2 cache invalidation

Also, given that memory is still significantly more expensive than disk space, we simply can’t store everything in it: we need some sort of cache invalidation technique to make sure that what we keep in memory is the most relevant data.

The problem is that there are many such techniques - all with unique PROs and CONs - and they are all notoriously difficult to implement correctly from scratch.

For the purpose of the interview being familiar with the LRU / LFU methods is most of the time sufficient as they are the most commonly used.
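As a refresher, here is a toy LRU sketch built on Python’s `OrderedDict` - purely illustrative, not something to ship:

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache: at capacity, evicts the least recently used key."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop least recently used
```

Even this toy version hides subtleties (thread safety, TTLs, memory accounting) that real caches have to solve - which is exactly why rolling your own is rarely a good idea.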

Above all, a great developer will highlight the perils of home-made cache invalidation and will always recommend battle-proven caching libraries and services over DIY approaches.

2) data sharding

If caching is as old as computer science itself, the concept of data sharding - albeit not new - became popular only 10 to 15 years ago as the most common way to deal with the explosion of user generated content on the Internet.

Sharding a dataset means splitting it according to a specific policy across multiple servers and is usually employed in conjunction with replication in the design of distributed databases or file systems.

To better understand this concept, let’s consider the hypothetical need to store a huge list of billions of yellow pages entries. If the dataset cannot easily fit on a single server, we need to split it across multiple machines using a specific sharding key - for example the business name - so that we end up storing the [A-M] entries separately from the [N-Z] ones.
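A minimal sketch of this range-based policy, with in-memory lists standing in for the two database servers:

```python
# Range-based sharding sketch for the yellow pages example:
# two shards split on the first letter of the business name.
SHARDS = {"shard-1": [], "shard-2": []}  # stand-ins for two DB servers

def shard_for(business_name: str) -> str:
    # the sharding key is the business name: [A-M] vs [N-Z]
    first = business_name[0].upper()
    return "shard-1" if "A" <= first <= "M" else "shard-2"

def store(business_name: str) -> None:
    SHARDS[shard_for(business_name)].append(business_name)
```

Note that the routing function is pure and deterministic: every reader and writer must agree on it, which is one of the reasons resharding is so painful.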

While caching is fiddly, sharding is definitely non trivial and introduces a totally different level of complexity into your architecture.

sharding drawbacks

#1 operational complexity

Even in the over-simplified example provided above it’s easy to see how finding an appropriate sharding key is not an easy feat. For example we could have way more businesses starting with the letters [A-M] than [N-Z] and our shards may not be balanced in terms of reads and/or writes.

Also, as the data changes, grows or shrinks over time and querying patterns evolve, resharding techniques become necessary to rebalance the distribution of data among the shards. There is no truly automatic way to do this, so painfully manual human intervention is required.

#2 sharding proxies

While there are no turnkey sharding solutions, it’s common practice to abstract away the existence of the sharding itself from the application server by introducing a sharding-aware proxy layer between the application and the database.

A typical example of this approach is the Vitess open source project - created by YouTube to make MySQL work at their scale. Vitess presents itself to the application server as a single, unsharded, infinitely scalable MySQL database and offers the ops team an operational platform to manage the underlying MySQL instances independently from the application logic.

Once again sharding proxies are no easy solution to the problem as they complicate your architecture.

During a system design interview it’s important to understand the high level concepts around sharding and to be able to enumerate possible sharding techniques that can work with the system you have been asked to design, while articulating the pros and cons of each option.

3) load balancing

Load balancing is another common approach to support scalability and - as the name suggests - is about sharing, or balancing, incoming web traffic among a cluster of application servers.

Up until the early 2000s load balancing was mostly carried out using expensive hardware appliances. With the advent of the cloud, DNS based software load balancers have since become the norm.

The web as we know it today would simply not exist without software load balancers so having a clear and deep understanding of how they work is a must to design highly scalable systems.

Luckily their functionality is quite simple to describe: DNS based load balancers leverage domain name resolution with small timeouts to resolve a domain name to a pool of IP addresses using several techniques to distribute, redirect or manage the overall load balancing process - round-robin, geographical, least-connection etc.

DNS load balancers leverage the distributed and battle proven nature of the DNS infrastructure to provide high availability and data replication and avoid single points of failure: once a downstream system becomes unreachable or unhealthy, its IP is removed from the pool and subsequent DNS queries will simply not return it.

Obviously this approach works only if the timeouts are kept very low so the tradeoff is an increased load on the DNS infrastructure.
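The core mechanism can be sketched as a round-robin pool from which unhealthy IPs are removed - obviously ignoring TTLs and the actual DNS protocol:

```python
import itertools

class RoundRobinPool:
    """Toy model of DNS round-robin: each 'query' returns the next IP,
    and an unhealthy server's IP stops being returned once removed."""

    def __init__(self, ips):
        self._ips = list(ips)
        self._cycle = itertools.cycle(self._ips)

    def resolve(self):
        # each resolution returns the next IP in the pool
        return next(self._cycle)

    def remove(self, ip):
        # a failed health check removes the IP from rotation
        self._ips.remove(ip)
        self._cycle = itertools.cycle(self._ips)
```

Other strategies (geographical, least-connection) differ only in how `resolve` picks the next IP; the pool-plus-health-check structure stays the same.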

load balancing drawbacks

#1 autoscaling

Autoscaling has been highly publicized by Amazon Web Services as a way to create load balancing systems where the number of servers behind the load balancer grows and shrinks throughout the day in accordance with the load on the application.

While this approach is great on paper, as it matches the computational power allocated to an application to its needs, in practice it’s actually very hard to implement flawlessly.

The main hurdle is to define correct autoscaling policies that add more capacity early enough - before your system becomes overwhelmed - yet not so early as to fall into over-provisioning.

It is in fact non trivial to understand when an application server is near saturation, as the patterns of memory, CPU and bandwidth saturation in relation to load tend to be specific to the application itself, influenced by the underlying technology (PHP, Java, Golang etc.) and the server sizing.
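To make the tension concrete, here is a naive threshold-based policy - all numbers are made up for illustration, not a recommendation:

```python
# Naive threshold-based autoscaling policy. The thresholds, cooldowns
# and bounds are illustrative only: finding correct values for a real
# application is exactly the hard part described above.
SCALE_UP_CPU = 0.70    # add capacity before saturation...
SCALE_DOWN_CPU = 0.30  # ...but don't hold on to idle servers forever

def desired_servers(current: int, avg_cpu: float,
                    min_servers: int = 2, max_servers: int = 20) -> int:
    if avg_cpu > SCALE_UP_CPU:
        return min(current + 1, max_servers)
    if avg_cpu < SCALE_DOWN_CPU:
        return max(current - 1, min_servers)
    return current
```

A real policy would also need cooldown periods, multiple signals (memory, bandwidth, queue depth) and load testing to calibrate the thresholds.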

More often than not, teams frustrated by the effort required tend to set up autoscaling policies that rarely kick in and to keep the number of application servers over-provisioned most of the time - in effect negating the advantages of autoscaling in the first place.

While overprovisioning may seem wasteful, it’s always important to weigh it against the cost of setting up adequate autoscaling policies through trial and error.

#2 stateless backends

While DNS based load balancers are quite a mature turnkey solution, their use requires the application server logic to be stateless.

This means all the data your application depends on (including cookies) needs to be either replicated on all instances (like configuration items) or stored in the database. This is because sequential calls made by the same user may hit different servers depending on the load balancing strategy.
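A sketch of the idea, with a plain dict standing in for a shared session store such as Redis:

```python
# Stateless application servers: no per-server memory, all session
# state lives in a store shared by every instance.
shared_sessions = {}  # stand-in for e.g. Redis, shared by all servers

class AppServer:
    def __init__(self, name: str):
        self.name = name  # the server itself holds no user state

    def handle(self, session_id: str, request: dict) -> list:
        # everything the request needs comes from the shared store
        session = shared_sessions.setdefault(session_id, {"cart": []})
        if request["action"] == "add_to_cart":
            session["cart"].append(request["item"])
        return session["cart"]
```

Because the servers are interchangeable, the load balancer is free to route each request anywhere - which is precisely what sticky sessions in legacy systems prevent.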

While in greenfield projects statelessness is not very hard to achieve, with most legacy systems things are more complicated and specific workarounds - like sticky sessions - might become necessary.

While load balancing is not a particularly complex element of a highly scalable architecture, it’s always worth understanding how load balancers really work under the hood (see the DNS reference above) and how it’s possible to shoot yourself in the foot by introducing auto-scaling techniques without adequate load/soak testing.

4) asynchronous systems

This is one of the most important topics in system design.

Simply put: synchronous systems accept and carry out state changes immediately, while asynchronous systems accept state changes immediately but queue them up for later processing.
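The distinction can be sketched with Python’s standard `queue` module - a toy in-process stand-in for a real message broker:

```python
import queue
import threading

# A synchronous handler applies the change before returning;
# an asynchronous one only enqueues it for a worker to apply later.
state = {"counter": 0}
tasks = queue.Queue()

def handle_sync(delta: int) -> None:
    state["counter"] += delta          # applied immediately

def handle_async(delta: int) -> None:
    tasks.put(delta)                   # accepted now, processed later

def worker() -> None:
    while True:
        delta = tasks.get()
        if delta is None:              # sentinel: stop the worker
            break
        state["counter"] += delta
        tasks.task_done()
```

In a real system the queue would be a broker like RabbitMQ or Kafka and the worker a separate process, but the accept-now/apply-later split is the same.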

The advent of SaaS queueing options and cloud tools has made asynchronous systems more and more accessible, as they make for logically simple architectures where data flows nicely between upstream and downstream systems.

However most systems should be first designed to work synchronously because of the underlying simplicity of this approach.

As a rule of thumb, it makes sense to carry out tasks asynchronously only when they can take a significant and variable amount of time to complete - otherwise it’s much easier and recommended to stick to the synchronous approach. In other words, simple updates to the database should be carried out synchronously.

But for I/O bound tasks that move data across systems, bulk updates or data exports, the asynchronous approach generally provides a better user experience and above all allows for sophisticated retry and error handling mechanisms.

While designing asynchronous architectures it’s important to remember that at scale failure is inevitable and architecting systems with fault tolerance in mind is key. In this context having idempotent tasks - i.e. that can be safely retried indefinitely - is usually a baseline requirement.
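A minimal sketch of idempotency via a deduplication set (standing in for a persistent store of already-processed task IDs):

```python
# Idempotent task sketch: recording processed task IDs makes retries
# safe, because re-delivering the same task becomes a no-op.
processed = set()          # stand-in for a durable store of task IDs
balance = {"amount": 0}

def credit(task_id: str, amount: int) -> None:
    if task_id in processed:
        return             # already applied: retrying changes nothing
    balance["amount"] += amount
    processed.add(task_id)
```

This is why queue consumers are usually written to tolerate at-least-once delivery: the broker may redeliver, but the dedup check guarantees the effect happens exactly once.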

It’s also important to make sure that you are familiar with modern queuing technologies - RabbitMQ and Apache Kafka in particular - and their cloud counterparts (respectively Amazon SQS and Amazon Kinesis), as well as with the various retry strategies.

asynchronous systems drawbacks

#1 operational complexity

Asynchronous systems are by definition more complex than synchronous ones as they are made of more moving parts. Each component in the data flow requires precise monitoring and failure handling strategies - like dead letter queues - that need to be tailored to the specific domain of your application.

There simply isn’t a one size fits all solution to handle this: how many times should a failed task be retried? What happens to messages that exhaust their retries? Who gets alerted when the dead letter queue starts filling up?

These are only examples of the questions that you should ask yourself and your interviewer.

It’s also worth mentioning that coding, testing and debugging asynchronous systems is significantly harder than doing so for synchronous ones - just consider for a moment how hard it is to run a Kafka cluster on your development machine - which negatively affects developer productivity and experience.

#2 no edge cases

Asynchronous systems generally move data from upstream to downstream systems across multiple hops. The greater the number of hops and moving parts, the more thinking is needed to cater for all possible scenarios.

At every step, anything that can go wrong will, and failure will manifest itself in every possible way. It is quite tempting to label these incidents as edge cases.

On the contrary a great design is able to deal with almost all failure scenarios in the most automated way possible, degrading gracefully when needed.

In other words: there cannot be any edge case you haven’t thought of so try and focus on what can go wrong while discussing the architecture with your interviewer.

#3 user feedback

Data validation and state change visibility are two key elements to consider when handling asynchronous requests.

It’s not unusual to encounter situations where user requests are accepted and processed asynchronously without offering the end user visibility of the state of these pending requests until they are finally processed.

This has a significant impact on the user experience, as most people expect to see the impact of a state change they requested reflected immediately in the user interface. Therefore it’s necessary to cater for a visible pending state for changes that have not been completed yet. This is not very hard to do at design time, yet many systems are not designed to cater for these scenarios - try for example taking a volume snapshot in the AWS console - and this lack of customer focus leads to subpar UX.

Validation should also happen upfront and synchronously, to avoid silent rejections that would lead to an even worse user experience.

Defining what parts of your system are designed to work synchronously and what parts need queueing is by far the most important architectural decision you will be asked to make. Virtually any system at scale will require some level of asynchronicity, queuing and increased complexity.

Unfortunately there are no silver bullets here and it’s crucial to be able to point out the PROs and CONs of each choice with your interviewer.

5) system observability

This topic is not directly related to software architecture but is so crucial to modern service oriented software development that it always comes up one way or another during a system design interview.

Distributed architectures are what make systems scalable, but they are also harder to monitor and observe than monolithic ones. To address this problem, in 2016 a few SREs working at Google published what is now famously known as the Site Reliability Engineering book.

Among other already familiar topics they introduced the concept of the four golden signals of system observability which are:

  1. Latency: the time it takes to serve a request, e.g. average ms
  2. Traffic: the number of req/sec being handled by the system
  3. Error rate: the number of req/sec resulting in errors
  4. Saturation: the % of CPU/bandwidth/memory currently used by the system
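As an illustration, the four signals could be computed from a hypothetical per-second request log - the log format and window length here are my own assumptions:

```python
# Computing the four golden signals from a hypothetical request log:
# each entry is (latency_ms, is_error), collected over a 1-second window.
def golden_signals(requests, cpu_used, cpu_total):
    latencies = [ms for ms, _ in requests]
    return {
        "latency_ms_avg": sum(latencies) / len(latencies),
        "traffic_rps": len(requests),                  # requests in the window
        "error_rps": sum(1 for _, err in requests if err),
        "saturation": cpu_used / cpu_total,            # fraction of CPU in use
    }
```

Real monitoring stacks would use percentiles rather than averages for latency and track saturation per resource, but the four buckets stay the same.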

What made this list quite famous is that the key to monitoring any distributed system sits with a combination of these four easy to remember signals. While this kind of knowledge was once considered out of scope for software engineers, that is no longer the case with distributed scalable systems.


While this blog post is a bit long - and maybe not complete from every angle - there really are only a handful of topics to master in order to nail the system design interview.

If you are tasked with designing and scaling a system, always come back to this list of topics: solution architecture is a finite field which is definitely not impossible to master.

