I wanted to write an article about the secrets of scalability, but the subject turned out to be too complex for a single article. Instead, let's just dissect some scalability problems as we go.
When you think about scalability, it is important to distinguish two different types of problems: those that require reading much more often than updating, and those that require reading about as often as, or even less often than, updating. The first type is called WORM (write once, read many); the second is called RW (read-write). It turns out that they are fundamentally different, and here is why.
The requirement to scale reads often conflicts with the requirement to scale writes, unless you are okay with inconsistent reads. If you are not okay with that (which is usually the case), reads have to be coordinated with writes, and that coordination requires additional effort.
Also, some data structures are inherently WORM-ish, especially succinct data structures. Mutating them on the fly is expensive, but retrieval can be very efficient.
There is another important difference: WORM problems are usually easier to optimise than RW problems and, more importantly, they are more amenable to automatic optimisation. Why? Because during read operations the program state remains the same. This assumption allows you, the compiler and the OS to cut corners and make things faster.
And of course, WORM problems are automatically parallelisable. Some languages already have patterns especially suitable for these kinds of cases, such as futures in Scala, so once the WO part of WORM is complete, parallelisation comes at no cost.
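To make this concrete, here is a minimal Scala sketch (the index contents and the keys are made up for illustration): once the immutable index has been built, any number of lookups can run as futures in parallel, with no locking at all.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ParallelReads {
  // Built once, then only read: a WORM structure.
  val index: Map[String, Int] = Map("a" -> 1, "b" -> 2, "c" -> 3)

  def lookupAll(keys: Seq[String]): Seq[Option[Int]] = {
    // Because `index` is immutable, each lookup can run in parallel
    // without any coordination between the futures.
    val futures = keys.map(k => Future(index.get(k)))
    Await.result(Future.sequence(futures), 5.seconds)
  }
}
```

Nothing in `lookupAll` depends on the order in which the futures complete, which is exactly why the parallelism is free here.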
So here is a recipe: if you can identify a WORM problem, single it out, perhaps making a separate component out of it. Functional languages such as Scala encourage you to do this as you go, which provides more opportunities for optimisation. If an entire system can be implemented as WORM, it will scale ridiculously well.
But what about RW? RW can be quite scalable too, just in a different way!
The easiest case is when there are no reads; in other words, RW degenerates into W. The solution here is simple: bundle and parallelise the writes! This works because write throughput is usually very high, even when write latency is not.
This works with hard drives, because sequential write speed is quite high. It works with modern CPUs, because they have write buffers that accumulate writes and thereby attain the highest bus throughput.
A classic example is a log file: write throughput is high, writes proceed continuously without any reading, and then the data is simply moved away. After that, you can continue writing, as well as reading the data that was moved away in whatever fashion you want.
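A minimal Scala sketch of write bundling (the batch size and the sink function are arbitrary choices for illustration, not a real logging API): records accumulate in an in-memory buffer and are flushed as a single batch, trading per-record latency for throughput.

```scala
import scala.collection.mutable.ArrayBuffer

// Records accumulate in memory and are handed to `sink` in batches.
// In practice `sink` would be a sequential file append.
class BatchedLog(batchSize: Int, sink: Seq[String] => Unit) {
  private val buffer = ArrayBuffer.empty[String]

  def write(record: String): Unit = {
    buffer += record
    if (buffer.size >= batchSize) flush()
  }

  def flush(): Unit =
    if (buffer.nonEmpty) {
      sink(buffer.toSeq) // one large sequential write instead of many small ones
      buffer.clear()
    }
}
```

The caller pays for a flush only once per batch; everything in between is a cheap in-memory append.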
Parallelisation is also not very sophisticated: if records are unrelated, just push them to multiple independent channels. Otherwise, doing the same in chunks might help.
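A small sketch of the channel idea in Scala, assuming records are keyed pairs and that routing by a hash of the key is acceptable: unrelated keys spread across independent channels, while records sharing a key stay together, so per-key order is preserved.

```scala
// Route each record to one of `channels` independent channels by hashing
// its key. Records with the same key always land on the same channel.
def shard(records: Seq[(String, String)], channels: Int): Map[Int, Seq[(String, String)]] =
  records.groupBy { case (key, _) => math.floorMod(key.hashCode, channels) }
```

Note `floorMod` rather than `%`: a negative `hashCode` would otherwise produce a negative channel index.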
When the number of reads is roughly equal to the number of writes, or only a few orders of magnitude higher, contention starts to make a difference. A classic example is a payment processing system: when it is bombarded by mutating transactions, strange phenomena start to appear, such as failures due to long-running queries. Explaining solutions to such problems goes well beyond this article.
Notice how the nature of each problem dictates a certain kind of solution. For that reason, it is important to identify what kind of problem you are trying to solve: WORM or RW? Without knowing that, you might not be able to decompose your system into components that scale perfectly.
Interestingly, this is where simplicity comes into play: if your system is simple, it is easy to make each of its components scalable. But simplification is a topic for another article…