Developers with a background in relational databases are accustomed to achieving data integrity using transactions. Once a writer updates a bank balance and commits the transaction, it’s entirely unacceptable for a reader to ever be served the previous value, and a relational database ensures that this never happens. In the NoSQL world, this is referred to as strong consistency. Achieving strong consistency in a NoSQL database is more challenging because NoSQL databases by design write to multiple replicas. And in the case of Azure Cosmos DB, these replicas can be geographically spread across multiple Microsoft data centers throughout the world.
First, let’s understand consistency within the context of a single data center.
In one Azure data center, your data is written out to multiple replicas (four at least). Consistency is all about whether or not you can be sure that the data you are reading at any point in time is – in fact – the latest version of that data, because, you can be reading from any replica at any given time. And if it’s not the latest version, then this is known as a “dirty read.”
Cosmos DB supports five consistency levels, ranging from strong to eventual. I’ll discuss each of these levels in a moment, but you can see that there’s a sliding scale here that balances latency with availability:
On the one extreme, strong consistency guarantees you’ll never experience a dirty read, but of course, this guarantee comes with a cost, because Cosmos DB can’t permit reads until all the replicas are updated. This results in higher latency, which degrades performance.
On the other end of the spectrum, there’s eventual consistency, which offers no guarantees whatsoever, and you never know with certainty whether or not you’re reading the latest version of data. This gives Cosmos DB the freedom to serve your reads from any available replica, without having to wait for that replica to receive the latest updates from other writes. In turn, this results in the lowest latency, and delivers the best performance, at the cost of potentially experiencing dirty reads.
Replication within a region
Consistency is always a concern, but practically, it only becomes really important once you start geo-replicating your data. Because the reality is that, despite these varying consistency levels, it’s extremely rare to ever experience a dirty read within a data center, even when using eventual consistency, which is the weakest consistency level. And that’s because the replicas in a data center are located very close to one another, and data moves extremely fast between them… typically within one millisecond.
So in fact, there is little difference in consistency, latency, performance, and availability within a data center, where you’ll almost never encounter a dirty read, but it’s a completely different matter once you start replicating globally.
Once you globally distribute your data, it takes hundreds of milliseconds to replicate changes across continents. This greatly increases the chances for dirty reads, which is why consistency becomes a much more pressing concern once you globally distribute your data across multiple data centers.
Five Consistency Levels
In the case of Strong Consistency, you can be certain that you’re always reading the latest data. But you pay for that certainty in performance. Because when somebody writes to the database, everybody waits for Cosmos DB to acknowledge that the write was successfully saved to all the replicas. Only then can Cosmos DB serve up consistent reads, guaranteed. Cosmos DB actually does support strong consistency across regions, assuming that you can tolerate the high latency that will result while replicating globally.
Next, there’s bounded staleness, which is kind of one step away from strong. As the name implies, this means that you can tolerate stale data, but up to a point. So you can actually configure much how out of date you want to allow for stale reads. Think of it as controlling the level of freshness, where you might have dirty reads, but only if the data isn’t too much out of date. Otherwise, for data that’s too stale, Cosmos DB will switch to strong consistency. In terms of defining how stale is stale, you can specify lag in terms of time – like, no stale data older than X, or update operations – as in, no stale data that’s been updated more than Y number of times.
And then there’s session consistency, which is actually the default consistency level. With session consistency, you maintain a session for your writer – or writers – by defining a unique session key that all writers include in their requests. Cosmos DB then uses the session key to ensure strong consistency for the writer, meaning that a writer is always guaranteed to read the same data that they wrote, and will never read a stale version from an out-of-date replica – while everyone else may experience dirty reads.
When you create a DocumentClient instance with the .NET SDK, it automatically establishes a session key and includes it with every request. This means that within the lifetime of a DocumentClient instance, you are guaranteed strong consistency for reading any data that you’ve written, when using session consistency. This is a perfect balance for many scenarios, which is why session consistency the default. Because very often, you want to sure immediately that the data you’ve written has – in fact – been written, but it’s perfectly OK if it takes a little extra time for everyone else to see what you’ve written, once all the replicas have caught up with your write.
With consistent prefix, dirty reads are always possible. However, you do get some guarantees. First, when you do get stale data from a dirty read, you can at least be sure that the data you get has in fact been updated to all the replicas, even though it’s not the most up-to-date version of that data, which has still yet to be replicated. And second, you’ll never experience reads out of order. For example, say the same data gets updated four times, so there are versions A, B, C, and D, where D is the most up to date version, and A, B, and C are stale versions. Now assume A and B have both been replicated, but C and D haven’t yet. You can be sure that, until C and D do get replicated, you’ll never read versions A and B out of order. That is, you might first get version A when only A has been replicated, and then get version B once version B has been replicated, and then you’ll never again get version A. You can only get the stale versions in order, so you’ll only continue getting version B until a later version gets replicated.
Eventual consistency is the weakest consistency level. It offers no guarantees whatsoever, in exchange for the lowest latency and best performance compared to all the other consistency levels. With eventual, you never wait for any acknowledgment that data has been fully replicated before reading. This means that read requests can be serviced by replicas that have already been updated, or by those that haven’t. So you can get inconsistent results in a seemingly random fashion, until eventually, all the replicas get updated and then all the reads become consistent – until the next write operation, of course. So this really is the polar opposite of strong consistency, because now there’s no guarantee that you’re ever reading the latest data, but there’s also never any waiting, which delivers the fastest performance.
One more thing…
It’s also worth mentioning that when bounded staleness and session consistency levels don’t apply strong consistency, they fall back to consistent prefix, not all the way down eventual consistency. So for bounded staleness, which gives you strong consistency when the data is too stale, you’ll still get consistent prefix guarantees when the data isn’t too stale. Likewise, with session, which gives you strong consistency when reading your own writes, you’ll get consistent prefix when reading other writes.