It is. What you said was so true; it really depends on the use case. Users have to go through these gymnastics themselves to pick the right thing… And it has a lot to do, again, with CAP: you can’t have all three of consistency, availability and partition tolerance; you have to pick two. What Spanner does, interestingly, is claim that it actually beats the CAP theorem… Which was controversial, because what Eric Brewer describes is a theoretical model for thinking about the extremes. So Spanner says, "We have them all. You don’t have to make any compromises."

But in Eric Brewer’s mental model of his theorem – if you think about the very extreme cases, like 100% availability, 100% consistency, 100% partition tolerance – those extremes can’t all exist together because of the physical limitations of the world… You will always have network partitioning of some sort… And Spanner is actually a typical CP system; it has 100% consistency, and it’s very tolerant to partitioning, but its availability is significantly higher than any other relational database. It provides five nines of availability, which means about 5 minutes of downtime a year. That’s amazing. Most other relational databases require 10 minutes or whatever a month for maintenance and so on… Or if you wanna upgrade the schema, it requires downtime. Or the failover requires downtime. So how did this happen…?
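As a rough check on the "five nines" figure above, here is a minimal sketch of the arithmetic. The function name and the 365.25-day year are assumptions made for illustration; they are not taken from any Spanner documentation.

```python
# Back-of-the-envelope availability arithmetic (illustrative only).
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumes an average year of 365.25 days


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of downtime per year allowed by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, availability in [
    ("three nines", 0.999),
    ("four nines", 0.9999),
    ("five nines", 0.99999),
]:
    minutes = downtime_minutes_per_year(availability)
    print(f"{label}: {minutes:.2f} minutes of downtime per year")
```

Five nines works out to a little over 5 minutes a year, which matches the number quoted above. By comparison, 10 minutes of maintenance a month adds up to 120 minutes a year.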

The Spanner team says they’re beating the CAP theorem, because they provide this high availability… And it has a lot to do with the way the internals of this distributed system are working, plus our good networking infrastructure. We’re just kind of like improving the availability – not to 100%; we’re still talking about five nines, but five nines is actually a lot in practice.

[00:28:09.27] So our goal is that maybe you shouldn’t have to make as many compromises. We will try to provide you higher availability, but you will still have a transactional relational database. At the same time, we have a lot of limitations, for example around the schema, and some SQL limitations… Because it’s hard to deliver really complicated queries on a very highly distributed system.

Latency-wise, for example, the way we handle writes is completely different from traditional databases… But we are trying to pick the best. For example, unlike traditional databases, when a write comes to Spanner, we go and write it to multiple replicas. It arrives at the leader, but we synchronously replicate it to the other replicas.

But we use Paxos [unintelligible 00:29:19.24] so if a replica goes down, it doesn’t really stop the write. Traditional databases don’t have this concept, so if something goes down, that write fails, or there’s going to be huge latency until something comes back up again, and so on… So they are trying to pick up those different flavors of things… Because you know, the world has changed a lot; we have better networking now, we have better computers, we have specialized hardware, and so on… Everything is going distributed; we need larger scale, more resilience… Why not think about a completely new database in this new world, with the new rules…?
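The point about a replica going down without stopping the write can be illustrated with a toy majority-acknowledgement rule. This is only a sketch of the general quorum idea behind Paxos-style replication, not the full Paxos protocol and not Spanner's actual implementation; `Replica` and `quorum_write` are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A toy replica holding an in-memory log (hypothetical, for illustration)."""
    name: str
    up: bool = True
    log: list = field(default_factory=list)

    def append(self, entry: str) -> bool:
        """Return True if this replica accepted the entry, False if it is down."""
        if not self.up:
            return False
        self.log.append(entry)
        return True


def quorum_write(replicas: list, entry: str) -> bool:
    """Treat the write as committed once a strict majority acknowledges it.

    Simplification: a real protocol also handles leader election, retries,
    and cleaning up entries that never reached a majority.
    """
    acks = sum(1 for replica in replicas if replica.append(entry))
    return acks >= len(replicas) // 2 + 1


replicas = [Replica("a"), Replica("b"), Replica("c")]

replicas[2].up = False  # one of three replicas is down
print(quorum_write(replicas, "x=1"))  # two acks out of three is still a majority

replicas[1].up = False  # now two of three are down
print(quorum_write(replicas, "x=2"))  # one ack out of three is not a majority
```

With three replicas, losing one still leaves a majority, so the write goes through; only when most replicas are unreachable does it fail.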

That’s why I like the project, because it looks at things from a different perspective, and then you are internalizing all the other hard problems in other databases by looking from that perspective.