The unloved relational database
Or how an upstart storage technology is wreaking havoc on consistency and correctness
Here is something that I have been wondering and worrying about for some time: The popularity of the distributed key/value (K/V) store at the cost of relational databases.
Compulsory video here.
At the start of my career, relational databases were slow, expensive, and often just not available. Consequently, almost nobody used them. Instead, we used a host of other data storage technologies to store stuff. The most advanced technology we had was the index-sequential file, basically a sequential file with a secondary index on it, so records could be read both sequentially and in index order.
And the crowd went wild...
The advantage of this mishmash of technology was that for every problem you chose the technology that matched your use case. This typically meant you optimized for cost as well as performance, leading to compromises on features to keep that cost low and performance high.
This was the time that a disk unit containing four 1.8GB hard disks cost $97,650 to $142,200.
The disadvantage was that your design choices for on-disk data structures locked you in with respect to future feature requests.
Q: Can we also do a report like this?
A: Nope, because the data format does not allow it!…
This was one of the prime drivers of the waterfall method: Before locking down the data structures on disk, you had better know all the things that you were going to have to support. Once the bits hit the magnetic platter, the format you put the bits in was set in stone.
The relational database completely changed that game. Suddenly your physical data model was almost identical to your logical data model and you used a logical query language to get access to your data. Finding an optimal access path to retrieve the data was the problem of the database engine and with the help of indexes and hints, the engine could usually do a great job.
The amazing thing about relational database systems is that they allow you to model your data according to the structure of your problem domain and your queries according to the needs of your features. These two were now decoupled! New feature? New query on the same data, and you’re done. Maybe (probably) add an index, maybe (rarely) provide a query hint, but that was it.
Relational databases were usually slower and less efficient than hand-optimized code operating on hand-optimized on-disk data structures, but as computers became faster that stopped being an issue.
Another great advantage of the relational database was its support for atomic transactions. We used to worry all the time about consistency and correctness when doing updates that hit multiple entities, but with ACID transaction support in relational databases that worry went (mostly) away. Even a nitwit could now update two tables and sleep soundly in the knowledge that either the update happened to both tables or it didn’t happen at all.
Again, the crowd went wild!
The biggest drawback of the relational database was that the size of the database and the number of transactions it could process were tied to the capacity of a single machine (cue comments on Oracle RAC and similar technologies). When the Internet came around, that started to become a problem. In the late 90s (or maybe the early 2000s), I attended a talk called "My database isn't performing and I already have my vendor's biggest machine", which neatly summed up the problem that we were running into.
Before the Internet became a thing, we would build systems that had at most all the employees of a large company as concurrent users. For sure these could be big numbers, but we typically never exceeded 100,000 concurrent users. Systems that went beyond that, like airline reservation systems, were still built using bespoke transaction facilities and databases. A similar reasoning held for the amount of data we stored: The limited scope of our systems usually meant that the number of entities was well within the manageable range. I remember in the early 2000s I came across a relational database of 80GB and that was considered a lot! But in the Internet age we were suddenly building systems that had millions of concurrent users, accessing or storing billions of things. The relational database could not handle all of that.
Something new was needed.
So we went back in time and reinvented the index sequential file, but with a distributed twist, like hosting these files on distributed file systems or slapping on a record format that could deal with the things that happen in large distributed systems. This worked fine for read-only applications like Internet search, where the files can be created in a batch process. But eventually we needed a distributed storage system that could support high volumes of writes as well as safely store many oodles of data.
The distributed K/V store was born. Simple idea. Brilliant idea. With many great implementations. And also a good source for a decent (but overused) interview question.
As far as I know, Google’s Bigtable was the first distributed K/V store that operated in production at Internet scale. It was awesome. In 2008, I managed an Internet service with a production Bigtable that processed over one million reads per second with single digit millisecond latency. I love(d) it.
Apart from being able to operate at scale, distributed K/V stores have another big advantage over relational databases: They are incredibly easy to use. Once you have the system up and running (which became very easy when cloud providers made that a turn-key proposition), a distributed K/V store is about as hard to use as a dictionary in Python.
K/V stores multiplied like bunnies.
Unfortunately, before long we were running into problems due to the technology’s simplistic API model. A K/V store’s native access pattern is very simple: Most only support single-key access (GET) and scanning forward from a particular key (SCAN), which is not enough for more demanding applications. We gained speed, lower latency, and larger storage capacity, but we lost atomic transactions, joins, and other powerful features from the relational database world.
The aforementioned Internet service stored photos in a K/V store. We also supported photo albums and so we kept another (big) table of which photos were in which albums. The key for the album store was a tuple of <user_id, album_id, photo_index>. When we needed to display an overview of an album, we would scan forward from <user_id, album_id, 0> and get all the photo references. So far so good.
When a user clicked on a photo thumbnail in the album overview page, we would show them a page with that photo at its full size. To the left and right of the photo we showed “previous” and “next” buttons to allow users to scroll through their album. Given the key of the photo the user was looking at, the next photo was easy to find by just scanning forward in the album table. Unfortunately, the previous photo was impossible to find because our K/V store could not (and cannot) scan backwards!
What to do?
We solved this problem by creating a second (big) table which had as its key <user_id, album_id, -photo_index> (mind the minus sign). In that second table, a scan forwards is equal to a backwards scan in the main table. Whenever a user pressed the “previous” button, we would do a forward scan in the reverse table and show a page with whatever photo we hit upon. If the user then pressed the “next” button we started scanning from the main table again. We were pretty pleased with ourselves for figuring this out.
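The trick can be sketched with a toy in-memory table that, like our K/V store, only supports forward scans. The table names and exact key layout here are illustrative, not the production schema:

```python
import bisect

# Toy model of a K/V store that only supports forward scans from a key.
class KVTable:
    def __init__(self):
        self._keys = []   # sorted list of key tuples
        self._rows = {}   # key tuple -> value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan_from(self, key, limit=1):
        """Forward scan: return up to `limit` rows with key >= `key`."""
        i = bisect.bisect_left(self._keys, key)
        return [(k, self._rows[k]) for k in self._keys[i:i + limit]]

albums = KVTable()       # key: (user_id, album_id, photo_index)
albums_rev = KVTable()   # key: (user_id, album_id, -photo_index)

for idx, photo in [(0, "p0"), (1, "p1"), (2, "p2")]:
    albums.put(("alice", "hawaii", idx), photo)
    albums_rev.put(("alice", "hawaii", -idx), photo)

# "Next" photo after index 1: forward scan in the main table from index 2.
nxt = albums.scan_from(("alice", "hawaii", 1 + 1), limit=1)
# "Previous" photo before index 1: forward scan in the REVERSE table,
# starting from the negated index of the photo just before it.
prv = albums_rev.scan_from(("alice", "hawaii", -(1 - 1)), limit=1)
```

A forward scan over negated indexes visits photos in descending index order, which is exactly the backwards scan the store itself refuses to do.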
The consequence of this approach was that whenever we added a photo to an album, we had to insert two new rows: One in the main table and one in the reverse table. Unfortunately, our K/V store didn’t (and doesn’t) support multi-table (or multi-row) transactions. Every now and then one of the required inserts would fail. This led to a situation where a user traversing an album would skip over a photo, because the scan in one table would not return that photo whereas a scan in the other table would. #annoying. Our solution? Every night we would run a batch job that consolidated the two tables 🙂.
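A minimal sketch of what such a nightly reconciliation job does, assuming both tables can be dumped with the reverse table's keys normalized back to positive photo indexes (the helper name and table contents are made up):

```python
# Hypothetical nightly batch job: repair drift between the main album
# table and the reverse-index table after partial (non-atomic) inserts.
def reconcile(main_rows, reverse_rows):
    """Both arguments map (user_id, album_id, photo_index) -> photo_ref.
    Returns the rows missing from each side so they can be re-inserted."""
    missing_in_reverse = {k: v for k, v in main_rows.items()
                          if k not in reverse_rows}
    missing_in_main = {k: v for k, v in reverse_rows.items()
                       if k not in main_rows}
    return missing_in_main, missing_in_reverse

main = {("u1", "a1", 0): "p0", ("u1", "a1", 1): "p1"}
reverse = {("u1", "a1", 0): "p0"}  # the insert of photo 1 failed here

to_main, to_reverse = reconcile(main, reverse)
# to_reverse now holds the row that must be re-inserted into the reverse table.
```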
This is the kind of problem that cannot occur when you use a relational database.
Over time we ran into more and more of these problems and many K/V stores added features to make some of these things easier to solve. Some K/V stores added support for indexes. Some added limited support for transactions. Overall, life was getting better. There was one thing though that K/V stores didn't do and afaik still don't do well (cue snarky comments), and that is joining two tables on a foreign key. This is very sad because joins are required to support a sane (third normal form) data model.
The fast adoption of K/V stores had another consequence though: The use of that technology for use cases that didn't need it. For the last decade or so I have been coming across applications that use K/V stores even though neither the use case’s data size nor its transaction volume warrants that choice! That is double plus sad, because it means you made a choice that gives you none of the advantages of the K/V store (data size, transaction volumes) but all of its disadvantages (very limited data model, no transactions, no joins). Why do that? Why forgo a great data storage technology in favor of a more limited one if the use case does not warrant it?
One reason is fashion. K/V stores are hip and many software engineers are fashion victims. Having a production application using a K/V store on your resume looks good (even though it basically only means that you know how to use a dictionary in Python).
A colleague once told me that the technology choices in her team were made because their tech lead wanted to get some experience with a particular/popular technology.
One factor at play might be cost: Relational databases have a higher fixed cost because of the need for a private database instance. Because of architectural differences, relational databases do not lend themselves well to multi-tenant implementations. If you only need to store a few records, a small table in a shared K/V store might be the way to go if cost is a major concern (and you don’t need the relational database’s data model features).
This is not just something that we see with K/V stores. In the nineties I regularly ran into the use of a relational database for storing a very small amount of information that might as well have been stored in a text file. The tunnel vision on a particular type of technology existed then too.
What I am afraid of though is that the use of K/V stores is often subconsciously driven by the fact that a K/V store does not require you to think about your schema upfront. At least not really… There is literally only one thing that you need to care about before starting to use a distributed K/V store and that is your key format. If you choose the wrong key format, you are setting yourself up for a lot of headaches right down the line. Fortunately, getting the key format right is not hard.
In one Bigtable I was tangentially involved with we managed to choose a key format that led to a row with a “hot key”: A single key that got so much traffic that we had to isolate that single row in a machine of its own.
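A common mitigation for hot keys (not the fix we used, which was isolating the row on its own machine) is to salt the logical key with a small shard prefix so traffic spreads across several physical rows. A hedged sketch, with made-up names:

```python
import hashlib

# Spread one hot logical key over NUM_SHARDS physical rows. All names
# here are illustrative, not any particular system's API.
NUM_SHARDS = 16

def sharded_key(logical_key: str, shard: int) -> str:
    # Zero-padded prefix keeps the physical keys sortable per shard.
    return f"{shard:02d}:{logical_key}"

def shard_for_write(logical_key: str, payload: str) -> int:
    # Pick a shard deterministically from the payload so writes spread out.
    return int(hashlib.md5(payload.encode()).hexdigest(), 16) % NUM_SHARDS

# The price: a read of the logical key must now fan out over all
# NUM_SHARDS physical keys and merge the results.
physical_keys = [sharded_key("popular_row", s) for s in range(NUM_SHARDS)]
```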
K/V stores do not force you to think about your data model or schema before you start coding. Just store a bag of fields as a JSON map into the value part of a row and you're good. That might seem like an advantage, but it is not. Compare it to not brushing your teeth every evening. Yes, you save a few minutes every day, but how well is that going to go for you in the long run? Many K/V stores are completely agnostic when it comes to whatever you want to store. Different rows can store different things. New fields can often be added in-flight. The K/V store doesn’t care. You ask it to store some blob, it stores it, and then when you read it again it is your problem to make sense of it. This provides great flexibility, but unfortunately we have proven ourselves to be very bad at dealing with that kind of flexibility.
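To make the "your problem to make sense of it" part concrete, here is a hypothetical reader coping with two rows written by different versions of the same application (the field names are invented):

```python
import json

# Two rows written by different code versions of the same application.
# The store happily accepts both; the reader has to cope.
row_v1 = json.dumps({"name": "Alice", "phone": "555-0100"})
row_v2 = json.dumps({"full_name": "Bob", "phones": ["555-0101"]})

def display_name(raw: str) -> str:
    doc = json.loads(raw)
    # Defensive reading: every historical field spelling lives on forever.
    return doc.get("name") or doc.get("full_name") or "<unknown>"
```

Every schema decision you skipped at write time comes back as an `or` chain at read time.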
Relational databases on the other hand require you to think really, really hard about your data model. You need to explicitly structure your data and tell the database engine which columns you have and what sort of data is going in there. If a column in one table is actually a key into another table (a foreign key), you are advised to inform the database engine so that it can check upon insert or update whether the relationship is still intact. All of this is a lot of hard work and people would rather not do hard work. Unfortunately, it is rather necessary hard work, because either you do that work up front and then the database does the needful for you, or you will have to do the needful yourself in your code whenever you read something from the K/V store.
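As an illustration of the database "doing the needful," here is a sketch using SQLite from Python; the table and column names are made up, and note that SQLite leaves foreign key checking off by default:

```python
import sqlite3

# Declared foreign keys let the engine reject broken references at
# insert time, instead of leaving the check to application code.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite defaults this to off
con.execute("CREATE TABLE albums (album_id INTEGER PRIMARY KEY, title TEXT)")
con.execute("""CREATE TABLE photos (
    photo_id INTEGER PRIMARY KEY,
    album_id INTEGER NOT NULL REFERENCES albums(album_id),
    blob_ref TEXT)""")
con.execute("INSERT INTO albums VALUES (1, 'Hawaii')")
con.execute("INSERT INTO photos VALUES (10, 1, 'p10')")  # valid reference

try:
    con.execute("INSERT INTO photos VALUES (11, 99, 'p11')")  # no album 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the engine caught the broken relationship for us
```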
You cannot escape from the need to do that thinking.
The only choice you get is whether you do it upfront or later on. Given our propensity for procrastination, it is not surprising that most people would rather leave these “details” to other people to figure out. K/V stores allow you to look hip and not worry about your data schema during the design phase. A match made in heaven for the fashion conscious modern software engineer.
Unfortunately, as I said, you cannot in the long run get away from the requirements of checking your data and establishing relations between the various pieces of data in your K/V store. So what I see more and more is people basically reimplementing a relational database system on top of a K/V store. For instance, in the absence of support for query-time joins, a common pattern is to run batch jobs that do the hard work of joining data and then write a denormalized table that can be queried (but not written!) using the K/V store’s APIs. Not only do you now have to do the joins yourself, but that denormalized view is always hopelessly out of date! This might be a worthwhile compromise if your use case really needs a K/V store, but it is just sad if it really doesn't.
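A hedged sketch of such a batch join, with invented table contents and key format:

```python
# Hypothetical batch job: join a user table and a photo table on user_id
# and emit a denormalized, read-only view keyed for single-GET serving.
# The view is stale the moment the job finishes.
users = {"u1": {"name": "Alice"}, "u2": {"name": "Bob"}}
photos = [
    {"photo_id": "p1", "user_id": "u1", "title": "Beach"},
    {"photo_id": "p2", "user_id": "u2", "title": "Hike"},
]

def build_denormalized_view(users, photos):
    view = {}
    for p in photos:
        owner = users.get(p["user_id"], {})
        # Key format chosen so the serving path needs only a single GET.
        view[f"{p['user_id']}:{p['photo_id']}"] = {
            "title": p["title"],
            "owner_name": owner.get("name", "<unknown>"),
        }
    return view

view = build_denormalized_view(users, photos)
```

This is, of course, exactly the work a relational engine would have done at query time with one `JOIN`.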
Then again, this creates an opportunity for someone to write a generic K/V store join engine on which they can get promoted, so all is not lost :-)
One thing that makes me very sad is the recent pattern in the relational database world to store blobs of JSON in some column. Many modern relational database systems have good support for JSON and they typically allow fields in JSON columns to be referenced in indexes, queries, and foreign keys. Like all technology there are very good use cases for these kinds of features, but relational database JSON support is commonly just abused not to have to do the hard work of schema design and maintenance, which means that your database engine cannot be as helpful as it could be.
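For the legitimate side of that coin, here is a sketch (assuming a SQLite build with the JSON1 functions, standard in recent releases; table contents invented) where a field inside a JSON column is still queryable and indexable:

```python
import sqlite3

# A JSON column whose fields remain visible to the engine: they can be
# queried with json_extract and covered by an expression index.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, doc TEXT)")
con.execute("""INSERT INTO events (doc) VALUES
    ('{"kind": "click", "page": "/home"}'),
    ('{"kind": "view",  "page": "/album"}')""")
# Expression index on a JSON field keeps lookups on it cheap.
con.execute("CREATE INDEX idx_kind ON events (json_extract(doc, '$.kind'))")
rows = con.execute(
    "SELECT json_extract(doc, '$.page') FROM events "
    "WHERE json_extract(doc, '$.kind') = 'click'").fetchall()
```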
Niklaus Wirth wrote a seminal textbook called Algorithms + Data Structures = Programs. In the translation of that equation to the distributed world, schemas replace data structures, but we seem to have forgotten that. I am not at all against the use of K/V stores; it’s awesome technology for use cases that need it. But I am afraid that their low barrier to entry, especially when it comes to schema design, allows people to get away with not doing necessary hard work upfront, shooting themselves and their colleagues in the foot.
Here's a 7 min audio version of "The unloved relational database" from Wednesday Wisdom converted using recast app.
https://app.letsrecast.ai/r/7ee56b80-b49b-4530-b5be-a34460f026ad
There is an important piece of the topic that I think needs to be discussed as well, which is called the object/relational impedance mismatch and which tips the balance towards K/V, fashion issues notwithstanding. In applications slightly more complex than a photo album, storing lists (in particular, user-ordered lists) of things in a relational database can be a huge pain in the backside, to the extent that that consideration alone may end up dominating the matter of the programming language your project is going to use (e.g., Ruby). Whereas since only the key format matters in a K/V store, you can use an off-the-shelf serialization library to encode/decode your values, neatly side-stepping the entire issue. There are (in hindsight, rather obvious) techniques to get the best of both worlds, but I find them to be seldom used in the real world.