The gift of modesty

Everything is more complicated and more difficult than you might think it is...

(Like this article? Read more Wednesday Wisdom! No time to read? No worries! This article is also available as a podcast. You can also ask your questions to our specially trained GPT!)

This article is one in a series of “gifts” that you can give yourself. There is, for instance, an article on the “gift of leaving” and also one on the “gift of time”.

I regularly read stories in the newspaper about some IT disaster show that is unfolding somewhere. When that happens my first reaction is: “Oh, if only I were in charge of that mess over there, then that would be sorted out in no time. How hard can that be?”

This is of course a rather dumb reaction because I have been involved in a few disaster shows of varying sizes and if there is one thing that I have learned from that, it is that it is very hard to turn the ship around. In the few cases where I have been able to turn chaos into order, honesty commands me to say that: “Yes, I did a good job, but I also got lots of help from others up and down the organization that wanted my mission to succeed”. It is a rare, and maybe nonexistent, leader (at any level) who can succeed against the combined opposition from inertia, entropy, and the resistance of people in the organization. Furthermore, you have to realize that when you read some news about an IT eff-up, you probably do not know the half of it.

Here is an example: Dutch ship builder Damen Naval is reportedly having trouble building six frigates for the German navy because of “problems with IT interfaces in its design and manufacturing systems”. Really, how hard can that be? Interfaces? I know everything there is to know about gRPC and REST, including hairy topics such as mTLS, authentication, asynchronous RPCs, long-lived RPCs, you name it. If I were in charge of that mess over there, it would be fixed in no time.

In reality, nobody, outside of a handful of people at Damen and maybe a few of their subcontractors, knows what is really going on. The phrase “problems with IT interfaces” has come through many management and PR layers and was carefully designed to say exactly nothing. The thought that I could potentially fix this is stupid and uninformed; I don’t know what is going on there, but it is probably not the case that they cannot figure out how to configure the protobuf compiler or set the deadline on outgoing RPCs. That simple phrase, “problems with IT interfaces”, probably hides a world of pain involving hundreds of thousands of lines of badly written and buggy software that might take years to fix with a crack team of software engineers. I could maybe lead that team, but the thought that I could, on my own, fix that problem in a short time frame is plain dumb.

Here is another example (and one that I have used as an example before): Project KEI, which sought to automate data processing in the Dutch court system to the point that all litigation could proceed digitally.

KEI stands for “Kwaliteit en Innovatie”: “Quality and Innovation”, but also means “rock” in Dutch. When you are good at something, you are “rock good” (“kei goed”). Did you see what they did there?

The project was originally budgeted to cost about 50 million euros, but after many delays and budget increases, the plug was pulled when over 200 million euros had been spent, nothing much had been delivered, and what was being worked on was already looking outdated. On top of that, big law firms had already spent millions in order to make their systems compatible with KEI. What makes these grapes extra sour is that at the same time, the government could not find about 127 million euros in their budget to finance legal aid for the indigent.

When I read news like that my hands are itching to get involved because, come on, automating the document flows of the court system? How hard can that be? The answer is of course: Very hard! KEI attempted simultaneous digitization of all major branches of civil and administrative law, every court in the Netherlands, and both procedural and internal workflow systems. On top of that, the legal reforms required to make this possible were developed in parallel with software design, so the legal framework and requirements kept shifting while systems were being built. Moreover, the judiciary’s internal workflows were not sufficiently standardized; each court had its own traditions and processes, often dating back over a hundred years and thus firmly entrenched. The system, intended to be uniform, became a patchwork of exceptions and customizations. As the attorneys’ weekly magazine (Advocatenblad) so aptly put it: “The technology could not follow the pace of legal changes and the law could not adapt to technological limitations”. Project KEI failed because it tried to digitize the entire legal system faster than it could understand itself.

Thank you ChatGPT for helping me summarize the various issues with this project. Here is the full analysis by ChatGPT.

Everything is always more complicated than you think it is. Understanding that fact should breed modesty.

In the last two weeks we have seen a lot of lack of understanding around the AWS outage in us-east-1 on October 19 and 20. As cloud service outages go, this was a whopper and in the first few hours, people fell over each other, especially on LinkedIn, never a platform for considered informed opinion, to explain what was going on and why this happened. Many were quick to point to the “brain drain” theory, positing that this happened because of a lack of senior staff, caused by recent rounds of layoffs and because Return to the Office (RTO) apparently prompted very senior engineers to resign. Other posters purported to know exactly how to design complicated cloud infrastructures and applications, lambasting Amazon for having designed and implemented their cloud incorrectly and chiding their customers for not building cross-region or cross-cloud solutions. Both of these criticisms are dumb and underinformed.

Remember: Reliability is a risk affordance tradeoff!

Sidenote: Meredith Whittaker, the CEO of Signal, had a more considered opinion, outlining in a series of tweets (excuse me: messages on Mastodon) that Signal, which relies partly on AWS, really has no choice because “running a low-latency platform for instant communications, capable of carrying millions of concurrent audio/video calls, requires a pre-built, planet-spanning network of compute, storage and edge presence that requires constant maintenance, significant electricity and persistent attention and monitoring.”

Everything is more complicated than you might think it is. I understand the pressure to be among the first to say something smart on LinkedIn, but it is really not a sign of great intelligence to opine on what is going on when few details are available. Additionally, I would say that it is not smart to criticize Amazon for doing something wrong, unless you have in-depth knowledge of systems at this scale and have experience running something equally big and complicated. Not many people have.

It is interesting that I have not read any comments from distinguished engineers at one of the other cloud providers. They know all too well how incredibly complicated these systems are and how much emergent behavior is displayed by them. One of the smarter things I have read about the outage is this one.

In the face of complexity it behooves us to be modest. My most common answer these days when asked difficult questions about current events is: “I don’t know, I need to think about that a bit”. I have also grown much slower to criticize failures because in my experience, some human projects are just beyond success. Modesty is not a hip trait in a world that suffers from culturally sanctioned ADHD, but all things considered, I’d rather be undecided while I am thinking, than wrong.

And remember this: Most of the time, your opinion does not matter at all.

Here is something that you don’t have to think about at all: Subscribe to Wednesday Wisdom!
