The black box engineer

The future belongs to white box engineers

Jos Visser

May 21, 2025

(Like this article? Read more Wednesday Wisdom! No time to read? No worries! This article is also available as a podcast).

Decades ago, when I didn’t understand a lot about Unix yet, I ran into some weird problem on an HP-UX system where the “who” command gave incorrect information. I had no clue what was going on, until I ran into a colleague who said: “Oh, no problem, I know how to fix that.” He logged in as root, opened the “/etc/utmp” file, surveyed the contents, and changed a 1 into a 0. When he saved the file, “who” gave the expected output again. “Whoa there”, I cried out: “What happened there? How does that work?” “I don’t know”, the colleague who came to the rescue said. “I just know that whenever I have this particular problem, this is the fix.”

He was a “black box engineer”.

The term “black box engineer” comes from the wonderful world of monitoring, where we distinguish between “white box monitoring” and “black box monitoring”. In white box monitoring, we understand the internals of the system under scrutiny and we gather metrics from inside that system in order to figure out what is going on, how it is doing, and if it perhaps needs a cookie. In black box monitoring on the other hand, we look at the system from an external perspective and monitor only the endpoints and behaviors that are visible to clients who live on the outside of the system. If you know your system contains a cache somewhere deep inside and you keep tabs on the cache hit rate, you are engaging in white box monitoring. If on the other hand you call the external API and check the latency and the return code, you are practicing the undervalued art of black box monitoring.

Any good system of observability employs both of these methods.
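
To make the distinction concrete, here is a minimal sketch in Python. Everything in it is hypothetical and made up for illustration: ToyService, its handle method, and black_box_probe are not taken from any real monitoring system. The cache hit rate is the white box metric (you can only ask for it if you know the cache exists), while the probe sees only the status code and the latency, exactly like a client on the outside would.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ToyService:
    """Hypothetical service with an internal cache (illustration only)."""
    cache: dict = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def handle(self, key: str) -> tuple:
        """The external API: returns (status_code, body)."""
        if key in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            # Stand-in for an expensive backend call.
            self.cache[key] = key.upper()
        return 200, self.cache[key]

    def cache_hit_rate(self) -> float:
        """White box metric: requires knowing that the cache exists."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def black_box_probe(service: ToyService, key: str) -> dict:
    """Black box check: call the API as a client and record only what a client sees."""
    start = time.perf_counter()
    status, _body = service.handle(key)
    latency = time.perf_counter() - start
    return {"status": status, "latency_seconds": latency}


service = ToyService()
for key in ["a", "b", "a", "a"]:
    probe = black_box_probe(service, key)

print("black box:", probe["status"], probe["latency_seconds"])
print("white box:", service.cache_hit_rate())
```

The probe would keep reporting a healthy status code even if the hit rate collapsed, and the hit rate would look fine even if the API were unreachable from the outside, which is one way to see why you probably want both.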

I use the term “black box engineer” to indicate an engineer who knows a lot about a system, but only from the outside perspective. They can use the system, they know all of the knobs and gauges, but they don’t really know what is going on under the hood. Black box engineers often function by knowing recipes: They can type in the right commands and push the right buttons, but without true understanding of what is going on. They have no real idea why a recipe works. This also means that when the list of recipes is exhausted, they are stuck.

If this were a LinkedIn post, I would now say: “What do you think? Have you ever met a black box engineer? Leave your thoughts in the comments.”
👇👇👇

There are many problems with operating as a black box engineer, the obvious one being that without true understanding, you might fail to pick up on signals that indicate that the recipe might not work in this particular circumstance or, worse, is actively harmful. The flip side is that you might also fail to recognize that a certain recipe applies when the signals are not a perfect match. An additional problem is that a human can know only a limited number of recipes and at some point, your SD card will be full.

When I start working with a new system, I invariably start out as a black box engineer, learning recipes from colleagues or documentation and applying them. However, that will get me only so far. Eventually, I run out of recipes and I have to understand what is actually going on in order to move beyond the recipes into an area of understanding. Whenever I start a new job, I keep an open file on my desktop where I record the recipes that I used and that worked. But, over time, that document gets less and less action as I synthesize from the recipes to a more in-depth understanding of how the system hangs together. And with that understanding, the recipes are no longer needed because understanding by itself generates the right set of commands or actions for almost every situation.

The problem here of course is time: You need time to build up that understanding. Unfortunately, time is often scarce. Therefore, it is totally okay to be in black box mode when you are new or racing to reach some milestone, but eventually you need to create a spacetime pocket where you can take a breather and read and think. As a former colleague of mine recently wrote on LinkedIn: “Every day is a school day, you are either the teacher, the student, or the furniture.” It’s okay to be the furniture for a time, but you have to transition to student mode eventually in order to really become effective at your job.

When in furniture mode, you will eventually get into a space where your recipes are not good enough to solve the problem at hand. Here is a recent example from my own experience: For reasons best left undiscussed, I am building virtual machine images for a set of services running on Azure. When I took up this endeavor, I copied some Packer recipes and got going. Eventually, I had to turn up my services in a region where there were only certain (newer) machine types available and my images didn’t work on these machine types. I asked ChatGPT, I looked at StackOverflow, and even read some documentation (that’s how desperate things got). But, none of the things I tried worked; the images I built just kept not working on these newer machine types. At this point it became obvious that my desperate tweaking of settings would not solve the problem anytime soon and that I would have to understand how all of it hangs together.

Sometimes the “pushing random buttons” approach works and I can put off understanding what I did and why it works until some later time. But, you also have to recognize when the time has come to admit defeat and realize that you actually have to spend some time gaining the deep understanding that will help you definitively solve the problem.

I often use the weekend for that thinking, since that is typically the time that I get my head out of the bucket. I am not sure what your work week is like, but for me it often feels like someone ties a rope to my feet on Monday morning and then forcefully pulls me through the week. Before I know what is going on, it is Friday afternoon and I think: “What the fuck happened? Where did all that time go?” The days where you are desperately trying to get stuff working because you are in the critical path of some milestone are not the best times for the careful thinking, reading, and experimentation that leads to really deep understanding. It is hard to see the bucket when your head is in it.

The one thing I will never understand is people who are content to stay black box engineers. Sure, it is impossible to know everything about everything, but I regularly run into people who are happy to know only the recipes they know and deploy those at the right moments. Apart from the lack of self-improvement that results from that attitude, there is the obvious problem that pretty soon some AI near you will be able to perform that trick better than you, because the AI knows more recipes and can probably apply them faster and more accurately than you can.

The future belongs to white box engineers.
