Coding with AI tools is still hard!

Even with AI coding tools, you still need senior engineers around for the hard bits.

Did you know that Wednesday Wisdom is also a podcast! Find it on Apple Podcasts or on Spotify. Did you know that there is also a custom GPT called Midweek Muse that has access to all of Wednesday Wisdom? Did you know that all Wednesday Wisdom videos are also available on YouTube?

In other news: Today is the young Ms. Wednesday Wisdom’s birthday! Happy birthday Merel! Finally, the title of this week’s article was graciously donated by my friend Sonya P. Check out her YouTube channel!

Last week, I rambled a bit about the history of software engineering and concluded that, since the dawn of time, there has been a desperate search for something, anything, that would allow normies to develop software, all in the hope that this would solve the crisis of software development. So far, this effort has failed. What has happened instead is that other improvements in the field have made software engineers more productive.

However, despite software engineers being more productive than ever before and there being more software engineers than ever before, software development is still expensive. Why?

Simply put: Because we also need more software than ever before and because, despite all improvements, software development is still hard!

The field of information technology is a disc growing outward from an origin that is very small and where the big bang happened 60-odd years ago. My first job, when the disc was still small, was at a bank, where we used three mainframes for core account processing, most of which ran as nightly batch jobs.

Q: Why nightly?
A: Because people would create transactions during the day 🙂. Then, in the evening, these transactions were “transmitted” to our data center through Dieselnet: A fleet of cars driving in the late afternoon to all the bank’s branches and collecting transactions on punch cards, floppy disks, and what have you. At the datacenter, we would read these transactions into the mainframe and start the batch jobs. At the end of the batch runs, we would print statements, which Dieselnet would then distribute throughout the country. Customers would bind these printed statements into little booklets as evidence of account movements and balances. If this sounds cumbersome, welcome to the world before the Internet 🙂

Over time, and through many expansions of the disc, we now live in a world where every consumer has a portable computer in their pocket that is more powerful than the mainframes we used in my first job. These computers are also continuously connected to all other computers on the planet, so no more need for Dieselnet and printed statements, because banking software on your pocket computer allows you to enter transactions, look at your accounts, and check your balances whenever and wherever you want.

In fact, as far as I know, if you do want a printed statement these days, you are either straight out of luck or you need to pay dearly for the privilege.

The expanding disc of information technology represents more computers coming into the hands of more people, as well as new ways for these computers to communicate and present information. All of this requires software and because the surface area of the disc grows quadratically, so does the required amount of new software that has to be created and maintained. Over my career, the disc expanded to include minicomputers, personal computers, the Internet, and finally mobile devices, wearables, and the Internet of Things.
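To make the geometry of the metaphor concrete: if the disc has radius r, its area grows with the square of r. Doubling the radius quadruples the area, and every new ring of the same width Δr adds more area than the ring before it. This is only arithmetic on the metaphor, not a model of how much software the world actually needs.

```latex
A(r) = \pi r^2, \qquad \frac{A(2r)}{A(r)} = \frac{\pi (2r)^2}{\pi r^2} = 4

\Delta A = \pi (r + \Delta r)^2 - \pi r^2 = \pi \left( 2r\,\Delta r + \Delta r^2 \right)
```

Because ΔA contains the term 2rΔr, a ring of fixed width gets larger as the disc grows, which is the intuition behind each new wave of computers requiring more new software than the previous one.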

To write and maintain this new software, we always needed more software engineers who, despite all the improvements, were not able to keep up with the growing demand. Hence, the software crisis persisted…

The other reason that software development is still expensive is that, despite the improvements of the past decades, software development is still hard.

A few years ago, I wrote an article called “Why programming is hard”, which sought to explain this fact by outlining both the cognitive and mechanical aspects of software development. The cognitive aspects deal with the fact that it is almost impossible to know upfront what to build and how best to build it. The mechanical aspects describe the inherent complexity of programming languages and libraries. Together, these explain why building a piece of working code is difficult and so it requires deep expertise. And in general, deep expertise is expensive.

By the way: One thing that I am leaving out of this story is that in parallel, the bar for what is considered usable software has gone up too.

Enter AI coding tools.

As we all probably know by now, coding with AI tools is amazing. Speaking for myself, in the year 2026 so far, I have not written five lines of code by hand.

Which, given the fact that it is almost June already, is perhaps kinda worrisome. What if I forget how to do it? But, that is probably a matter best left for another Wednesday Wisdom article.

I think that, courtesy of my employment at a frontier AI lab, I am at the forefront of developments, but most of my peers at tech companies tell me that they are under huge pressure to use AI coding tools. Apparently, some companies are even deploying systems of metrics to figure out if a software engineer in their employ is secretly typing in code by hand, thereby stealing from the shareholders (or something like that, I really don’t quite understand what the thinking is here).

Everyone and their dog are expecting miracles from AI coding tools and they are right to do so, but I think they might be focusing on the wrong miracles, because AI coding tools will increase software development’s velocity, but they will not make software development as a whole any easier. Even with AI coding tools, coding is still hard!

To understand this, we first need to realize that software development is not a manufacturing process. The essence of a manufacturing process is that there are lots of brains needed in the beginning, but then the production of a complicated piece of equipment is simplified so that it can be executed by a combination of robots and people who are less knowledgeable and skilled than the original designers. Software development is not like that.

Each piece of software is a unique bespoke one-of-a-kind thing that is custom built for the exact problem it is solving. Of course we have libraries and reusable pieces of infrastructure, and fortunately we do because otherwise we’d stand no chance at all, but the actual application that is built on top of all that is a one-off. This means that for every piece of software we need to start more or less from scratch when it comes to the design and implementation. It is as if airlines would have every plane in their fleet custom designed and built for the passengers, route, and airports of intended use, including the engines and the avionics. It probably would make flying much more comfortable, but the costs would be astronomical and it would definitely be less safe.

The big problem in software development is knowing exactly what needs to be built (requirements) and how it needs to function (architecture/design). Once you have a crystal clear picture of all these details in your head, typing in the statements is (relatively speaking) easy. It is time-consuming, because a lot of statements need to be typed in and you make many mistakes along the way, but overall it is not difficult.

However, because of its cryptic nature and because (as we will see later) when we are typing the statements we are still designing the software in our head, non-engineers usually think that the coding part of the project is the problematic bit. Software engineers know better. If the coding part is actually hard and cumbersome, that typically indicates that you don’t quite know yet what problem you are solving or how to solve it.

Typing in the statements is exactly where AI coding tools shine. When you give an AI coding tool an instruction, it is never in doubt about what it needs to produce and it typically figures out very quickly which statements it needs. It then types in these statements with blazing speed. In other words, AI coding tools function best at the bit that was really never the problem: Writing the code once you know what needs to be done!

This is not at all an attempt to trivialize the genius of AI coding tools. I still marvel at the speed with which AI coding tools generate complicated pieces of code that stitch together algorithms, data structures, libraries, and APIs for complex services. But they are not intrinsically better at that than I am, just faster.

But here is something that AI coding tools cannot do yet: Figure out what needs to be built and which functional and non-functional requirements are the important ones to focus on.

Here is an example: For the last six months, I have been working on a solution for rolling out OS images to OpenAI’s fleet of GPU machines. This is a non-trivial problem because of the size, diversity, and complexity of the fleet, as well as the fact that we would like to do it while continuing to serve ChatGPT and other products. Obviously, the cloud providers we use have different primitives for dealing with virtual machines, as well as different data models for grouping virtual machines and templatizing configurations, which makes it hard-ish to create a generic solution that works well across all our datacenter partners.
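One common way to attack the “different primitives per cloud” problem is to write the rollout logic once against a small provider-neutral interface and hide each cloud behind an adapter. The sketch below is purely illustrative: `MachineGroup`, `FleetProvider`, `InMemoryProvider`, and `groups_needing_update` are hypothetical names, not anything from the actual system, and the in-memory adapter stands in for code that would call a real cloud API.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class MachineGroup:
    """Hypothetical provider-neutral view of a group of VMs (e.g. a scale set)."""
    name: str
    image: str
    machines: list[str] = field(default_factory=list)


class FleetProvider(Protocol):
    """Hypothetical interface that each cloud-specific adapter would implement."""

    def list_groups(self) -> list[MachineGroup]: ...

    def set_group_image(self, group: str, image: str) -> None: ...


class InMemoryProvider:
    """Stand-in adapter for illustration; a real one would call the cloud's API."""

    def __init__(self, groups: list[MachineGroup]) -> None:
        self._groups = {g.name: g for g in groups}

    def list_groups(self) -> list[MachineGroup]:
        return list(self._groups.values())

    def set_group_image(self, group: str, image: str) -> None:
        self._groups[group].image = image


def groups_needing_update(provider: FleetProvider, target_image: str) -> list[str]:
    """Generic logic written once against the interface, not a specific cloud."""
    return [g.name for g in provider.list_groups() if g.image != target_image]


# Usage: two groups, only one of which is behind on its image.
provider = InMemoryProvider(
    [MachineGroup("a", "os-v1", ["m1"]), MachineGroup("b", "os-v2", ["m2"])]
)
print(groups_needing_update(provider, "os-v2"))  # ['a']
```

The hard part in practice is deciding what goes into the neutral model; a design like this only helps if the abstraction matches how every provider actually groups and templatizes machines.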

On top of that, NVIDIA’s GPU families have peculiarities too. For instance, all machines in GB200 and GB300 (a.k.a. Blackwell) domains have to run the same release of the NVIDIA drivers, making upgrades harder. In addition, there are velocity requirements for the rollout because our vulnerability management program prescribes that known vulnerabilities of a particular severity need to be addressed within a certain number of days.
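The same-driver constraint changes the unit of rollout: you can no longer upgrade machine by machine, because a half-upgraded domain is exactly the state you must avoid. A minimal sketch of that planning step, assuming each machine record carries a hypothetical `domain` label and its current driver release (the version strings below are placeholders), might treat each domain as one atomic batch and flag domains that already violate the invariant:

```python
from collections import defaultdict
from typing import NamedTuple


class Machine(NamedTuple):
    name: str
    domain: str  # hypothetical label for the GB200/GB300 domain the machine is in
    driver: str  # currently installed NVIDIA driver release (placeholder strings)


def domain_batches(machines: list[Machine], target_driver: str) -> list[list[str]]:
    """Return one batch per domain that still needs the target driver.

    Every machine in a domain must run the same driver release, so the whole
    domain is upgraded together instead of one machine at a time.
    """
    by_domain: dict[str, list[Machine]] = defaultdict(list)
    for m in machines:
        by_domain[m.domain].append(m)

    batches = []
    for domain in sorted(by_domain):
        members = by_domain[domain]
        if any(m.driver != target_driver for m in members):
            # Include every machine in the domain, even ones already on target.
            batches.append(sorted(m.name for m in members))
    return batches


def mixed_domains(machines: list[Machine]) -> list[str]:
    """Return domains whose machines do not all run the same driver release."""
    drivers: dict[str, set[str]] = defaultdict(set)
    for m in machines:
        drivers[m.domain].add(m.driver)
    return sorted(d for d, versions in drivers.items() if len(versions) > 1)
```

A planner like this says nothing about timing; the vulnerability deadline would still have to be weighed against how many whole domains you are willing to take out of service at once.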

With a problem of this shape, it is almost impossible (and in most cases undesirable) to figure out all of the requirements up front, make a detailed design, and then (set Codex to work to) type in the statements. So instead, as is the standard today, we are developing this software iteratively. Our first solution ignored the Blackwell requirements and only targeted machines in Azure. We had Codex develop this version based on what we knew, ran whatever it came up with, and then learnt about various peculiarities of reality. So we iterated: We ran the latest version Codex gave us, observed what was going on, learnt from it, and then put Codex to work with enhanced instructions, telling it what we really wanted and how we really wanted things to work with whatever we knew and understood now. Once Codex was ready, we repeated this cycle.

In this project, Codex was hugely helpful during all phases of software development. During the design, it helped us by telling us how Azure scale sets actually work, during the coding phases it typed in all the statements, and during the debugging phases it analyzed logs and reasoned about what was probably happening, fixing bugs and adapting to our changing requirements. Software development with AI coding tools comes down to being in a continuous conversation with the tool as you home in on what really needs to be built and experiment with various solution avenues.

Q: Okay, I get it, but why is this still hard?
A: Because AI tools can write the code, but they do not own the consequences.

AI can type in the statements, but you have to decide what is important and guide the tool appropriately. If I tell Codex: “Hey, write me a tool that reimages all the nodes in this GPU cluster with a new operating system”, there is a high probability that it writes me a tool that does so for all machines at the same time, which is clearly not what I want. Codex does not know about our requirements regarding canarying, incremental rollouts, failure detection and whatnot, so I am the one who needs to tell it how things need to be implemented, after which I am the one who needs to check that, of all 57 varieties of doing that, it picked one that is appropriate for our environment. That requires knowledge, experience, and judgment. None of this is embodied in the model or, to the extent that it is, it is not necessarily activated without appropriate instructions.
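To make the difference concrete, here is a minimal sketch of the shape such an instruction is asking for: a small canary first, then fixed-size batches, halting as soon as a stage fails too often. The function name, its parameters, and the `reimage` callback (assumed to return `True` when a node comes back healthy) are all hypothetical, and the default thresholds are arbitrary placeholders rather than recommended values.

```python
from typing import Callable


def staged_rollout(
    nodes: list[str],
    reimage: Callable[[str], bool],
    canary_size: int = 1,
    batch_size: int = 10,
    max_failure_rate: float = 0.1,
) -> tuple[list[str], list[str]]:
    """Reimage nodes in stages and stop when a stage's failure rate is too high.

    Returns (succeeded, failed). Nodes in stages that never ran keep their
    old image, which is the whole point of not doing everything at once.
    """
    # First stage is a small canary; later stages use the regular batch size.
    stages = [nodes[:canary_size]] + [
        nodes[i : i + batch_size] for i in range(canary_size, len(nodes), batch_size)
    ]
    succeeded: list[str] = []
    failed: list[str] = []
    for stage in stages:
        if not stage:
            continue
        stage_failures = []
        for node in stage:
            if reimage(node):
                succeeded.append(node)
            else:
                stage_failures.append(node)
        failed.extend(stage_failures)
        if len(stage_failures) / len(stage) > max_failure_rate:
            break  # failure detection: halt before touching the next stage
    return succeeded, failed
```

Even this toy version encodes several judgment calls (canary size, batch size, what counts as “too many” failures, what happens to half-finished stages) that no tool can pick correctly without knowing the environment.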

Most real systems have context that is not fully visible, like undocumented invariants, weird legacy assumptions, deployment constraints, half-migrated APIs, old bugs people now rely on, and performance requirements. Most of these are not written down and thus beyond the reach of the model. AI coding tools are very good at code generation within a local context, but much weaker at understanding the full system. These bits of overall context are often ambiguous, badly spelled out, not necessarily widely agreed upon, or maybe only half-realized. This is the hard bit of software engineering and this is what the AI coding tools cannot do yet, so this is where you come in.

On top of that, AI tools are bad at “taste”. They don’t necessarily know when to abstract, when not to abstract, when to keep something boring, when a solution is too clever by half, or when a comment is needed to help a human reviewer (assuming for a moment that human code review is still necessary). AI provides a solution, but human engineers still need to decide whether it is a good solution, given everything they know about the entire context.

Of course, models and tools are getting better all the time and so it is an open question whether the AI coding tools of tomorrow will be much better at this and then really make the entire software development process easier. But I don’t think so and here’s why: The things that make software development hard are exactly the differentiators between junior and senior engineers.

For the last two decades or so I have been saying that what separates me from a junior engineer is not that I can code better. Quite the opposite often: An L3 or L4 straight out of college is probably better at algorithms than I am, seeing as it has been quite a few years since I took an algorithms class. When I interviewed at Google, I was asked about the runtime of quicksort which, at that time, I hadn’t thought about for 20 years or so, even though I had sorted many things using the handy library functions that come with most programming languages. What separates a senior engineer from a junior engineer are exactly the things I have been talking about in the latter half of this article: Contextual knowledge, understanding the requirements at a deeper level, taste, and experience. These are the hard bits and these are the bits that the AI doesn’t do.

All of this made me wonder what this is going to do for the employment opportunities of junior engineers. If it made you wonder too, you are in luck, because that will be the topic of next week’s Wednesday Wisdom… 🙂

You know what is not hard? Subscribing to Wednesday Wisdom…