
The one about coding with AI tools

Being able to write clearly will be more important than ever!

This somewhat herniated Wednesday Wisdom follows a bit of a different format than the others. If you want, you can jump straight to the “This week’s Wisdom” section at the end…

An anecdote…

Yesterday (relative to when I started writing this week’s article), I went flying and decided that it was a great time to spend a few Hobbs hours figuring out how the autopilot works. I like to fly old-fashioned “six pack” Cessna 172s, which lack advanced avionics. The autopilot is usually a fairly basic Bendix King KAP 140 that manages to pack a dozen functions into a knob, a handful of buttons, and a tiny display with very cryptic indicators of what is going on.

Flying, when staying on a prescribed heading at a constant altitude, is a control loop where you look at the instruments, make a tiny correction, and then rinse and repeat. It is no wonder that a very simple computer and a few servos can do this better than I can, because after ten minutes or so of flying straight and level by hand, my mind starts to wander and I inevitably drift off course or lose or gain altitude. Therefore, almost all pilots I know who fly longer cross-country flights use the autopilot.

Until yesterday, I had not spent much time with the autopilot and so I spent a lovely morning with a few clouds and quite a bit of turbulence flying up and down the California coast between Half Moon Bay and Santa Cruz, while toying with the autopilot and configuring it to change headings, climb, and descend. It went quite well and even allowed me time to read the autopilot manual to figure out some of the finer points of autopilot use, also making this the first time that I actually read the pilot’s manual inside the plane.

What does this have to do with anything?

In a similar vein, late last year I decided to go all in on AI coding tools. As you might know, I work for OpenAI, and it will probably not come as a surprise that we are AI maximalists. The results so far have been quite spectacular: this year I have not written five lines of code, despite committing many thousands of lines, including entire features and complete rewrites of pieces of code, all done by Codex with minimal oversight.

On LinkedIn, the debate about software engineering with the help of AI coding tools is raging hot, with the extremes, per usual, taking up a lot of bandwidth. Also per usual, the extremes are idiotic.

Some historical perspective…

When I started studying computer science, all the way back in 1984, many questioned the wisdom of that choice, because a lot of people expected that, with (then) recent breakthroughs in artificial intelligence and so-called fourth-generation programming languages (4GLs), either computers would write their own software really soon now, or people without a computer science degree would be able to write software using a combination of AI and 4GLs. Suffice it to say that these predictions did not come true, and for decades I earned a very comfortable living telling the idiot box what to do.

Over these decades, I have seen many attempts at creating a programming environment in which non-engineers could develop working computer programs. None of them ever gained significant traction because, quite simply, they were too limited and still required, if not actual hands-on coding skills, at least the knowledge and mindset of a software engineer.

So how did that go now then?

Since late last year, all my programming has been done using our coding tool, Codex, to which I obviously have privileged access in terms of which AI models I can use (the latest and greatest) and how many tokens I can burn (practically unlimited). Here are a few of the projects I have worked on, both at work and at home:

  • [At work] I rewrote a Kubernetes controller for automatically making namespaces for new colleagues from Python to Go.

  • [At work] I extended a tool that rolls out operating system images to nodes in Azure Kubernetes clusters to also support clusters in GCP.

  • [At home] I improved my proprietary backup tool with new subcommands to restore files (strangely enough, it didn’t have that functionality yet 🙂), remove files from the archive, and verify archive integrity.

  • [At home] I started a project to change a programming language interpreter I have written from using recursive execution of the AST to generating a language-specific bytecode and then executing that bytecode.

  • [At home] I created a single page web application (ChordCraft, name chosen by Codex) that shows how to make piano chords. A fun fact here is that I copied and slightly modified a prompt written by a colleague that he had used to create a similar application for guitar chords, foreshadowing a whole new era of copying and pasting known-good prompts.

  • [At home] I solved a few AdventOfCode 2025 programming puzzles in ARM assembler.

None of these are amazing never-before-seen pieces of work, but they are real work and not entirely trivial either. So how well did Codex do?

First of all: None of these projects required anything more than spending time with Codex; I did not have to write any lines of code myself. This prompted me to write a post on LinkedIn that said: “I have seen the future of my profession and it is talking to Codex all day to tell it what to do and pressing “1” quite often” (“1” being the option to allow execution in the menu that Codex presents when it wants to execute a command that breaks out of Codex’s sandbox and thus might potentially be dangerous).

What I learned from this experience…

First of all, and this should not come as a surprise, Codex is not perfect; it is not me in robot form 🙂. It made mistakes, misunderstood me, and on one occasion filled up “/tmp” while running tests that wanted user input but then got itself into an infinite loop when there was no user input to be had (because stdin was at EOF).

But then again, once I figured that out, I told Codex this: “In [function] sys_yn, if there is no terminal attached to the process or stdin is EOF, we get into an endless loop. This should be fixed with a new default argument to sys_yn that determines what the default value to return is when there is no terminal input to be had.” Codex replied with: “I located all sys_yn implementations and call sites. Next I’m updating the function signature and each platform implementation so EOF/no-tty returns a caller-provided default instead of looping.”

However, when Codex does make mistakes, it is often quite good at fixing them itself, provided that a good suite of tests is available. For instance in my programming language interpreter (more details to follow), Codex made a change and then ran the tests, which crashed. It then went into analysis mode, and finally concluded:

“I isolated the end1 crash: END triggers a RESTART longjmp that bypasses cleanup in exec_seq2, leaving ERRBUF pointing at a dead stack frame; the next run_error jumps into stale state and crashes. I’m patching run_loaded to save/restore ERRBUF around each run so non-local exits cannot poison later runs, then I’ll re-run the gates.”

I really couldn’t have said it better myself…

Codex is an advanced tool and like all advanced tools, you need to figure out how to use it. Yesterday, I spent half an hour figuring out how to get the autopilot to ascend from my current altitude to 3,500 feet at 100 ft/min and then stop once it got there, and this autopilot only has seven buttons and a knob (excluding the knob pushing the buttons). Codex offers you an interface that you can talk to as if it is a human, but it is obviously not a human, so you have to learn how to talk to it to make it do whatever you want it to do. I assume that it will get much better really quickly, but if you talked to Codex and it didn’t do what you wanted it to, don’t give up, but learn how to talk to it!

One of the things to learn is when to be precise and when you can afford to be loose in your instructions. Depending on the context that Codex has, you can sometimes get away with a one-word instruction or just the full text of an error message. At other times, it pays off to write like a lawyer. In a previous Wednesday Wisdom article, I emphasized that learning how to write is a career amplifier, and I expect that this will quickly become even more true than ever before.

You want details? Fine!

One of the work projects that I attacked with Codex was to extend a tool that rolls out operating system images to Kubernetes clusters. The tool supported only Kubernetes clusters on Azure, but as OpenAI is growing its compute platform, there is a need to extend this tool to support other cloud providers too, starting with GCP. Given the many differences between Azure and GCP, it is not entirely straightforward how to extend the existing functionality with GCP support, so I started by writing a good old fashioned design doc that outlined how the Azure implementation works and how I thought support for GCP could be implemented. Remember what I wrote above? Write like a lawyer!

With the benefit of hindsight, I should probably have started instead by having Codex write the first version of that design doc. But there you go, one of the things that all of us are still learning is what these tools are actually good for, which usually turns out to be a lot more than you’d think at first.

Once the first draft of my design doc was ready, I fed it to Codex and asked it to review my document. It came up with a few improvements and some ways to strengthen and disambiguate the design. Once that was done, I was wondering if I could feed this design straight back into Codex for implementation. However, I was a bit worried that if I did that, it would take the shortest path to the solution, which would probably entail writing a lot of “if GCP” statements. I am not sure if that fear was warranted, but regardless, I first asked Codex to refactor the existing code to extract appropriate interfaces and to hide all of the Azure-specific code into a separate package so that (hopefully) it would implement GCP support as “just” another implementation of that interface.

Before I continue, here’s a side story about another Codex project where I learned something…

Since 1992, I have been working on and off on a programming language interpreter for the COMAL programming language, which is mostly forgotten now, but it is the language I was taught in secondary school. This is a useless hobby project that is going nowhere really slowly, but every few months or so, when the fancy grabs me, I fire up the C compiler to fix a few bugs or add some features. For instance in 2024, I added WASM support and spent some time getting the DOS version running again, for that special time in the future when the 8086 and 640K RAM become the computing platform of choice again.

Also in 2024, I integrated ASAN into the build and since then most of the bugs I have been dealing with are leaks and other memory management errors. A few weeks ago I started Codex on my home laptop and fed it the OpenComal source code and the most recent ASAN error trace and in under a minute it had fixed a bug that I think would have taken me at least an hour to track down, since it involved the way that the ncurses library uses memory and I am not an ncurses wizard by any means.

Here is what Codex told me: “This leak report is from ncurses screen allocation (new_prescr via initscr), not your app heap logic directly. endwin() restores terminal state but does not always free the SCREEN object, so LSan reports it. I patched the Linux backend to explicitly own and free the ncurses screen:

  • Added SCREEN *screen in pdclinux.c (line 34)

  • Replaced initscr() with set_term(...) in pdclinux.c (line 165)

  • Added delscreen(screen) in teardown in pdclinux.c (line 195)

This should eliminate the 1632 byte(s) leak from libtinfo on exit.”

Enthused by these results, I decided to take on a larger project that I had been thinking about for some time: changing the way the interpreter works from recursive execution of the AST to the generation and then execution of a language-specific bytecode. This is of course a huge project, so I decided to cut it up into “byte-sized” (ha ha ha) chunks. I entered into a dialogue with Codex in which I explained what I wanted to do and asked it to come up with a bytecode. I then asked it to implement a COMPILE command to compile the program in memory into the bytecode, a CLIST command to list the generated bytecode, and a CRUN command to run that bytecode.

The results were at the same time amazing and not. Codex did create a bytecode of sorts, and it did implement the COMPILE, CLIST, and CRUN commands that I asked for, but the bytecode it came up with was very close to the AST, and the COMPILE command effectively only produced a slightly different version of that AST. That was not at all what I wanted, because I had a much more advanced implementation in mind in which the bytecode would be completely independent of the AST, close to how CPython works.

However, I realized that the changes Codex had made could be improved upon over time and that one of the advantages of doing it “the Codex way” was that at the end of every prompt/response cycle I had something that still passed all of the integration tests (which by the way, per my request, had been improved significantly by Codex too).

After that first naive bytecode implementation, I wrote a huge prompt for Codex that instructed it to make a plan to unwrap expressions into a linear execution of appropriate (and new) byte codes, to implement each language statement as a separate byte code, to implement control structures (such as REPEAT, WHILE, IF, and FOR) using absolute and conditional jumps, and to write this plan to a markdown document. Again, some back and forth was involved, but quite quickly I had a plan of attack that I could send back in pieces to Codex.

To get Codex to work on that plan, I asked it what prompt I should give it to execute the next step of the plan. Codex came up with the following prompt for itself: “Continue the next unchecked item in /[...]/bytecode_full_implementation_plan.md, run the Validation Gate, and commit the code changes”. So, for the next few evenings, while binging “Blue Lights” on BritBox, I repeatedly sent Codex that particular prompt and of course pressed “1” a lot. It was by far the most relaxed coding I have ever done.

When I explained that strategy to my colleagues during our regular “Here is how I AI” meeting, they told me about the Codex “/plan” command which instructs Codex to go into a mode where the prompts you give it roll into a plan on how to make all these changes. You can work with Codex on the plan until you are happy, and then ask it to go on and implement it, after which you just have to press “1” a lot to keep it going.

By the way, if you are sick and tired of pressing “1”, you can put Codex in “yolo” mode. Results may vary… 🙂

Coming back to my work project…

Knowing what I know now, I put Codex in planning mode and fed it the design doc I had written. We went back and forth a bit on some details of the plan until I thought it was good to go, and then I told Codex to go ahead and implement it. This was only last Friday, so I have not yet had a chance to see what it has wrought, but based on my experience so far I expect that it got things 90-95% correct and that I will probably have to spend a day, or maybe two, going back and forth to fix various bugs and improve certain features.

Finally, this week’s Wisdom…

When I read LinkedIn, two viewpoints on the use of AI tools for coding dominate.

Viewpoint #1: AI tools will completely replace humans. Based on my experience, I believe that to be highly unlikely because the tools, at least in the state they are in today, require a highly qualified driver, who can then produce more code than ever before in a shorter timeframe. Codex today does not replace human software engineers, but it does allow human software engineers to be more productive than ever before. Whether this will translate to companies needing fewer engineers, or whether they will use the same number of engineers to crank out more code, remains to be seen. Probably a bit of both, depending on the organization. The tools will of course get much better, but I do not see a future where they will completely replace human software engineers as drivers of the overall plan and design. Typing in statements, though, may well be completely dead quite soon.

That said, I am terrible at predicting the future.

Viewpoint #2: AI tools are hopeless and only generate bad and buggy code that humans will then have to fix. That is quite simply not my experience. If the posts that espouse these viewpoints are honest, I just don’t understand what is going on there. Are they using the right tools? The most modern tools? Do they use these tools in the right way? Did they select the right model and reasoning setting? Did they make a good and honest attempt to use these tools effectively? I really don’t know. But, as I said before, these tools are not humans and you cannot use them in the same way as you would instruct a human. If that is your expectation, you will be disappointed. Everyone needs to learn how to use these tools and, in my experience, if you use them well, you get good outcomes.

AI today suffers from what we call the “jagged edge”: models are exceptionally good at some complicated tasks and exceptionally dumb at some simple ones. Both the peaks and the troughs of that jagged edge give the punters enough material to pontificate about. But between those extremes are the people who are genuinely interested in learning how these tools work and how to put them to good use, and these people are getting results.

What many engineers seem to be misunderestimating is how unpopular software engineers are with many companies. Over the last few decades, during the Great Talent Shortage, companies had to pay through the nose for average talent and many software engineers behaved like total prima donnas. Any technology that promises companies that they can use fewer software engineers or maybe pay them less, will be popular.

One thing that struck me is how taxing working with Codex can be if you have multiple threads going on at the same time. On one occasion I was having five agents working on different projects and I spent my day context switching from one agent thread to another to judge what it had done, give it additional instructions, or (of course), press “1”. After a few hours of that my brain was fried. Fortunately, I recently found the pool table in some far away corner of our office… 🙂

My colleagues and I have been getting very good results using Codex but we are all smart and experienced software engineers who took the time and made the effort to learn how to use the tools well. But even then, many questions around how to use these tools optimally are still unanswered.

Here is one of these questions: Over the last few decades, we have developed a model for software development with design docs, sprints, test-driven development (TDD), code reviews, and CI/CD. AI coding tools raise the question of how that model should change to take advantage of the new powers of these tools. Is it, for instance, still necessary to review code? Should we care whether the tool generates code that humans can understand? Are duplicate code and duplicate constants still a problem? What is the place of TDD in the new workflow? Are big design docs still in vogue? Nobody knows…

This does not mean that people do not have opinions about it, but as far as I can see, most of these opinions are not informed by extensive AI-driven development experience, for the simple reason that these tools have not been available for long and there are still not a lot of people around with extensive experience with the most modern ones. We can all have thoughts and expectations, but beware of the unfortunate desire for certainty.

Another question that is popping around in my head is how we are going to train the next generation of software engineers. My colleagues and I are getting good results but, speaking strictly for myself, I think that I am getting good results because I have an in-depth understanding of what the software should do and what the code should look like, informed by decades of coding in multiple languages on multiple platforms. This allows me to instruct the tool “correctly”, evaluate the results, and course correct when necessary. But if these are going to be essential skills going forward, how are we going to give the people who are now in college the practical experience to be able to use these tools effectively? I do not have any answers for this right now.

A year and a half ago, I wrote an article called “Coding in the time of AI”, which was a followup to an article that I wrote two years before that. The speed of development in the last few years has been astounding, and if you want to read some really outdated views on AI usage you should review these older articles. Where a mere three years ago the tools offered little more than a very smart auto-complete, they are by now, when instructed correctly, able to write significant pieces of complicated code and to debug that code all by themselves.

I have a terrible track record when it comes to predicting the future, but it seems a safe bet to assume that the future belongs to people who understand these advanced tools inside and out and who know how to use them. If at this moment in time you are not investing in proficiency with these tools, you are doing yourself an enormous disservice. I am not able to tell you exactly what our job is going to look like in the future, but these tools are here to stay. If there is any future for software engineers, a significant part of that future will probably be for people who can code like demons in vim, who can instruct AI tools very accurately, and who can press “1” a lot.

Don’t get left behind.

Also, do not get left behind with regard to Wednesday Wisdom. Subscribe today!
