Trivial code reuse
We often spend more time discussing potential reuse than it would take to just build it
Over the decades of my career (so far), I have spent too much time discussing, and sometimes, but not always, preventing, the reuse of trivial code. That is a sad state of affairs that I unfortunately do not seem to be able to get away from. No code appears to be so simple that someone would not suggest we reuse it. Even if building it ourselves would be cheaper and faster.
The ability to reuse existing code is both a blessing and a curse. It is also somewhat unique to our field. Compare it with civil engineering: Literally nobody who needs a bridge will hear: “There is a bridge in another state, why don’t you just use that one?” You might experience pressure to reuse plans and designs, but nobody will say: “We’ll make it work for you by creating another on-ramp to that other bridge and adding a few lanes to allow for your traffic. It’ll be fine…”
In software engineering, on the other hand, I am in discussions like that all the time. The sad low point of that was probably when I spent weeks discussing whether we should reuse a component to generate a list of items on a web page: We had a list of things in a database and needed to show that list of things on a web page. Fortunately, I have understood both the for-loop and the print statement since approximately 1982, so solving that problem was well within the scope of my professional skills. However, there was another service in our company that had already written a for-loop and the question came up whether we should reuse their service for generating our web page. Cue a discussion that took oodles of time including meetings, research, chats, emails, proofs of concept, and eventually an escalation to an engineer who is even more senior than I am. In the meantime, the engineering team that I was trying to shield from this madness was loudly wondering W the F I was doing…
To be honest, I am sometimes on the “please reuse” side of the discussion. For example, when I was a site reliability engineer at Google, I had more than one discussion with engineering teams that wanted to write their own cross-region distributed consistent cache, usually because Bigtable’s replication wasn’t always as amazing as they wanted it to be. I thought this was madness and explained that madness to them as follows: “First of all, you are a team of about eight software engineers and you have to build the application as well, whereas the Bigtable team has more than eight engineers focused exclusively on making cross-region replication work. Given that setup, what are the chances that you are going to do better? Additionally, the Bigtable engineers are not muppets and their software runs on exactly the same infrastructure. What is the likelihood that you are going to do much better? The same things that impact Bigtable are going to impact you. Think about packet loss, hardware problems, cloud platform scheduling issues, and a massive intern map-reduce job killing the cluster.” I would then ask my discussion partners to explain to me in what way they were going to do a better job than the Bigtable team. With that reasoning, I usually managed to make them see the error of their ways.
Reuse discussions are tough because you need a really good grasp of all the details in order to make the right decision. Whether reuse is a good idea in the end comes down to whether, all things considered, it is cheaper in the long run to do so. The answer to that question depends on an in-depth insight into all the work required, now and in the future. Me writing a for-loop to build a list on the web page is probably cheaper than the organizational and technical work required to adapt someone else's for-loop into a reusable service. On the other hand, building a cross-region distributed cache is a lot of work and it is most likely cheaper to reuse Bigtable's replication and maybe write some code on top of that to compensate for the fact that it is not perfect.
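For a sense of scale, here is a minimal sketch of the kind of for-loop in question. It assumes the items have already been fetched from the database as plain values; the function name `render_item_list` and the sample data are hypothetical and are not taken from the actual system discussed above.

```python
from html import escape


def render_item_list(items):
    """Return an HTML unordered list with one <li> per item."""
    lines = ["<ul>"]
    for item in items:
        # Escape each value so stray markup in the data cannot break the page.
        lines.append(f"  <li>{escape(str(item))}</li>")
    lines.append("</ul>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical rows standing in for the result of a database query.
    print(render_item_list(["Widget", "Gadget <v2>", "Gizmo & Co"]))
```

A real page would add styling, paging, and error handling, but the core of the problem is about this size, which is the amount of code the weeks of discussion were weighed against.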
This discussion keeps coming up because, seen from flight level 300, all problems look the same, whereas closer to the ground, all problems look different. You have to be very knowledgeable to figure out which details matter and which ones do not.
Look, for instance, at storage systems: At a high enough level they all do the same thing: They allow you to store some data and then query that data to find the stuff you stored previously. If you need to store a few gigabytes of information that you rarely query, it often really doesn't matter much which storage system you choose: They will probably all work well enough for this simple use case. But as you start stacking requirements onto this problem, the set of good choices starts becoming smaller quite quickly. If you want very low latency (let’s say single-digit milliseconds) and the ability to ask very complex queries and need ACID transactions to boot, you pretty much end up with either MySQL or PostgreSQL with fast (probably SSD) disk drives. The fact that there are other data stores that also support SQL doesn't automatically make them good choices; you really need to review the entirety of the requirements and understand how different choices afford those requirements (or not).
Many of the discussions on the topic of code reuse are started by teams desperately looking for additional customers for their service. It seems that nobody wants to solve an actual problem anymore; instead, people want to build generic engines for solving entire classes of problems (tip: Don’t build engines). This often means spending more time and money developing the thing and that is only worthwhile if you have other customers. Not infrequently, this leads to these teams aggressively pitching that some other team should not build “this or that”, but should reuse their solution instead. More than once, I got the distinct feeling that someone’s promotion would be impacted favorably by convincing another team to onboard to their “generic” solution…
Interestingly enough, whenever I am in discussions like this, it is rarely about software that is actually complicated or hard to build. I never hear that people want to build their own operating system, transactional distributed relational database, or high-performance edge caching solution. Apparently, most people know that this is a lot of work and that, all things considered, it really does not pay off to do that in the context of the average business application. I am, however, constantly in discussions about teams wanting us to reuse their UI framework, form builder, workflow engine, rule engine, or domain-specific configuration language (too many of which are just too dumb to even contemplate).
Most code reuse discussions are perfect examples of Parkinson's law of triviality, the prototypical example of which gave the world the phenomenal term "bike shedding". This prototypical example goes like this:
Imagine you are going to build a nuclear power plant and you get a committee together to discuss the blueprints. These blueprints contain detailed specs for the nuclear reactor, but also contain the design and specs for the employee bike shed.
Given the importance of the bike shed, this nuclear power plant is obviously located in the Netherlands. The example is topical because the newly formed Dutch government, which is carried by right wing idiots, is planning to build four new nuclear power plants for €5 billion each. Apparently they haven't been paying any attention to England's new nuclear power plant at Hinkley Point, which isn't in production yet, but which might end up costing a whopping $59 billion by the time it is ready to start charging electric cars. Anyway, I digress.
Everybody knows they are not experts on nuclear reactors and consequently nobody has much to say about nuclear reactor plans. But everybody thinks they are an expert on bike sheds and so the committee spends the majority of their time discussing that shed: Where it should be, how big it should be, what color it should have, which building materials to use, et cetera.
Talking about reusing some other team’s service is usually “bike shedding”. Most engineers know they are not knowledgeable enough to build the next Spanner or a successful new operating system. But everybody thinks they can say something smart about how to do if-then-else statements, how to generate a form, how to string two pieces of work together, or how to display lists of items on a web page. The cost of all this is staggering. I often spend way more time discussing how to build a particular thing than it would take me to build that thing and support it until the end of its life.
Here is a good yardstick to use: If you build something that is actually useful to other people, they will beat a path to your door to reuse it. The key criterion there is not whether you think that anyone should use it; what matters is whether they think they should use it.
The other dark side of this is the teams that write something useful that everyone wants to re-use, but they are terrified of the consequences of having other users to support and being unable to do backwards-incompatible changes, so they lock down access and/or intentionally break its reusability.
The consequence of this is that everyone forks the whole code-base repeatedly, and you end up with multiple copies scattered all over the place, snapshotted at different points in history with different levels of bugginess and neglect.
The real test for how bad code is, is to see how hard it is to delete. Really bad code is typically full of convoluted reuse patterns...
Reuse shouldn't be the default. In one extreme case, I caught some engineers reusing old code while rewriting a large part of an app. They should have just copy-pasted the "reusable" parts, because later, just the deletion of the old code took weeks.