12 Comments

> Question from a European: WTF is actually "balancing a checkbook"? I've heard the phrase used hundreds of times, but I got rid of checks in the early 1990s and hadn't seen one until I moved to the US, and even here I write at most one a quarter or so because, in essence, the US banking system is stuck in the eighties, which was a great decade to remember for its music and hairdos, but not for its consumer banking.

I hate "balancing the checkbook". I had to write a pile of checks for a while to contractors. I realized that if I open a 'second checking account' and only write checks out of that, I just move money into that account when I write a check and when the balance on that account is zero, all the outstanding checks have been paid. Works pretty good, but doesn't solve the "why is there still $X sitting in this account a month later". But seems to always get to zero because folks eventually cash it and then I remember.


> Well, I have seen this tried a dozen times before and I have never seen it work, but maybe this time we'll get it right

The height of disagree and commit! 😂


:-)


> Must be this tall to write multi-threaded code.

https://bholley.net/blog/2015/must-be-this-tall-to-write-multi-threaded-code.html


Yep. Even the Java developers got it wrong, as was hilariously described in a classic post called "Wot, no chickens?":

https://www.cs.kent.ac.uk/projects/ofa/java-threads/0.html


IMHO the whole threading model with shared memory was flawed from the beginning; shared memory is a shared resource with no locking, which is a race condition waiting to happen. You have to add your own locking around the pieces of that shared memory that are actually shared, which is hard, and usually results in a deadlock waiting to happen.
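To make that concrete, here's a minimal Go sketch (my own toy example, not anything from the article): two threads bump a plain shared counter and silently lose updates, and the "fix" is a lock you have to remember to take around every single access yourself.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const iterations = 100_000
	var wg sync.WaitGroup

	// Shared memory with no locking: both goroutines read-modify-write the
	// same variable, so increments get lost. Run with `go run -race` to have
	// the race detector flag it.
	racy := 0
	wg.Add(2)
	for i := 0; i < 2; i++ {
		go func() {
			defer wg.Done()
			for j := 0; j < iterations; j++ {
				racy++ // not atomic: load, add, store
			}
		}()
	}
	wg.Wait()
	fmt.Println("racy total:  ", racy) // usually less than 200000

	// The "fix": you have to remember to take the lock around every access,
	// and nothing in the language forces you to.
	var mu sync.Mutex
	safe := 0
	wg.Add(2)
	for i := 0; i < 2; i++ {
		go func() {
			defer wg.Done()
			for j := 0; j < iterations; j++ {
				mu.Lock()
				safe++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println("locked total:", safe) // always 200000
}
```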

The really nasty thing about the threading model is it **seems** to make parallel programming easy, but it has nasty traps that mean it's actually hard, and most people don't realize their code is buggy. Lots of the race-condition problems are less likely to trigger for multi-tasking (time-slicing on a single core, which is what threading was originally invented for), but become more likely for multi-processing (multi-core). This meant lots of people didn't realize their threading code was bad until they started trying to run it on multiple cores.

For multi-tasking I always preferred asynchronous programming, which **seems** hard because it makes it obvious that it really is hard and forces you to do it right. For multi-processing I preferred processes and communication channels, which clearly partition the memory so it's not a shared resource, and also map to the hardware better.
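Here's roughly what the channel style looks like, sketched in Go (the worker count and data are made up for illustration): each worker owns its slice of the data outright and only the results travel over a channel, so there is nothing shared to lock.

```go
package main

import "fmt"

// sum adds up the chunk it exclusively owns and sends the result back over
// the channel; no memory is shared between workers.
func sum(chunk []int, results chan<- int) {
	total := 0
	for _, v := range chunk {
		total += v
	}
	results <- total
}

func main() {
	data := make([]int, 1000)
	for i := range data {
		data[i] = i + 1
	}

	const workers = 4
	results := make(chan int, workers)
	chunkSize := len(data) / workers

	// Partition the data so each worker gets its own private chunk.
	for w := 0; w < workers; w++ {
		start := w * chunkSize
		end := start + chunkSize
		if w == workers-1 {
			end = len(data)
		}
		go sum(data[start:end], results)
	}

	// The only communication is over the channel.
	grand := 0
	for w := 0; w < workers; w++ {
		grand += <-results
	}
	fmt.Println(grand) // 500500
}
```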

I find it ironic that modern languages and approaches that try to fix the problems with threading end up re-inventing asynchronous programming and/or isolated processes with communication channels, just inside threads.


Also, threading's shared memory doesn't map well to the underlying hardware when running on multiple cores. Sharing memory and keeping caches coherent across processors is expensive, and a huge amount of effort and silicon has been thrown at the hardware to try and make shared memory more efficient. In practice, though, if you care about threading performance you still have to spend a lot of time tweaking your threading code to avoid actually sharing memory and thrashing cache lines.
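A rough sketch of what cache-line thrashing looks like from the software side (my own toy example in Go; the 64-byte line size is an assumption about typical x86 parts): two cores hammering adjacent counters fight over a single cache line, and the usual workaround is to pad the counters onto separate lines.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

const iterations = 10_000_000

// shared puts both counters on the same cache line, so two cores end up
// fighting over that line even though they never touch each other's counter.
type shared struct {
	a uint64
	b uint64
}

// padded pushes the second counter onto a different cache line (64 bytes is
// assumed here, which is typical for x86) so each core keeps its own line.
type padded struct {
	a uint64
	_ [56]byte // padding so that b starts on the next cache line
	b uint64
}

// run increments two counters from two goroutines and reports elapsed time.
func run(inc1, inc2 func()) time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < iterations; i++ {
			inc1()
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < iterations; i++ {
			inc2()
		}
	}()
	wg.Wait()
	return time.Since(start)
}

func main() {
	var s shared
	var p padded
	fmt.Println("same cache line:", run(
		func() { atomic.AddUint64(&s.a, 1) },
		func() { atomic.AddUint64(&s.b, 1) },
	))
	fmt.Println("padded apart:   ", run(
		func() { atomic.AddUint64(&p.a, 1) },
		func() { atomic.AddUint64(&p.b, 1) },
	))
}
```

On a multi-core box the padded version typically runs noticeably faster, though the exact gap depends on the hardware.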

IMHO all the effort expended on making shared memory more efficient in hardware just helped the bad threading model persist. It made shared memory fast enough, and people forgot about the huge cost in silicon. The world would be in a better place now if the hardware guys had just put their foot down and told the software guys that the problem wasn't the hardware, it was their crappy software.


For a while I thought that NUMA architectures might have plotted a way out of this "multiple cores / big pool of shared memory" model, but without radically different programming paradigms that architecture too often devolved into an inefficient large pool of shared memory.


You are absolutely on the money.


Great article! Thank you, Devil! I mean, “Jos”.

Joking aside, this is really interesting. I agree that tools may have changed, but working in groups of people has not (though we are better at remote work, thanks to the tools). I also agree about math and concurrency.

One thing I wonder about is the level of abstraction. The book “Range” talks about the fact that IQ scores keep going up despite normalization, and cites an increased ability to reason abstractly. This is a very slow effect, but maybe it is making things better over time too. So maybe new people could acquire experience from older devils faster as time progresses? Maybe older devils should help facilitate that?


I think IQ scores keep going up because we are much better at taking IQ tests, and at making new IQ tests.

Overall I see no reason to assume that people are getting smarter. That is why classic texts on politics and leadership still make an incredible amount of sense. Think: Marcus Aurelius, Sun Tzu, Von Clausewitz, etc.

What we _do_ do much better these days is educating more people to a higher level. Advances in fields like maths, physics, engineering, etc. did not happen because we have smarter people now; we just got more of the brains that exist on the planet working on the problems.


NUMA was the original preferred multi-processor architecture because it matched the hardware well, but it fell by the wayside for a couple of reasons:

1. It didn't fit the dominant threading model.

2. It made buying memory expensive, because you needed to buy memory modules per core.

The "inefficient large pool of shared memory" effect was precisely because the dominant paradigm for parallel-processing was threading. Well, that's not 100% true; there is also a class of problems usually assigned to super-computers that make it difficult to partition the data (usually huge matrices) per-core. But those problems are pretty rare and specialized; map-reduce shows that the vast majority of huge problems can indeed be solved by partitioning the data.

On the small scale, the way CPUs have evolved with huge multi-layer caches means we effectively are using NUMA; per-core cache is the new RAM, RAM (and shared cache) is the new disk, and thrashing cache lines is the new accessing remote RAM.

And of course on a larger scale, your whole datacenter is a NUMA architecture.
