Gall's Law

My first professional engineering job was at a company rebuilding their cash-cow Java monolith into a “scalable and modern” microservice stack. The system design had ~200 services handling all the complex flows the monolith supported while also planning for future scale and flexibility. Supposedly.

When the first actual user was added to the system 2 years later, the experience was so slow and painful we lost the account completely. The rewrite never finished, and as far as I know they’re still on the monolith today (though I’m sure they’ve evolved it).

John Gall put it this way in Systemantics: “A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with a working simple system.”

There’s no good substitute for getting humans to use a product. A simple system that runs for real users (or holds real traffic) forces unknowns into view while the surface area is still small enough to fix.

When teams overengineer in the name of “planning for the future”, the product is going to fall over. It takes discipline to keep saying “YAGNI”, especially when the future feels obvious. But in my experience it’s less obvious than you think, and adding features to a simple system later is less painful than dealing with an unnecessarily complex one.

Overengineering is also tightly coupled with tech debt, although some debate whether tech debt is even a real thing.
