A Life Engineered

Stop Solving Problems. Start Anticipating Them.

A practical guide to looking around corners, at work, in your career, and with your life.

Steve Huynh
Jul 30, 2025

Years ago, my team was responsible for a critical service at Amazon that handled millions of requests per minute. The service was getting intermittent errors from a downstream API it called. The fix seemed obvious. I added a simple retry mechanism in the code. If a request failed, the service would immediately try again, up to three times in rapid succession.

And it worked. The intermittent errors on our dashboards vanished. Problem solved. Everyone was happy.

Until a few months later, when the downstream API had an extended outage. When their service finally started to recover, it was immediately overwhelmed and fell over again. The culprit was us. Our service, along with others, unleashed a "retry storm"—a massive, concentrated flood of requests that hammered their recovering systems back into oblivion. My "quick fix" for the small problem had created a catastrophic failure condition for a much bigger one.
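To make the amplification concrete, here is a minimal sketch of that kind of immediate retry loop. It is not the original Amazon code. The names `FlakyDownstream`, `call_with_immediate_retries`, and `offered_load` are hypothetical, and it assumes "up to three times" means three retries on top of the first attempt.

```python
class FlakyDownstream:
    """Hypothetical stand-in for the downstream API; counts every call it receives."""

    def __init__(self, healthy):
        self.healthy = healthy
        self.calls_received = 0

    def handle(self):
        self.calls_received += 1
        if not self.healthy:
            raise ConnectionError("downstream unavailable")
        return "ok"


def call_with_immediate_retries(downstream, max_retries=3):
    """Try once, then retry immediately (no delay) up to max_retries times."""
    for attempt in range(1 + max_retries):
        try:
            return downstream.handle()
        except ConnectionError:
            # Out of retries: surface the failure to the caller.
            if attempt == max_retries:
                raise


def offered_load(downstream, client_requests, max_retries=3):
    """Send client_requests through the retry wrapper; return calls the downstream saw."""
    for _ in range(client_requests):
        try:
            call_with_immediate_retries(downstream, max_retries)
        except ConnectionError:
            pass  # The request failed even after all retries.
    return downstream.calls_received
```

Under these assumptions, 1,000 incoming requests produce 1,000 downstream calls when the dependency is healthy, but 4,000 when it is down. The retries multiply traffic at exactly the moment the downstream system has the least capacity to absorb it.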

We had solved the first-order problem, but we had failed to see the second-order…

© 2025 Steve Huynh