Why I'm Starting This Blog

April 23, 2026 · meta · sre

I spend most of my days thinking about reliability, observability, and the unglamorous plumbing that keeps distributed systems running. The best lessons I’ve learned almost never came from a blog post — they came from an outage, a post-mortem, or a conversation with someone who’d already made the mistake I was about to make.

This space is an attempt to pay some of that back. I want to write up a few of the patterns, war stories, and opinions that have stuck with me, in the hope that someone who’s about to make a similar decision has one more reference point.

What to expect

A few threads I’m interested in writing about:

Envoy, JWTs, and authz at the edge — what actually worked, what burned an afternoon, and when a 1ms p95 is a lie
OpenTelemetry in practice — the gap between the spec and what your vendor actually ingests
Kubernetes upgrade strategy — how to go from “a multi-week ordeal with three engineers” to “one person, one week”
Karpenter, spot, and Graviton — cost wins that don’t come with an operational tax

I’m going to favor concrete over comprehensive. If a post is useful to exactly one person who has the same problem I had, that’s a win.

What not to expect

No hot takes on whatever service broke this week. No “10 things every SRE should know.” I’d rather publish less and mean it.

More soon.