Staff Site Reliability Engineer · Calgary, AB

Building the quiet parts of reliable systems.

I'm David. I've spent 10+ years on infrastructure, platform engineering, and reliability — most recently at Zapier, where I lead cross-team work on observability, authz at the edge, and SRE practice. This site is where I write things down.

What I work on

Reliability

SLOs, failure modes, and the calculus of what to page on.

Observability

OpenTelemetry, correlated telemetry, and signals that survive contact with production.

Platform

Kubernetes, Envoy, Terraform — internal tools that reduce cognitive load.

Cost + performance

Karpenter, Graviton, query-level optimization. Unit economics matter.

A bit more about me

I'm drawn to the boundary between software engineering and operations, to the code that keeps other code running. My favorite work is the kind that disappears: an authz service that adds sub-millisecond p95 latency but unlocks account-level incident visibility, a cluster upgrade process that used to take weeks and now takes a week.

Away from keyboards, I'm usually somewhere in the mountains west of Calgary.

Recent writing