staff site reliability engineer · calgary, ab

Building the quiet parts of reliable systems.

I'm David. I've spent 10+ years on infrastructure, platform engineering, and reliability — most recently at Zapier, where I lead cross-team work on observability, authz at the edge, and SRE practice. This site is where I write things down.

read the blog · see the work · resume · github · linkedin

what I work on

Reliability

SLOs, failure modes, and the calculus of what to page on.

Observability

OpenTelemetry, correlated telemetry, and signals that survive contact with production.

Platform

Kubernetes, Envoy, Terraform — internal tools that reduce cognitive load.

Cost + performance

Karpenter, Graviton, query-level optimization. Unit economics matter.

a bit more about me

I'm drawn to the boundary between software engineering and operations: the code that keeps other code running. My favorite work is the kind that disappears: an authz service that adds less than a millisecond of p95 latency but unlocks account-level incident visibility, or a cluster upgrade process that used to take weeks and now takes a week.

Away from keyboards, I'm usually somewhere in the mountains west of Calgary.

recent writing

The Backup Was From November

iSCSI on Talos: Why the Obvious Path Doesn't Work

The Five-Second Ghosts