resume
David Winiarski
Staff Site Reliability Engineer · Calgary, AB
summary of qualifications
- Staff SRE with 10+ years building reliability, observability, and platform foundations for high-traffic SaaS
- Drives multi-team initiatives that cut operational toil, unlock revenue, and standardize how engineers ship and run services
- Strong software background in Go and Python, plus deep experience with AWS, Kubernetes, Terraform, OpenTelemetry, and Envoy
education
BSc, Economics · University of Victoria
relevant experience
- Unlocked enterprise SLA-backed contracts by leading Zapier's company-wide SLA program, defining calculation methodology with Product, Legal, and Engineering, and rebuilding the underlying data pipeline to cut per-SLA implementation overhead
- Designed and shipped a Go-based Envoy Gateway authz service that extracts JWT claims and injects them as upstream headers with sub-1ms p95 latency at 60k RPS, giving incident responders account-level visibility during outages
- Delivered the Envoy gateway migration whose primitives now power Zapier's public API platform, letting any internal team securely expose APIs externally without re-implementing auth, rate-limiting, or observability
- Drove RFC and rollout of release guardrails on Argo Rollouts (canaries gated by deterministic metric checks plus LLM agents that evaluate logs and other noisy signals), making AI-generated code changes safe to ship at platform speed
- Standardized observability on OpenTelemetry across the org, giving every new service traces, metrics, and logs by default with zero per-team configuration
- Consolidated the telemetry collection stack onto Grafana Alloy, retiring multiple per-host agents and shrinking on-call surface area
- Cut EKS cluster upgrade cycles from multi-week, multi-engineer efforts to a one-week, one-engineer task by leading a Kubernetes standardization initiative that consolidated clusters and adopted managed EKS capabilities
- Moved 100% of production capacity to Karpenter, replacing ASG-based node groups; new services now default to Graviton and compute scales linearly with task execution volume
- Migrated legacy EC2-based services to Kubernetes, bringing them under the standard platform's deployment, observability, and autoscaling story
- Designed a centralized authentication solution for internal tooling, eliminating recurring access-grant support tickets
- Cut centralized logging spend from $1.2M/year to ~$840k/year by leading the migration from Elastic Cloud to AWS OpenSearch, establishing the precedent that made AWS OpenSearch Zapier's standard for search/analytics
- Built failover paths for critical customer traffic
- Cut Postgres query times from 30s to under 1s for the largest enterprise tenants by eliminating full table scans
- Owned the Aurora RDS Postgres migration from needs assessment and cost forecasting through cutover, improving resilience and reducing operating cost
- Drove FedRAMP-aligned security remediation across the org as a key contributor to the audit effort
- Operated a multi-region Terraform-controlled AWS estate (EC2 Linux/Windows, ECS, Elastic Beanstalk, DynamoDB) supporting global customers
- Built and operated multi-TB/day data pipelines and a dynamic Docker-based Spark execution platform
- Designed and deployed Kubernetes clusters across bare metal, AWS, and GCE, and migrated legacy services onto the new platform