resume

David Winiarski

Staff Site Reliability Engineer · Calgary, AB

summary of qualifications

education

BSc, Economics · University of Victoria

relevant experience

Staff Site Reliability Engineer

Zapier

Mar 2025 - Present

  • Unlocked enterprise SLA-backed contracts by leading Zapier's company-wide SLA program, defining calculation methodology with Product, Legal, and Engineering, and rebuilding the underlying data pipeline to cut per-SLA implementation overhead
  • Designed and shipped a Go-based Envoy Gateway authz service that extracts JWT claims and injects them as upstream headers with sub-1 ms p95 latency at 60k RPS, giving incident responders account-level visibility during outages
  • Delivered the Envoy Gateway migration whose primitives now power Zapier's public API platform, letting any internal team securely expose APIs externally without re-implementing auth, rate limiting, or observability
  • Drove RFC and rollout of release guardrails on Argo Rollouts (canaries gated by deterministic metric checks plus LLM agents that evaluate logs and other noisy signals), making AI-generated code changes safe to ship at platform speed
  • Standardized observability on OpenTelemetry across the org, giving every new service traces, metrics, and logs by default with zero per-team configuration
  • Consolidated the telemetry collection stack onto Grafana Alloy, retiring multiple per-host agents and shrinking on-call surface area

Senior Site Reliability Engineer

Zapier

Aug 2023 - Mar 2025

  • Cut EKS cluster upgrade cycles from multi-week, multi-engineer efforts to one engineer in one week by leading a Kubernetes standardization initiative that consolidated clusters and adopted managed EKS capabilities
  • Drove Karpenter adoption to 100% of production capacity, replacing ASG-based node groups; new services now default to Graviton and compute scales linearly with task execution volume
  • Migrated legacy EC2-based services to Kubernetes, bringing them under the standard platform's deployment, observability, and autoscaling story
  • Designed a centralized authentication solution for internal tooling, eliminating recurring access-grant support tickets

Site Reliability Engineer

Zapier

Sept 2021 - Aug 2023

  • Cut centralized logging spend by ~30% (from $1.2M/year to ~$840k/year) by leading the migration from Elastic Cloud to AWS OpenSearch, establishing the precedent that made AWS OpenSearch Zapier's standard for search and analytics
  • Built failover paths for critical customer traffic

Site Reliability Engineer

Replicon

Nov 2017 - Sept 2021

  • Cut Postgres query times from 30s to under 1s for the largest enterprise tenants by eliminating full table scans
  • Owned the Aurora RDS Postgres migration from needs assessment and cost forecasting through cutover, improving resilience and lowering operating cost
  • Drove FedRAMP-aligned security remediation across the org as a key contributor to the audit effort
  • Operated a multi-region, Terraform-managed AWS estate (EC2 Linux/Windows, ECS, Elastic Beanstalk, DynamoDB) supporting global customers

Data Engineer

Go2mobi

May 2015 - Nov 2017

  • Built and operated multi-TB/day data pipelines and a dynamic Docker-based Spark execution platform
  • Designed and deployed Kubernetes clusters across bare metal, AWS, and GCE, and migrated legacy services onto the new platform