# David Winiarski

> Staff Site Reliability Engineer based in Calgary, AB. 10+ years of experience in infrastructure, platform engineering, and reliability, with a focus on observability, authz at the edge, and Kubernetes platform work.

Personal site: https://winiar.ski

## About

- [About page](https://winiar.ski/): landing page with bio, focus areas, and recent writing
- [LinkedIn](https://www.linkedin.com/in/davewiniarski/)
- [GitHub](https://github.com/davidwin93)

## Showcase

Selected work and initiatives. See all: https://winiar.ski/showcase

- **Envoy Gateway authz service** (Platform): A sub-millisecond JWT extraction layer that turned anonymous request flows into account-attributed ones. Changed what we could see during incidents.
- **OpenTelemetry standardization** (Observability): Cross-team initiative to make observability a default, not a per-service lift. Pulled a sprawl of tools into a single pipeline on Alloy.
- **EKS upgrade path** (Kubernetes): Collapsed cluster count, leaned into managed features, and took upgrade effort from multi-week-with-a-team to one-week-with-one-person.
- **Karpenter + Graviton rollout** (Cost): Replaced managed ASGs with Karpenter, unlocking Graviton without making teams think about node pools. Meaningful cost-per-task reductions.
- **Company-wide SLA program** (SLO): Built the underlying data pipeline and calculation methodology for customer-facing SLAs — a prerequisite for enterprise contracts.
- **Postgres query optimization at scale** (Data): Took 30-second full-table scans on large enterprise tenants and brought them under a second without denormalizing the schema.

## Blog

Index: https://winiar.ski/blog · Feed: https://winiar.ski/rss.xml

- [Why I'm Starting This Blog](https://winiar.ski/blog/welcome/) — 2026-04-23: A short note on what to expect here — SRE, infrastructure, and platform engineering, with an emphasis on practical tradeoffs.

## Resume

Full resume: https://winiar.ski/resume

### Profile

- Staff SRE with 10+ years of experience in infrastructure, platform engineering, and reliability
- Technical leader who drives cross-team initiatives in observability, security, and infrastructure standardization
- Proven track record of designing high-performance systems and reducing operational complexity
- Strong background in mentoring engineers and establishing SRE practices across organizations
- Extensive experience with Go, Python, Terraform, AWS, Kubernetes, OpenTelemetry, and Envoy
- Combines software development expertise with deep infrastructure and systems knowledge

### Education

- BSc, Economics — University of Victoria

### Experience

#### Staff Site Reliability Engineer — Zapier (March 2025 – Present)

- Designed and implemented a high-performance authz service for Envoy Gateway (p95 latency under 1 ms) that extracts JWT values and injects them as headers upstream, enabling account-level visibility during incidents
- Led cross-team initiative to standardize observability on OpenTelemetry, providing default observability for all services
- Consolidated telemetry tooling onto Alloy, reducing operational complexity and tool sprawl across the organization
- Led company-wide SLA initiative, building the underlying data pipeline and working with stakeholders to define the calculation methodology, which directly enabled higher-tier enterprise contracts
- Mentored engineers, led SRE practices across teams, and contributed to hiring processes

#### Senior Site Reliability Engineer — Zapier (August 2023 – March 2025)

- Led Kubernetes standardization initiative to leverage EKS capabilities and reduce cluster count, cutting upgrade time from multiple weeks with multiple engineers to one week with a single engineer
- Drove Karpenter adoption to replace standard ASGs, significantly reducing cost per task execution and enabling teams to leverage Graviton instances
- Designed centralized authentication solution for internal applications, effectively eliminating internal support requests for access grants
- Migrated legacy systems from EC2 to Kubernetes

#### Site Reliability Engineer — Zapier (September 2021 – August 2023)

- Reduced centralized logging costs by 40% by leading migration from elastic.co to OpenSearch
- Designed and implemented failover mechanisms for critical traffic

#### Site Reliability Engineer — Replicon (November 2017 – September 2021)

- Managed multi-region EC2 clusters with Terraform to support customer needs worldwide
- Deployed organization-wide metrics system
- Improved the CI/CD system to reduce overall deployment time
- Assisted in an organization-wide security audit and remediation effort to satisfy FedRAMP compliance
- Designed and implemented load testing system
- Designed and developed an automated recovery system
- Implemented a standardized edge proxy for use by both legacy and new systems
- Managed diverse AWS infrastructure including Linux and Windows EC2, ECS, Elastic Beanstalk, and DynamoDB
- Led migration to Aurora RDS Postgres to improve resilience, including needs assessment and cost forecasting against existing RDS usage
- Optimized Postgres queries for large enterprise customers by eliminating full-table scans, reducing query times from 30 seconds to under 1 second

#### Data Engineer — Go2mobi (May 2015 – November 2017)

- Developed data pipelines that process multiple terabytes per day
- Developed dynamic Docker-based systems to execute Spark jobs
- Planned and deployed multiple Kubernetes clusters across bare metal, AWS, and GCE
- Migrated legacy projects into Kubernetes