# David Winiarski

> Staff Site Reliability Engineer based in Calgary, AB. 10+ years of experience in infrastructure, platform engineering, and reliability, with focus on observability, authz at the edge, and Kubernetes platform work.

Personal site: https://winiar.ski

## About

- [About page](https://winiar.ski/): landing page with bio, focus areas, and recent writing
- [LinkedIn](https://www.linkedin.com/in/davewiniarski/)
- [GitHub](https://github.com/davidwin93)

## Showcase

Selected work and initiatives. See all: https://winiar.ski/showcase

- **Envoy Gateway authz service** (Platform): A sub-millisecond JWT extraction layer that turned anonymous request flows into account-attributed ones. Changed what we could see during incidents.
- **OpenTelemetry standardization** (Observability): Cross-team initiative to make observability a default, not a per-service lift. Pulled a sprawl of tools into a single pipeline on Alloy.
- **EKS upgrade path** (Kubernetes): Collapsed cluster count, leaned into managed features, and took upgrade effort from multi-week-with-a-team to one-week-with-one-person.
- **Karpenter + Graviton rollout** (Cost): Replaced managed ASGs with Karpenter, unlocking Graviton without making teams think about node pools. Meaningful cost-per-task reductions.
- **Company-wide SLA program** (SLO): Built the underlying data pipeline and calculation methodology for customer-facing SLAs — a prerequisite for enterprise contracts.
- **Postgres query optimization at scale** (Data): Took 30-second full-table scans on large enterprise tenants and brought them under a second without denormalizing the schema.

## Blog

Index: https://winiar.ski/blog · Feed: https://winiar.ski/rss.xml

- [Why I'm Starting This Blog](https://winiar.ski/blog/welcome/) — 2026-04-23: A short note on what to expect here — SRE, infrastructure, and platform engineering, with an emphasis on practical tradeoffs.

## Resume

Full resume: https://winiar.ski/resume

### Profile

- Staff SRE with 10+ years of experience in infrastructure, platform engineering, and reliability
- Technical leader who drives cross-team initiatives in observability, security, and infrastructure standardization
- Proven track record of designing high-performance systems and reducing operational complexity
- Strong background in mentoring engineers and establishing SRE practices across organizations
- Extensive experience with Go, Python, Terraform, AWS, Kubernetes, OpenTelemetry, and Envoy
- Combines software development expertise with deep infrastructure and systems knowledge

### Education

- BSc, Economics — University of Victoria

### Experience

#### Staff Site Reliability Engineer — Zapier (March 2025 – Present)

- Designed and implemented high-performance authz service for Envoy Gateway that extracts JWT values and injects them as headers upstream, enabling account-level visibility during incidents with p95 latency under 1ms
- Led cross-team initiative to standardize observability on OpenTelemetry, providing default observability for all services
- Consolidated telemetry tooling onto Alloy, reducing operational complexity and tool sprawl across the organization
- Led company-wide SLA initiative, reducing implementation overhead for underlying data and collaborating with stakeholders to define calculation methodology, directly enabling higher-tier enterprise contracts
- Mentored engineers, led SRE practices across teams, and contributed to hiring processes

#### Senior Site Reliability Engineer — Zapier (Aug 2023 – March 2025)

- Led Kubernetes standardization initiative to leverage EKS capabilities and reduce cluster count, cutting upgrade time from multiple weeks with multiple engineers to one week with a single engineer
- Drove Karpenter adoption to replace standard ASGs, significantly reducing cost per task execution and enabling teams to leverage Graviton instances
- Designed centralized authentication solution for internal applications, effectively eliminating internal support requests for access grants
- Migrated legacy systems from EC2 to Kubernetes

#### Site Reliability Engineer — Zapier (Sept 2021 – Aug 2023)

- Reduced centralized logging costs by 40% by leading migration from elastic.co to OpenSearch
- Designed and implemented failover mechanisms for critical traffic

#### Site Reliability Engineer — Replicon (Nov 2017 – Sept 2021)

- Managed multi-region EC2 clusters with Terraform to support customer needs worldwide
- Deployed organization-wide metrics system
- Improved CI and CD system to reduce overall deployment time
- Assisted in an organization-wide security audit and remediation effort to satisfy FedRAMP compliance
- Designed and implemented load testing system
- Designed and developed an automated recovery system
- Implemented a standardized edge proxy for use by both legacy and new systems
- Managed diverse AWS infrastructure including Linux and Windows EC2, ECS, Elastic Beanstalk, and DynamoDB
- Led migration to Aurora RDS Postgres to improve resilience, including needs assessment and cost forecasting against existing RDS usage
- Optimized Postgres queries for large enterprise customers by eliminating full table scans, reducing query times from 30 seconds to under 1 second

#### Data Engineer — Go2mobi (May 2015 – Nov 2017)

- Developed data pipelines that process multiple terabytes per day
- Developed dynamic Docker-based systems to execute Spark jobs
- Planned and deployed multiple Kubernetes clusters across bare metal, AWS, and GCE
- Migrated legacy projects into Kubernetes