Samir Cury

San Francisco

Summary

DevOps Engineer and Leader - Enabling Developer Teams and stakeholders to succeed via state-of-the-art Infrastructure, Observability, Tooling, and Processes.

Overview

18 years of professional experience

Work History

Director of DevOps

Plume Design, Inc
10.2022 - Current

Promoted from within to run the DevOps Organization, spending 30% of time on internal strategy, planning, and stakeholder conversations, 40% on mentoring Leadership and growing the team, and 30% on customer-facing external strategy.


Org Size: 16


  • Responsible for 3 DevOps/SRE Teams across 5 Timezones, all Operational Concerns, Cloud Infrastructure and implementation of Security, RBAC and VPN for 70+ people in Developer Teams.
  • Customer-facing exposure on QBRs, routine strategy meetings, presentations and high-level documents with Senior Executive Leadership.
  • Led and shaped the overall DevOps strategy and vision for the organization, aligning it with business objectives and driving technology transformation.
  • Implemented deep cultural change in the engineering teams, fostering innovation and modernization in lieu of "legacy works" stagnation.
  • Improved the DevOps reputation from "blocker" to "enabler" through several programs that gave Service Teams autonomy over releases and Production operations, applying the relevant parts of the Google SRE mindset.
  • Collaborated with cross-functional teams including development, operations, QA and IoT Device Operations to improve collaboration, streamline workflows, and eliminate bottlenecks.
  • Mentored or hired Technical Leadership and Management for the 4 Leadership Functions. 2 were promotions from within as a result of demonstrated growth.

Engineering Manager - DevOps

Plume Design, Inc
06.2020 - 10.2022

Led the DevOps Enablement Team from Turnaround state to Sustained Success.


Team Size: 4 Reports


40% Implementation+on-call, 60% coordination time.


  • Drove the unification of DevOps Platforms into Kubernetes, after thorough architecture improvements and benchmarks, then securing buy-in from Stakeholders/Dev Teams.
  • Pioneered Terraform and Kubernetes at Plume, currently running 31 Microservices in 8 Production Environments
  • Reduced burnout drastically in the team by simplifying deployment processes and implementing timezone-affinity for assignees.
  • Designed and mentored the team to implement a self-serve Framework for New Application Onboarding, removing DevOps as a dependency and scaling Application Onboarding for new products
  • Led and managed a team of DevOps engineers to successfully implement and maintain continuous integration and deployment processes, resulting in a 50% reduction in Release deployment time.
  • Designed and led implementation of a Terraform Framework that reduced Full Environment Provisioning from 3 months to 1 month
  • Architected and led modernization of Jenkins from a monolithic, life-support, nearly dysfunctional state, to a distributed, Kubernetes Native model with constant upgrades and improvements.
  • Enforced DevOps best-practices in new applications and products as a pre-requisite for Production Launch

Staff DevOps Engineer

Walmart Labs
01.2019 - 06.2020

Founding member of the AdTech DevOps team, responsible for the Infrastructure needs of several teams: Backend, 2 x Data Engineering, Frontend, Data Science.


In a Technical Leadership role, with potential for promotion to management.


  • Mapped scope and projects across 5 stakeholder teams and set up the DevOps Roadmap
  • Hired 3 engineers in 6 months to compose the team
  • Introduced state-of-the-art Observability with Prometheus and Grafana
  • Led Incident Management best practices in collaboration with Project Managers and Customer Success
  • Designed and supported migration of 450+ Data Pipelines from Jenkins to Airflow as a scheduler
  • Designed and implemented the Department's migration from on-premise Datacenter to GCP
  • Wrote Airflow DAGs for PBs of Data Replication and Consistency checks across DCs
  • Main Infrastructure actor on the GCP migration of a business-critical set of Data Pipelines, which increased productivity by 10x, enabling more business and revenue for the Org.
  • Introduced Terraform for Cloud Resource Management and Security Policy enforcement, allowing Developer Teams to self-serve their Change requests through Code Pull Requests.

Sr. DevOps Engineer

Unity Tech
05.2017 - 12.2018

Focus on SRE, Infrastructure as Code and large scale distributed systems, light Data Engineering involvement.


Strong individual contributor on a team distributed across AMER and EMEA. Led a few engineers in discrete projects and efforts.


  • Cloud Providers - AWS, GCP
  • Developed Terraform Automation for provisioning of Kubeadm Kubernetes clusters
  • Benchmarked and launched Ingress Controller for 24k Req/s critical path application
  • Developed CI/CD Framework in GitLab for new Containerized application onboarding
  • Trained Developers on CI/CD Framework, supported adoption
  • On-call for Production Systems
  • Involved in routine deployments or Production migrations
  • Common Data Pipeline - Deployed and maintained a 5-region system for ingestion of 92k Events/s via HTTP with Streaming through Kafka and output to ETL Data Lake. Focus on Architecture, Implementation through Cloudera, GCP/GKE + Terraform.
  • Co-Leadership on SRE Committee that pioneered a successful Incident Management process, triaging over 50 incidents.
  • Main actor and local DevOps lead for AWS->GCP Migration

Web Architecture Engineer

Autodesk
04.2015 - 06.2017

Solo sysadmin responsible for infrastructure, operations and uptime of www.instructables.com (1,500 req/s in 2015), in addition to participating in the more general Microservices DevOps Team.



  • Introduced Chef as Configuration Management for the environment, enabling a replica of production on Dev Workstations.
  • Upgraded/Maintained Jenkins for CI/CD w/ Integration tests for 3 codebases and Dev Teams
  • Migrated Monitoring system from Cacti to Zabbix, enabling templated Dashboards
  • Microservices - Backend in Java Hibernate; Frontend in Python Django, JavaScript Assets
  • Infrastructure - Solr as Search Index, self-managed MySQL Cluster, CouchDB Server, HAProxies+Pacemaker, Varnish+VCL for CDN
  • Fixed 30% 10s timeouts in Java REST API with Garbage Collection Fine Tuning and Migration from Serial to CMS Collector.
  • Designed browser-based telemetry feature for time-on-page tracking, developed backend on NoSQL+Map Reduce and partnered with Frontend Developer for launch.
  • Basic AWS Resource Management for minor web applications

HPC Site Administrator

Caltech
03.2013 - 04.2015

Systems administrator of a Bare Metal HPC Facility for particle physics.


Responsible for Compute and Storage resource management, transfer systems, research initiatives in conferences in addition to routine duties.


  • Computing - HTCondor Batch System - 5.5k Cores, 14 Racks, 350 Servers
  • Storage Systems - Hadoop HDFS - 4.4 PB Raw disk
  • Foreman/Puppet for IaC, OS Provisioning
  • HTCondor as a Batch System
  • Developed automated CPU Benchmarking system to make data-driven decisions on Hardware Purchases
  • Facility went from 7th to 3rd place in the US Ranking of data processing during my tenure.
  • Purchased 2x the core count per year on a budget similar to that of comparable facilities
  • Led Network Transfers Benchmarking group amongst 7 sites, after breaking the record of Transatlantic distributed transfer speeds.
  • Designed+Implemented updates on Datacenter Power and Cooling systems.

CMS Tier-0 Operator and Developer

CERN
01.2011 - 02.2013
  • Phase I - Responsible for Data Reprocessing workflows in 3 regions, 12k cores.
  • Operated and Developed multi data-center Workflow Management System (DMWM)
  • Phase II - Operated and Developed Workflow Management System for high criticality, 12h SLA Processing system for Accelerator Data.
  • Phase III - Technical Leadership for a team of 3 people, interfacing with stakeholders and managers of 12 Physics Groups
  • On-call 24/7 AAA for Tier-0 system
  • Wrote on-call instructions and commissioned with a pool of 30 on-calls.
  • Improved Observability of the system with OSS and integrated with CERN home-grown tools
  • Development Tech Stack: Python (+OO/TDD), Oracle DB, JavaScript/jQuery, CouchDB
  • Published Research in the Journal of Physics regarding compute performance + developed a new workflow modulation algorithm.


Unix Systems Administrator

High Energy Physics Grid - Brazil 2 Datacenter
11.2007 - 12.2010


  • CERN/LHC (Large Hadron Collider) Collaboration
  • Daily Operations and expansion projects of the HPC Facility
  • Represented the facility in collaboration status updates
  • Responsible for dCache & HDFS Distributed Filesystems, HTCondor Batch System and Grid interfaces for Compute and Transfers
  • Yearly maintenance of host certificates for 30+ servers
  • Datacenter Hardware maintenance
  • Networking equipment setup (L2/L3)
  • Deployed Xen VM Hypervisor for upgrade tests and prototyping
  • Training of student cohorts/new hires
  • LDAP System for Local Users and their workstations
  • Centralized /home/user directories in NFS
  • Rocks Linux / BSD shop


Electrical Engineering Student

State University of Rio de Janeiro
01.2007 - 12.2010

Skills

  • Terraform

  • Kubernetes

  • AWS

  • GCP

  • Scalable Architecture Design

  • Observability

  • Docker

  • Configuration Management

  • Test Driven Infrastructure

  • Perl

  • Python

  • Scala

  • Production Troubleshooting

  • Networking

  • Strategy

  • Project Management

Accomplishments

Analytical and data-driven problem solver, leveraging independent decision making habits to drive positive change towards organizational success.


Objective: utilize my skillset and experience to improve Technology state and Architecture in a new organization with different challenges. Planning to be hands-on and part of the change.



Accomplishments:


  • Published DOI 10.1088/1742-6596/513/3/032023 - Journal of Physics 2014 - HPC Research
  • Citation by DOI:10.1088/1742-6596/664/6/062050 Journal of Physics 2015 - ML Approach, similar goals
  • Led US CMS High Throughput Transfers Group (7 sites)
