Saadia Khanam

Rockville, MD

Summary

Seasoned Principal SRE/DevOps engineer with 15 years of experience, adept in AWS cloud services, Terraform, and Kubernetes. Expert in designing CI/CD pipelines, automation, and system reliability; seeking to contribute further in an SRE/DevOps role. Proficient in developing monitoring and observability solutions and leading DevOps initiatives to optimize performance and cost.

Work History

Principal SRE/DevOps

Federated Wireless

Arlington, VA

04.2022 - Current

Improved code deployment efficiency by 40% by automating processes with CI/CD pipelines.
Automated manual tasks through scripting languages such as Python and Shell, boosting team productivity by 35%.
Led implementation of automated workflows using Terraform and a broad range of AWS services (including EC2, ASG, Load Balancers, Security Groups, S3, SQS, RDS, Lambda, VPC, EKS, Route 53, KMS, IAM, and CloudWatch), resulting in 50% reduction in manual intervention.
Implemented CI/CD pipelines with CircleCI, AWS CloudFormation, Terraform, and Ansible to automate deployment and management of applications and infrastructure changes, reducing deployment time by 45%.
Migrated on-premises monolith services to AWS Cloud (ECS and EKS), emphasizing serverless and containerized solutions to enhance security, optimize costs, and ensure scalability, leading to 30% reduction in operational costs.
Designed and managed Kubernetes clusters for performance and scalability, tuning configurations to improve compute resource utilization by 25%.
Collaborated closely with architecture leads and software engineering teams to ensure seamless operation on Kubernetes clusters and scalable infrastructure provisioning on AWS, raising system uptime to 99.999%.
Designed and developed advanced monitoring, logging, and analytics solutions using OpenTelemetry, Promtail, Prometheus, Thanos, and Loki for rapid issue detection and resolution within Kubernetes environments, reducing mean time to resolution (MTTR) by 30%.
Established Service Level Objectives (SLOs) and measured them against Service Level Indicators (SLIs) to meet high availability target of 99.999% uptime (five nines).
Mentored teams of engineers, fostering culture of collaboration, innovation, and growth in development and implementation of DevOps principles, which increased team productivity by 20%.
Demonstrated excellent problem-solving and analytical skills, leadership, and real-time decision-making ability to thrive in fast-paced, collaborative environment, resolving 95% of incidents within SLA.
Drafted Root Cause Analyses (RCAs) for sharing with customers, contributing to transparency and continuous improvement efforts, leading to 20% reduction in recurring issues.
Participated actively in change management, incident management, and production deployments, ensuring 99% success rate in changes and deployments.
Gathered feedback from key stakeholders and recommended best ways to advance team objectives, achieving 25% increase in stakeholder satisfaction.

Senior DevOps Engineer

Federated Wireless

Arlington, VA

06.2019 - 03.2022

Reduced deployment times by 50% through introduction of CI/CD pipelines for multiple projects, significantly accelerating time-to-market.
Implemented advanced CI/CD strategies, automation, and infrastructure-as-code practices, increasing deployment frequency by 40%.
Developed custom scripts in Python and Bash, automating repetitive tasks and integrating disparate systems, which streamlined workflows across departments by 30%.
Migrated on-premises legacy applications to AWS, ensuring enhanced security and cost-effectiveness, leading to 25% reduction in operational costs.
Led management of AWS and Kubernetes environments, focusing on performance optimization, system reliability, and cost optimization, resulting in 35% improvement in system reliability.
Collaborated with development teams to optimize application performance and resource usage, improving application response times by 20%.
Implemented CI/CD pipeline using CircleCI, Ansible, and CloudFormation, which reduced build times by 45%.
Established proactive monitoring systems by implementing tools such as Prometheus, Grafana, and PagerDuty, leading to 30% decrease in incident response times.
Implemented ELK stack logging solution to detect and fix issues in production, reducing issue resolution time by 25%.
Wrote custom scripts to automate various DevOps tasks using Python, which increased operational efficiency by 40%.
Actively debugged production issues and drove them to resolution, maintaining 95% uptime for production systems.
Provided on-call support for production systems, ensuring high availability and reliability.
Integrated SonarQube into CI pipeline to automate static code security scanning, improving code quality.
Mentored junior DevOps engineers, fostering culture of learning and continuous improvement.
Orchestrated development and release processes for new features, achieving faster release cycle.
Streamlined build processes, reducing development time and accelerating project timelines.

Senior DevOps Engineer

Magellan Health

Richmond, VA

12.2017 - 05.2019

Reduced deployment times by 50% with introduction of CI/CD pipelines for multiple projects.
Managed AWS and Kubernetes environments, optimizing performance and reliability of systems, resulting in 35% improvement in system uptime.
Developed custom scripts using Python and Bash to automate repetitive tasks and integrate disparate systems, effectively streamlining workflows across departments by 30%.
Established proactive monitoring system by implementing tools such as Prometheus, Grafana, and PagerDuty for monitoring and alerting purposes, leading to 25% decrease in incident response times.
Implemented logging solution using ELK to detect and fix issues in production, reducing issue resolution time by 20%.
Wrote custom scripts to automate various DevOps tasks using Python.
Conducted regular system audits to ensure adherence to security best practices.
Troubleshot and resolved system issues, minimizing downtime and maintaining high system availability.
Provided on-call support for production systems.
Mentored team members.
Orchestrated development and release process for new features.
Streamlined build process to reduce development time.

Production Engineer

Tata Consultancy Services

Irving, TX

02.2015 - 12.2017

Developed and implemented production processes for new product line, resulting in 20% increase in production output.
Provided technical support to troubleshoot equipment issues, minimizing downtime and maintaining production targets, achieving 95% service uptime.
Improved production efficiency by 30% by developing and implementing new manufacturing processes.
Set up monitoring for production environment.
Set up Splunk for log analysis.
Resolved production issues arising from changes in raw materials.
Successfully introduced new production methods that increased efficiency and decreased costs.
Trained production staff on new processes and procedures.
Monitored production to ensure adherence to quality standards.
Optimized production processes to increase efficiency.
Resolved production issues in timely manner.
Conducted root cause analysis of production problems.

System Administrator

Tata Consultancy Services

Irving, TX

08.2008 - 01.2015

Managed deployments of Citibank's credit card applications in on-premises data centers, ensuring 99.9% uptime and seamless customer experience.
Provisioned new software and hardware for use, following established security policies, resulting in zero security breaches.
Troubleshot and resolved system issues, minimizing downtime and maintaining high system availability, achieving 30% reduction in mean time to resolution (MTTR).
Simplified troubleshooting processes by creating detailed documentation for system configurations, procedures, and best practices.
Installed IBM WebSphere Application Server on new development server.
Monitored server performance and took preventative measures to maintain performance.
Implemented security measures to protect servers from malicious attacks.
Responded to server issues in timely and efficient manner.
Created and maintained server documentation.
Investigated and resolved network connectivity issues.
Performed regular backups of critical data.
Responded to system administrator requests for assistance.
Provided support for end users.

Education

Master of Statistics - Statistics

University of Calcutta

Kolkata, India

01.2008

Bachelor of Statistics - Statistics

University of Calcutta

Kolkata, India

01.2006

Skills

Cloud: Amazon Web Services (AWS)
Programming: Python, Bash
DevOps: Kubernetes, Helm Charts, Argo CD, Terraform, CloudFormation, Ansible
CI/CD: Jenkins, GitHub Actions, CircleCI
Monitoring: OpenTelemetry, Tempo, Prometheus, Grafana, ELK, Splunk
Security: AWS IAM, VPC, Encryption, HashiCorp Vault

SRE: Service Level Objectives (SLOs), Service Level Indicators (SLIs), Service Level Agreements (SLAs)
Networking: TCP/IP, UDP, HTTP, DNS, TLS
Containers: Docker, Podman
Organizational Development
Multitasking ability
Strategic leadership

Websites

https://www.linkedin.com/in/saadia-khanam-096462138/

Certification

AWS Certified Solutions Architect - Associate
