Sr. Cloud DevOps Engineer with 11+ years of experience in cloud (Azure, AWS), DevOps, configuration management, infrastructure automation, and continuous integration and delivery (CI/CD). Experienced in Unix/Linux and Windows server administration.
Overview
11
years of professional experience
Work History
Sr Cloud Engineer
Penske Truck Leasing
Philadelphia, Pennsylvania
07.2023 - Current
Worked on Microsoft Azure Cloud (Public) to give IaaS support to customers
Created virtual machines using PowerShell scripts and the Azure Portal
Created and managed storage accounts and affinity groups in the Azure Portal
Designed and configured Azure Virtual Networks (VNets), subnets, Azure network settings, DHCP address blocks, DNS settings, and Security policies & configured BGP routes to enable ExpressRoute and site-to-site VPN connections between on-premises data centers & Azure cloud
Led implementation of Azure Active Directory for single sign-on and Authentication for Web Applications
Also configured Azure Role-based Access Control (RBAC) to segregate duties within our team and grant only the amount of access to users that they need to perform their jobs based on the Roles defined
Created and deployed VMs on the Microsoft cloud service Azure, created and managed the virtual networks to connect all the servers, and designed ARM templates/Terraform for the Azure platform
Configured all three blob types (block blobs, page blobs, and append blobs) in Azure to store large amounts of unstructured object data, such as text or binary data, accessible via HTTP or HTTPS, and enabled data redundancy, lifecycle rules, and events
Managed the private cloud environment using Ansible and enhanced automation to make configuration management repeatable and consistent using Ansible YAML playbooks
Set up custom domains and configured network security group (NSG) rules to specify ingress/egress traffic restrictions
Used Terraform to create services such as AKS, ACR, VNets, and VMs as infrastructure as code in various environments as required by the project
Created inventory in Ansible for automating CD & developed Ansible playbooks and Roles using YAML scripting
Used the ELK stack (Elasticsearch, Logstash, and Kibana) to monitor logs for detailed analysis, built dashboards, and set up real-time logging and analytics for CD pipelines and applications
Created Docker images for microservices and deployed them on Azure Kubernetes Service
Worked on Kubernetes cluster creation and creation of Deployments, services, RBAC, and Ingress
Used a Git branching strategy that included development, feature, staging, and master branches
Performed pull requests and code reviews
Implemented Azure Automation through runbook creation, migration of existing .ps1 scripts, authorizing, configuring, and scheduling
Configured ServiceNow to receive instant notifications of any configuration changes in the cloud environment by orchestrating through Logic Apps
Migrated on-premises data (Oracle/SQL Server/MongoDB) to Azure SQL/Cosmos DB using Azure Data Factory
Experience in Azure infrastructure management (Azure Web Roles, Worker Roles, SQL Azure, Azure Storage, Azure AD Licenses) using Terraform and managed Azure Infrastructure through Blueprints and Landing Zone
Experienced in utilizing Azure Stack (Compute, Web & Mobile, Blobs, ADF, Resource Groups, Azure Data Lake, Azure Data Factory, Azure SQL, App Services, and Cosmos DB) and services for configuring and deploying Azure Automation Scripts for multiple applications
Deployed multiple microservices into Azure Kubernetes by Dockerizing them and using Jenkins and Azure DevOps
Migrated Build Forge projects to Azure DevOps, including all work items, source code, and build and release pipelines, using a custom PowerShell tool
Developed build workflows using Gradle, GitLab CI, Docker, and OpenShift
Integrated Docker container orchestration framework using Kubernetes by creating pods, and deployments
Extensively used Azure PaaS solutions and hosted Isolated App service environment integrated with PaaS Azure SQL and Virtual network to host different types of applications like Web apps, API App, Function Apps, etc
Created CI/CD pipelines for the deployment of services and tools to the Kubernetes cluster hosted on bare metal
Deployment of CNF on Kubernetes clusters using Helm charts and the TCA tool
Monitored Kubernetes cluster jobs and performance
Upgraded Kubernetes clusters and commissioned and decommissioned nodes and pods
Configured Azure infrastructure automation using Terraform scripts and launched services such as AKS, ACR, VMs, and VNets
Designed Network Security Groups (NSGs) & Load Balancers to control inbound and outbound access to network interfaces (NICs), VMs, and subnets
Created infrastructure for various environments using Terraform
Implemented HA and reliable deployment models with Azure Classic and Azure Resource Manager
Configured Azure Active Directory and IAM to manage user and group privileges, user accounts (SSO/SAML), and multifactor authentication
Deployed and managed firewalls in Azure, including Palo Alto and Azure Firewall
Implemented automated build and operational tasks using Python and PowerShell scripts
Implemented serverless cloud services using Azure Functions with Application Insights
Implemented a CI/CD pipeline using Azure DevOps (VSTS/TFS) in both cloud and on-premises environments with Git, MSBuild, Docker, and Maven, along with Jenkins pipeline builds and YAML/JSON
Developed and maintained Continuous Integration (CI) using tools in Azure DevOps (VSTS) spanning multiple environments, enabling teams to safely deploy code in Azure Kubernetes Services (AKS) using YAML scripts and HELM charts
Worked on creating, configuring, and managing AKS clusters in Azure, including managing node pools, configuring networking, and setting up load balancing
Managing the Azure Kubernetes Services (AKS) policies, providing access to different Azure resources, and developing and improving the workflows that govern access
Monitoring and troubleshooting Kubernetes clusters using Prometheus and Grafana
Worked with Grafana to monitor and visualize system metrics, application performance, and other data sources
Experienced with developing Ansible Playbooks, Modules and Kubernetes Nodes, Pods, Config Maps, Selectors, Services etc.
Migrated the existing Linux environment to an AWS/RHEL Linux environment using Auto Scaling, and handled remediation and patching of Unix/Linux servers
Configured and managed cloud infrastructure in AWS, including EC2, Route 53, S3, RDS, Lambda, EFS, S3 Glacier, Resource Access Manager, IAM, CloudFront, CloudWatch, Elastic Load Balancer, and security groups, focusing on auto scaling and high availability
Used Git as SCM for branching, tagging, and maintaining versions across environments, as well as for recovering files, saving changes for later (stash), creating tags, and viewing logs
Created and configured GitHub Actions workflows to automate application builds for Java, .NET, and Progress applications, enabling seamless continuous integration across multiple technology stacks
Built scripts using Maven in Jenkins and SonarQube for continuous delivery through Deployment from one environment to another environment
Deployed applications into PROD and pre-PROD environments on web application servers such as JBoss and Apache Tomcat
Developed and maintained continuous integration and deployment systems using Ant, Maven, Gradle, JUnit, Selenium, SonarQube, JFrog, and Nexus
REST API and serverless development using Node.js on AWS Lambda, SQS, SNS, SES, and API Gateway
Leveraged Boto3, the Python SDK, to build scalable and secure applications on AWS
Used Boto3 for programmatic access to AWS services like EC2, S3, and DynamoDB from Python code
Built and deployed containers using multi-stage Dockerfiles, Docker Compose, and Docker Swarm stacks
Implemented a production-ready, load-balanced, highly available, and fault-tolerant Kubernetes infrastructure
Worked on Kubernetes to manage containerized applications using its nodes, ConfigMaps, and selector services and deployed application containers as Pods
Managed Clusters using Kubernetes and worked on creating many pods, replication controllers, services, deployments, labels, and health checks
Designed, built, secured, and managed clusters and workloads running on self-managed Kubernetes (kOps), Amazon EKS (Elastic Kubernetes Service), Amazon ECS, and AWS Fargate
Implemented deployments to AWS EC2 instances using Terraform, and managed and maintained plugins added to support new Terraform functionality
Used Terraform from terminal sessions to manage infrastructure and ran scripts to create CloudWatch alarms and notifications for EC2 instances
Migrated Containers running at On-Prem OpenShift cluster to EKS Cluster by writing HELM Charts and integrating with Jenkins
Proficient in utilizing the Terratest framework to automate testing and validation for Terraform infrastructure deployments, ensuring correctness and stability, thereby enhancing reliability and efficiency
Experienced in effectively managing Terraform state files to maintain infrastructure-as-code consistency, ensuring proper tracking, version control, and collaboration across teams, contributing to streamlined and efficient deployment workflows
Used Ansible and Ansible Tower as Configuration management tools, to automate repetitive tasks, quickly deploy critical applications, and proactively manage change
Configured and managed S3 versioning and lifecycle policies to back up files and used S3 Glacier to archive data
Deployed OpenShift Infrastructure Through Terraform and configured the Master Nodes, Infrastructure Nodes, and application nodes
Implemented centralized container logging and monitoring using CloudWatch, Prometheus, and Grafana
Design, build, and manage the ELK (Elasticsearch, Logstash, and Kibana) cluster for centralized logging and search functionalities for the App
Implemented a content delivery network (CDN) using Amazon CloudFront to improve image caching and deliver data with lower latency and higher performance
Set up Jenkins credentials with AWS Secrets Manager for storing application configuration data and all app settings changes
Wrote functions in Python for AWS Lambda, Kinesis, and Elasticsearch that invoke Python and Bash shell scripts to perform transformations and analytics on large data sets in EMR clusters
Designed and implemented object storage solutions using AWS S3, CloudFront, Akamai, and Cloudflare
Used Jira for defect and issue logging and tracking, and documented all work in Confluence.
Azure DevOps Engineer
HD Supply
Atlanta, USA
06.2021 - 12.2022
Designed, planned, and implemented the migration of existing on-premises applications to the cloud
Served as primary SME on Azure services, including SaaS, PaaS, and IaaS, while contributing to architecture decisions and tasks for ongoing migration efforts
Automated rollback procedures in UrbanCode Deploy pipelines, minimizing downtime and reducing the impact of deployment failures
Logged in remotely to virtual machines to troubleshoot, monitor, and deploy applications
Provisioned Azure resources like SQL Database, Web App, Storage Account, Redis Cache, Virtual Machine, IoT Hub, and HDInsight using Azure Resource Manager (ARM)
Designed and created Terraform templates to provision custom-sized resource groups, Kubernetes clusters, containers, blob storage, IoT Hub, and Event Hubs
Deployed web application templates as infrastructure as code
Implemented self-service environments in UrbanCode Deploy, empowering development teams to provision environments on demand
Implemented centralized logging solutions (e.g., ELK stack) to collect and analyze logs from distributed robotics systems, aiding in troubleshooting and performance optimization
Maintained GitLab Runner instances in AWS that run the Jenkins CI pipelines for developers, and assisted developers in deploying their applications into the Kubernetes cluster
Used and supported GitLab Runners and Jenkins for continuous integration (CI)
Configured and administered a Spark GitHub standalone cluster
Automated infrastructure provisioning and configuration management using tools such as Ansible and PowerShell DSC, ensuring consistency and reliability across C# application deployments
Conducted performance tuning and optimization of Oracle Database instances on Oracle Cloud Infrastructure, enhancing application responsiveness and scalability
Managed Azure infrastructure (Azure Web Roles, Worker Roles, VM Role, Azure SQL, Azure Storage, Azure AD licenses) and performed virtual machine backup and recovery from a Recovery Services vault using Azure PowerShell and the Azure Portal
Wrote Terraform templates for Azure infrastructure as code to build staging and production environments
Integrated App Services with Application Insights to monitor web app activity logs and automated deployments using YAML scripts for large-scale build and release management
Worked with Azure services beyond basic IaaS functionality and used Azure Resource Manager (ARM) to deploy, update, or delete all resources for a solution in a single, coordinated operation
Developed custom monitoring solutions using Prometheus and Grafana to monitor C# application performance, resource usage, and health metrics
Orchestrated CI/CD pipelines leveraging Autosys for scheduling and executing automated builds, tests, and deployments
Orchestrated the provisioning of infrastructure and environments using UrbanCode Deploy environment management capabilities
Monitored Spark jobs through the UI on AKS and tuned the Spark environment and code to improve job performance
Configured Azure Backup to back up Azure VMs and on-premises data to Azure, and used Azure Automation, PowerShell, and Ansible to automate processes in the Azure cloud
Created Clusters using Kubernetes and worked on creating many pods, replication controllers, services, deployments, labels, health checks and ingress by writing YAML files
Integrated existing APIs to Azure API management to get all the attributes like security, usage plans, throttling, analytics, monitoring, and alerts
Conducted regular security assessments and audits of robotics systems, identifying and addressing potential vulnerabilities to safeguard sensitive data and ensure operational integrity
Used Terraform to reliably version and create infrastructure on Azure
Created resources, using Azure Terraform modules, and automated infrastructure management
Delegated the production deployment process to the IPS team and monitored production rollouts and validation
Advised clients on build and release management implementation
Implemented SRE practices for alerting on critical application issues and site latency issues
As an SRE, managed disaster recovery and tested disaster recovery plans before rollouts
Utilized SCM/Build tools to aid developers in resolving issues such as merge conflicts, compilation errors, and missing dependencies
Participated in the migration and automation processes of DevOps for building and deploying systems
Deployed Dynatrace and CloudWatch to monitor the AWS resources comprehensively, including EC2 instances, S3 buckets, RDS databases, and network configurations
Developed custom monitoring dashboards in Dynatrace and CloudWatch to gain real-time insights into system performance, resource utilization, and application health
Configured automated alerting rules within Dynatrace and CloudWatch based on predefined thresholds and anomaly detection algorithms
Established escalation policies and on-call rotations to ensure timely response to critical incidents
Integrated alerting with communication platforms such as Slack and AWS SNS for efficient notifications and team collaboration
Used Splunk for centralized log management and analysis to understand system behavior, application errors, and security events
Developed custom Splunk dashboards and reports for visualizing log data and identifying trends
Integrated Cribl to enrich, filter, and route logs before indexing in Splunk, optimizing data volume and search performance
Developed Python and Bash scripts and automation workflows to streamline routine operational tasks such as instance provisioning, configuration management, and deployment
Used GitLab for version control and CI/CD pipelines to automate infrastructure changes and application updates
Designed and implemented a disaster recovery strategy to ensure business continuity in case of infrastructure failures or data loss
Used AWS Backup for data replication and AWS Disaster Recovery services for failover orchestration between primary and secondary environments
Conducted regular DR drills and performance testing to validate the recovery plan and identify areas for improvement
Automated tasks using Ansible, Python, Perl, or shell scripting with a focus on precision, standardization, and adherence to processes and policies
Proficient in writing Terraform and CloudFormation templates (CFT) in YAML and JSON formats to construct AWS resources, emphasizing Infrastructure as Code principles and implementing cost-saving measures
Proficient in managing Docker images, including retrieving existing images from Docker Hub, building custom images using Dockerfiles, and pushing images to remote repositories
Collaborated within an agile development team to deliver a comprehensive continuous integration/continuous delivery (CI/CD) solution in an open-source environment, leveraging tools like Puppet and Jenkins
Developed Puppet manifests and modules to automate deployment processes and integrated them into Jenkins jobs for continuous delivery (CD)
Integrated Git source control with build automation tools such as Jenkins and GitLab CI/CD to enable continuous integration and delivery of software
Deployed EC2 instances on AWS, including various flavors such as Oracle Linux, RHEL, CentOS, Ubuntu, and Solaris on both Linux and Windows platforms
Applied Google's SRE (site reliability engineering) practices to maintain dependable infrastructure while adhering to essential elements such as SLIs, SLOs, and SLAs
Conducted post-mortems with teams following each rollback or deployment failure
Established and automated the CI/CD process, managing the Build and Deployment Platform and coordinating code promotions and deployments using Jenkins
Experienced in working with .Net applications, performing branching, tagging, and release activities on Version Control Tools like GIT and Subversion (SVN)
Responsible for conducting security scans, tracking defects, reporting defects, and reproducing defects using SonarQube, Bugzilla, and JMeter
Built new environments in AWS, including setup of VPCs, subnets, and security groups, using CloudFormation templates, Python/Boto3 scripts, and SaltStack states
Created S3 buckets and managed bucket policies for storage, backup, and archiving in AWS; worked on AWS Lambda to run code in response to events; and implemented API Gateways and authentication
Developed comprehensive architecture strategies for environment mapping in AWS that involved Active Directory, LDAP, and AWS Identity and Access Management (IAM) roles for the AWS API Gateway platform
Collaborated with cross-functional teams to define infrastructure requirements and create reusable Puppet code
Installed, configured, and administered the Jenkins tool on Linux machines and set up a Master-slave architecture to improve performance
Used Jenkins for continuous integration and continuous deployment into the Tomcat Application Server
Created a branching and tagging strategy to maintain source code in the Git repository and coordinated with developers on establishing and applying appropriate branching and labeling/naming conventions
Integrated Jenkins with version control systems like Git for automatic builds and testing on code commits
Utilized Jenkins plugins to automate various stages of the software development lifecycle
Utilized Ant for build automation and dependency management, streamlining the build process for Java projects
Implemented Nagios for infrastructure and application monitoring, ensuring real-time alerts for performance issues
Implemented the Chef software setup and configuration on VMs, deployed run-lists into the Chef-server, and bootstrapped Chef clients remotely
Tested Chef cookbook modifications on cloud instances in AWS using Test Kitchen and ChefSpec
Managed multiple cookbooks in Chef and used environments, roles, and templates for better environment management
Collaborated with operations teams to proactively identify and resolve incidents
Developed Python scripts for automation tasks, such as log analysis and data processing
Created Shell scripts for automation and configuration management tasks in Linux environments
Developed shell scripts for log rotation, backups, and system monitoring
Utilized Perl for data processing and automation tasks
Supported .NET applications in development and production environments.
Linux Administrator
NTT Data
India
09.2013 - 05.2016
Installed and managed Red Hat Linux, Solaris, and Windows Server systems
Provided Level 2 support, managing tickets, responding to alerts, and troubleshooting issues
Monitored system performance metrics like CPU, memory, and disk space
Configured core services such as DNS, NFS, Samba, LDAP, TCP/IP, FTP, and HTTP
Utilized LVM for volume management and RAID and implemented system administration scripts
Managed user accounts, groups, and permissions, and implemented LDAP and Active Directory authentication
Managed and administered multiple Linux distributions, including Ubuntu, CentOS, and Red Hat Enterprise Linux
Installed and configured Apache Tomcat and WebSphere application servers, including load-balanced clusters
Collaborated with DBAs for Oracle database installation, restoration, and log management
Automated tasks using shell scripts, cron jobs, and system backups with Veritas NetBackup
Assisted with Solaris Jumpstart and RHEL Kickstart OS deployments
Troubleshot backup, restore, and end-user issues on Solaris and Linux servers
Developed and maintained scripts for various services using UNIX shell and Perl
Worked with DBAs on Oracle and RDBMS installations, security patching of Linux servers, and Splunk administration.