Naveen Garg

Cupertino, CA

Summary

Experienced Site Reliability Engineer with more than 15 years in threat intelligence, data analysis, system design, and solving complex computing performance issues. My goal is to enhance product reliability, scalability, and performance through innovative solutions that draw on diverse engineering principles. I am driven by the opportunity to make a difference in cybersecurity and data research and to improve overall system productivity.

Overview

17 years of professional experience

Work History

Senior Site Reliability Engineer

Akamai Technologies
06.2019 - Current
  • Achieved high stability and reliability for security products such as bot detection and fraud prevention by identifying and mitigating potential issues.
  • Implemented a fail-over design that significantly minimized service disruptions and enhanced overall system reliability.
  • Developed automation tools and scripts to streamline production cluster launch and verification, and implemented self-healing systems to enhance operational efficiency.
  • Implemented robust monitoring and alerting systems to detect and respond promptly to security threats, system failures, and performance issues.
  • Performed capacity planning to ensure that the infrastructure can handle the load, especially during periods of increased security threats or traffic spikes.
  • Designed and implemented resilient architectures for security products, accounting for fault tolerance and disaster recovery to minimize the impact of potential failures.
  • Maintained documentation for operational procedures, configurations, and troubleshooting guides.

Consultant

HCL America
01.2015 - 06.2019

Job Responsibilities:

  • Maintain site performance and quality control by enhancing monitoring checkpoints, trends, and data analysis.
  • Develop self-serve tools for operations automation to reduce hand-offs and improve processes. Create auto bots and digital assistants based on NLP and machine learning to correlate application issues and provide first-hand support to the operations team.
  • Oversee the code release pipeline and ensure high-quality code delivery without impacting site availability during the development life cycle.
  • Provide Tier-2 support for application issues and support cross-functional teams.
  • Provide RCA and solutions for complex recurring system and performance issues to improve overall environment health.

Achievements:

  • Developed a task delivery and goal progress tool to visualize overall progress and automate report and handover delivery.
  • Developed an environment capacity planning tool to provide a bird's-eye view for better project budget planning and to find gaps in the present infrastructure.
  • Continued development of a digital assistant and auto bot to automate day-to-day operations and troubleshooting tasks.
  • Automated patch implementation after automating the application server restart procedure, avoiding outages.
  • Prepared alert tracking tools to preserve work logs, and data analysis tooling to observe trends for site reliability enhancement.
  • Initiated a self-healing system for alert handling and escalation.
  • Participated in CI/CD pipeline development and resolved blockers along the way.
  • Participated in test automation using Selenium, Cucumber, and other tools.
  • Prepared and tracked team members' skill development plans.
  • Enforced high-quality monitoring using Nagios, Netcool, SolarWinds, web monitoring, and dashboards to reduce issue counts and keep environment availability stable.

Senior System Specialist

HCL America
12.2011 - 01.2015

Job Responsibilities:

  • Perform code and config releases in multiple testing environments with minimal outages.
  • Ensure high site availability by using site monitoring tools.
  • Provide first-hand support for application and system-related issues.
  • Continuously improve operations delivery by automating day-to-day operations.

Achievements:

  • Streamlined code release operations with consistent 99% site availability, including rolls and maintenance.
  • Transformed monitoring support to achieve a 99.99% clean monitoring system.
  • Automated several operations, maintenance, and reporting tasks to save resource time and keep operations effective, including:
    • Overall operation progress reports, including ticket resolution and site availability reports, to track progress.
    • Maintenance work, e.g., Market Open setting, quotes issue recovery, and SSH key maintenance.
  • Developed a learning portal to organize the training path for new team members and track their progress and efforts.

System Specialist

HCL Technologies
06.2007 - 12.2011

Job Responsibilities:

  • Prepare performance testing scenarios using LoadRunner after product analysis and discussion with the development team.
  • Prepare the load testing environment by rolling out code and config on performance servers.
  • Support PRD performance issues by running various load testing scenarios and providing results to the development team.
  • Support DR and QA environments.

Achievements:

  • Prepared a PRD-like performance load testing environment, achieving a 100% alert-free system, to perform PRD performance certification and PELT for PRD issues.
  • Enhanced the PRD roll certification process and reduced certification time from 2 days to 5 hours.
  • Improved PRD roll certificate delivery from the last day to 1 week in advance through collaboration with the SCM and operations teams.
  • Automated PELT scenarios to resolve critical PRD issues when the PELT count increased from 5 to 35 per day.
  • Expedited data center migration of the QA environment through automated configuration conversion.
  • Coordinated and drove DR verification activity, saving around 30% of the tech community's time.
  • Developed and deployed a PLT results site to centralize PLT results for easy access and preserve records for future reference.
  • Streamlined the project training process for team members by centralizing the knowledge base and developing a training tracker portal.

Education

Bachelor of Engineering - Computer Science

Krishna Institute of Engineering And Technology
India
06.2002

Skills

Cybersecurity: Expertise in implementing and maintaining security controls, firewalls, and intrusion detection/prevention systems

AI and Machine Learning: Experience with implementing and managing machine learning models for security analytics and anomaly detection

Scripting and Automation: Proficiency in scripting languages such as Python and Shell for automation tasks

Cloud Computing: Knowledge of cloud platforms (AWS and Linode) and implementing security controls in cloud environments

Monitoring and Logging: Proficiency in implementing and managing monitoring and logging solutions (e.g., Prometheus, ELK stack). Experience with security information and event management (SIEM) systems for real-time threat detection

Network Security: Understanding of network security principles, including firewalls, VPNs, and network segmentation. Experience with securing network communications and implementing secure network architectures

Reliability Engineering: Experience with incident response and post-incident analysis to improve system reliability. Proficiency in capacity planning, load balancing, and resource optimization
