Rejul James

Raleigh-Durham-Chapel Hill Area

Summary

Senior Site Reliability Engineer with a proven track record at cybersecurity firms, specializing in cloud infrastructure management and proactive alerting. Adept at automating processes and improving system reliability, I apply critical thinking and advanced monitoring solutions to drive operational excellence and ensure seamless service delivery.

Overview

13 years of professional experience
1 Certification

Work History

Senior Site Reliability Engineer

SentinelOne
Raleigh-Durham-Chapel Hill Area
10.2022 - Current
  • Implemented Site Reliability Engineering (SRE) best practices to enhance system reliability, availability, and scalability.
  • Managed service uptime, incident response, and disaster recovery strategies to ensure seamless operations.
  • Designed and deployed robust monitoring solutions using Prometheus, Grafana, and New Relic for real-time observability.
  • Developed and optimized proactive alerting systems to enable rapid issue detection and resolution.
  • Automated routine tasks and system maintenance using Python, reducing manual effort and improving efficiency.
  • Built and maintained CI/CD pipelines to streamline deployments, improve consistency, and accelerate software delivery.

Senior Site Reliability Engineer

Palo Alto Networks
Santa Clara County
10.2021 - 10.2022
  • Championed a philosophy of automation, continuously eliminating repetitive operational tasks through automation tools and scripting.
  • Managed global, cloud-based applications at scale on AWS and Google Cloud, ensuring optimal performance, scalability, and reliability.
  • Applied a deep understanding of networking protocols, including IP routing and TLS, to troubleshoot complex network issues, using logs and metrics to diagnose and resolve high-level transactional problems.
  • Designed and implemented log aggregation (Loki), tracing (Jaeger), and metrics aggregation (Prometheus, Cortex) to create comprehensive observability and monitoring solutions for complex microservice architectures.
  • Built and maintained proactive alerting systems to ensure rapid detection and resolution of potential issues in microservice environments.
  • Deployed and managed microservices in Kubernetes (K8s) using CI/CD pipelines, ensuring smooth, automated, and efficient application deployment and updates.
  • Participated in on-call rotations, providing expert operational support for services owned by DevOps and SRE teams, ensuring high availability and quick issue resolution.
  • Collaborated with internal stakeholders to gather feedback and requirements, driving the adoption of DevOps and SRE solutions that align with business needs and improve overall system performance.

Site Reliability Engineer

Palo Alto Networks
Santa Clara
03.2019 - 10.2021
  • Collaborated with SaaS vendors to drive the delivery and enhancement of an enterprise-class cloud security service, influencing all aspects of the service lifecycle.
  • Worked with cutting-edge cloud software and web applications, deploying and scaling next-generation cloud security solutions leveraging big data technologies.
  • Contributed to the entire software development life cycle, including designing, coding, testing, and releasing features, ensuring timely and high-quality deliverables.
  • Collaborated closely with cross-functional teams (engineering, quality assurance, and developers) to ensure seamless and accurate product delivery, maintaining high standards of performance and security.
  • Thrived in a fast-paced, high-energy environment, delivering feature-rich applications while contributing to and receiving constructive code reviews to ensure continuous improvement and quality.
  • Used Kubernetes, Python, Ansible, and Terraform to automate infrastructure and enhance the scalability and security of cloud-based solutions.

DevOps Engineer

Tata Consultancy Services
Kochi
09.2012 - 12.2014
  • Served as a Hadoop Administrator within the TCS Digital Enterprise - Analytics and Big Data team, overseeing the installation, configuration, and management of various Hadoop distributions (Cloudera, Apache, and Hortonworks) to meet product team requirements.
  • Installed and configured Cloudera on 6 nodes and Hortonworks on another 6 nodes, ensuring seamless integration with critical Hadoop ecosystem services such as Zookeeper, Hive, HBase, Oozie, and Hue, with experience scaling clusters as needed.
  • Applied strong working knowledge of the ITIL framework, following IT service management best practices to ensure service reliability and efficiency.
  • Utilized ticketing tools (CCM) to manage incidents, performing Root Cause Analysis (RCA) to resolve service interruptions quickly and effectively.
  • Installed and configured Linux and Unix operating systems, including Red Hat Linux, CentOS, Fedora, Ubuntu, and Solaris, and handled daily management tasks such as network connectivity checks, disk space monitoring, CPU/memory usage assessment, and application status verification.
  • Configured critical network services including NFS, NIS, DHCP, DNS, Samba, FTP, HTTP, TCP/IP, SSH, and firewalls for secure and efficient system operation.
  • Set up and managed Kickstart servers using TFTP for automated OS installations, and applied updates and patches through Red Hat Satellite Server to maintain system security and performance.
  • Installed and configured Nagios for proactive network bandwidth monitoring and disk health checks, ensuring the timely detection of issues.
  • Performed routine LVM tasks, including drive replacements, volume group expansions, LVM/file system extensions, and volume group migrations for hardware upgrades and reliability.
  • Managed and maintained database systems such as MySQL, MongoDB, and PostgreSQL on Red Hat/Debian servers, ensuring optimal performance and data integrity.
  • Led the design and implementation of fully automated server build management, monitoring, and deployment, utilizing DevOps tools like SaltStack and Ansible for streamlined, consistent configurations.
  • Built and maintained private cloud infrastructure using OpenNebula, RHEVM (certified), and OpenStack, delivering flexible, scalable solutions to meet business needs.
  • Performed guest OS maintenance within private cloud environments, ensuring stability and security of virtualized systems.
  • Developed backup scripts and implemented monitoring processes for both hypervisor and guest OS, ensuring data integrity and system uptime.

Education

Master of Science (MSc) - Computer Science

Illinois Institute of Technology
12.2016

Bachelor of Technology (B.Tech) - Computer Science

Model Engineering College
01.2012

10th & 12th - Computer Science

Rajagiri HSS
01.2008

Skills

  • Linux
  • Shell Scripting
  • System Administration
  • Site reliability engineering
  • Incident management
  • Monitoring solutions
  • Proactive alerting
  • CI/CD pipelines
  • Cloud infrastructure management
  • Microservices deployment
  • Network troubleshooting
  • Critical thinking

Certification

  • Red Hat Certified Engineer
  • Red Hat Certified Security Specialist
  • Red Hat Certified Administrator
  • Red Hat Certified Virtualization Expert

Timeline

Senior Site Reliability Engineer

SentinelOne
10.2022 - Current

Senior Site Reliability Engineer

Palo Alto Networks
10.2021 - 10.2022

Site Reliability Engineer

Palo Alto Networks
03.2019 - 10.2021

DevOps Engineer

Tata Consultancy Services
09.2012 - 12.2014

Master of Science (MSc) - Computer Science

Illinois Institute of Technology

Bachelor of Technology (B.Tech) - Computer Science

Model Engineering College

10th & 12th - Computer Science

Rajagiri HSS