
Cloud Automation and Network Engineer with 5 years of experience designing and managing cloud-native infrastructure, AI-driven monitoring solutions, and network operations across hybrid and multi-cloud environments. Skilled in Kubernetes, OpenShift, Docker, Ansible, Terraform, Jenkins, and ArgoCD for CI/CD automation, container orchestration, and infrastructure as code. Proven track record of optimizing system performance and improving operational efficiency. Committed to continuous improvement and staying current with industry trends.
● Provided 24x7 NOC support for critical infrastructure services spanning on-premises data centers and cloud-hosted environments (AWS, Azure, VMware), ensuring uninterrupted service delivery for enterprise clients.
● Performed continuous infrastructure health checks, monitoring the performance and availability of servers, storage systems, network switches, firewalls, and containerized applications on Red Hat OpenShift.
● Used Prometheus, Grafana, OneConsole, and CloudWatch to track system KPIs such as CPU utilization, memory usage, disk I/O, network throughput, and latency, and flagged any metric that breached its defined threshold.
● Acknowledged and validated alerts generated by the monitoring systems, filtering out false positives, prioritizing critical incidents, and assigning them to the appropriate Tier 2/Tier 3 support teams.
● Conducted initial triage on infrastructure faults, including log analysis, port checks, pod/VM restarts, configuration validation, and verifying connectivity between core components.
● Documented each incident thoroughly in ServiceNow, including impact statements, RCA summaries, escalation notes, and resolution timelines for compliance and future reference.
● Used Prometheus, Grafana, and OneConsole for real-time service health monitoring, KPI tracking, and capacity utilization analysis of telecom core services.
● Executed operational tasks including pod restarts, basic configuration checks, and L1/L2 runbook executions for fault remediation.
● Ensured stability and performance of Virtual Network Functions such as EPC, IMS, PCRF, and VoLTE core elements.
● Worked closely with RAN, transport, and core network teams to validate end-to-end service readiness for 5G trials and production rollout.
● Configured Multus CNI in OpenShift for multiple network interfaces per CNF to support control plane and user plane separation in 5G deployments.
● Tuned worker nodes for NUMA-aware CPU pinning, hugepages allocation, and interface bonding to meet 5G Core throughput requirements.
● Oversaw the health and performance of core network functions and containerized applications running on Red Hat and VMware hybrid cloud platforms.
● Diagnosed and resolved IP-based and wireless core network issues across MSC, HLR/HSS, SGSN/MME, and GGSN/PGW elements.
● Supported deployment, integration, and acceptance testing of 4G and early 5G trial sites, including eNodeB and gNodeB activations.
● Conducted Location Routing Number (LRN) migrations and BSC additions/removals, implementing all changes through controlled change request (CR) processes.
● Configured APNs, bearer profiles, CAMEL settings, SGSN/GGSN, and LTE packet core parameters for data services.
● Performed roaming configuration audits and end-to-end voice/SMS/data testing for domestic and international roaming partners.
● Served as the primary contact for high-severity incidents, performing root cause isolation through log analysis, symptom matching, and KPI correlation.
● Documented incidents in ServiceNow, ensured proper impact classification, and coordinated escalations to Tier 3 and vendor teams (Ericsson, Nokia, Cisco, Red Hat).
● Worked closely with the NOC to reduce Mean Time to Repair by streamlining triage and escalation procedures.
● Provided L2 support for VIP customers, resolving persistent roaming, call quality, and data service issues.
● Integrated 5G Core CNFs with Prometheus and Instana for full-stack observability, enabling proactive fault detection.
● Performed root-cause analysis of signaling and bearer path failures by correlating metrics, logs, and packet captures (tcpdump, Wireshark).
● Monitored KPI dashboards to identify trends in service degradation, proactively initiating corrective actions to prevent outages.
● Tuned QoS parameters and optimized neighbor lists for LTE/4G to improve network accessibility and handover success rates.
● Generated daily/weekly health reports, traffic analytics, and RCA documentation for leadership review.
● Contributed to continuous improvement initiatives by recommending alert threshold refinements and enhancing monitoring logic to reduce false positives.