Seasoned Big Data Engineer and Cloud Data Developer with over 13 years of experience building scalable, cloud-native data platforms using Snowflake, Databricks, and Apache Spark. Proven ability to deliver high-performance data pipelines, data lakehouses, and ETL/ELT solutions on both AWS and Azure. Experienced across multiple industries in turning massive datasets into actionable insights through modern big data ecosystems and enterprise-grade data platforms.
Overview
15 years of professional experience
Work History
Senior Cloud Engineer
Verizon
Allentown, PA
02.2023 - Current
Engineered real-time streaming ingestion pipelines using Google Pub/Sub and Cloud Functions to load data into BigQuery with minimal latency (see the first sketch following this role).
Built declarative ELT pipelines with Google Dataform for streamlined data transformation and schema management in BigQuery.
Developed reusable SQL data models and automated job scheduling using Cloud Composer (Airflow) with environment-based deployments (see the DAG sketch following this role).
Managed GCS-based data lake and integrated it with BigQuery using federated queries and ingestion workflows.
Implemented robust data governance practices with table partitioning, clustering, access control policies, and data retention rules.
Developed and maintained ETL jobs using Python and SQL to transform data from GCS to BigQuery.
Integrated third-party APIs, flat files, and internal app data into the analytics ecosystem using batch and streaming approaches.
Worked with data scientists and analysts to build and support Looker dashboards on top of BigQuery datasets.
Developed data pipelines using Azure Data Factory and custom Python scripts to load data into Snowflake.
Implemented security policies in Snowflake, including RBAC, object-level permissions, and data masking.
Created dynamic Power BI dashboards connected to Snowflake in DirectQuery mode.
Built custom web-based analytics tools using Python and JavaScript for real-time visualization.
Developed monitoring dashboards with Cloudera Manager and Grafana to visualize cluster health metrics.
Designed and implemented Elasticsearch clusters, both on-premises and in the cloud.
Set up Kibana dashboards for real-time data visualization.
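
A minimal sketch of the Pub/Sub-to-BigQuery ingestion pattern referenced above, assuming a first-generation background Cloud Function; the table ID and message fields are hypothetical placeholders, not production values.

import base64
import json

from google.cloud import bigquery

# Hypothetical destination table; real project/dataset/table names differ.
TABLE_ID = "my-project.analytics.events_raw"

bq_client = bigquery.Client()

def ingest_event(event, context):
    """Background Cloud Function triggered by a Pub/Sub message.

    Decodes the payload and streams it into BigQuery via the
    streaming-insert API to keep end-to-end latency low.
    """
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    row = {
        "event_id": payload.get("id"),          # assumed message fields
        "event_type": payload.get("type"),
        "occurred_at": payload.get("timestamp"),
    }
    errors = bq_client.insert_rows_json(TABLE_ID, [row])
    if errors:
        # Raising lets Pub/Sub redeliver the message for a retry.
        raise RuntimeError(f"BigQuery insert failed: {errors}")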
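
A minimal sketch of the Cloud Composer (Airflow) scheduling pattern referenced above; the DAG ID, Variable name, and SQL are hypothetical stand-ins for the reusable SQL models and environment-based deployments.

from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

# Hypothetical environment switch: each Composer environment sets its own
# value of the "env" Airflow Variable (e.g. "dev" or "prod").
ENV = Variable.get("env", default_var="dev")

with DAG(
    dag_id="daily_sql_models",
    schedule_interval="@daily",
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    build_model = BigQueryInsertJobOperator(
        task_id="build_orders_model",
        configuration={
            "query": {
                # Placeholder model query; the real SQL models were
                # reusable scripts parameterized per environment.
                "query": (
                    f"CREATE OR REPLACE TABLE `{ENV}_mart.orders` AS "
                    f"SELECT * FROM `{ENV}_staging.orders`"
                ),
                "useLegacySql": False,
            }
        },
    )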
Cloud Engineer
Athena
Allentown, PA
03.2021 - 01.2023
Designed and maintained highly available EMR clusters on AWS to support real-time and batch processing.
Automated infrastructure provisioning using Terraform and Ansible, reducing deployment time by 70%.
Led migration of on-prem Hadoop cluster to AWS cloud-based EMR environment.
Implemented centralized logging using ELK and integrated monitoring dashboards with Grafana.
Supported Spark and Kafka workloads for streaming data pipelines (see the sketch following this role).
Built CI/CD pipelines using Jenkins and GitHub Actions for data platform releases.
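
A minimal sketch of the Spark-on-Kafka streaming pattern referenced above, assuming PySpark Structured Streaming; the broker, topic, schema, and S3 paths are hypothetical placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Assumed event schema for the hypothetical "clickstream" topic.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("action", StringType()),
    StructField("ts", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "clickstream")
    .load()
    # Kafka delivers the payload as bytes; parse the JSON value.
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream.format("parquet")
    .option("path", "s3://example-bucket/clickstream/")        # placeholder sink
    .option("checkpointLocation", "s3://example-bucket/chk/")
    .start()
)
query.awaitTermination()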
Big Data Administrator
Reliance Jio
Bangalore, IND
01.2018 - 12.2020
Designed and integrated scalable, secure Azure cloud infrastructure solutions.
Directed a complex, multi-year initiative to transition legacy on-premise systems to Azure.
Designed and maintained Azure Virtual Machines, Blob Storage, Networks, and SQL Databases.
Spearheaded the implementation of Azure Security Center, Azure Active Directory (AAD), RBAC, and Azure Firewall for regulatory compliance with GDPR and HIPAA.
Automated application deployment and infrastructure provisioning with Terraform, ARM Templates, and Azure Automation.
Analyzed cloud usage with Azure Cost Management and Azure Advisor to identify cost-saving opportunities.
Implemented Azure Stack hybrid cloud solutions for seamless on-premises infrastructure and cloud integration.
Enhanced Azure SQL Database and Cosmos DB performance through implementing indexing, partitioning, and caching strategies.
Implemented strategies ensuring disaster resilience with robust, Azure-based recovery methods.
Collaborated cross-functionally to ensure cloud-based systems' adherence to security policies and compliance standards.
Automated provisioning and configuration management of Kubernetes clusters using Terraform and Helm.
Enhanced the efficiency of Kubernetes clusters through resource allocation optimization.
Enhanced security through RBAC, network policies, and HashiCorp Vault integration for centralized secrets management (see the Vault sketch following this role).
Led the deployment and management of a large-scale Cloudera Hadoop ecosystem (over 1,500 nodes) to support data ingestion, processing, and analytics across multiple departments.
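
A minimal sketch of the centralized secrets pattern referenced above, using the hvac client for HashiCorp Vault; the Vault address and secret path are hypothetical, and production workloads would typically authenticate via Kubernetes auth rather than a raw token.

import os

import hvac

client = hvac.Client(
    url=os.environ.get("VAULT_ADDR", "https://vault.example.com:8200"),
    token=os.environ["VAULT_TOKEN"],  # placeholder auth for illustration
)

# Read a secret from the KV v2 engine (mounted at the default "secret/").
resp = client.secrets.kv.v2.read_secret_version(path="data-platform/db")
db_password = resp["data"]["data"]["password"]  # KV v2 nests data twice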
Big Data Administrator
Etisalat Telecommunication
India
11.2016 - 06.2017
Worked in multi-cluster environments of 50 to 100+ nodes, on data volumes ranging from gigabytes to petabytes.
Kerberized the CDH cluster before installing services such as Ranger, YARN, HDFS, and Impala.
Developed scripts and batch jobs to schedule various Hadoop programs.
Installed CDH 7.1.7 and migrated data and metadata from HDP to CDH.
Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions for faster data analysis.
Wrote Hive queries for data analysis to meet business requirements.
Prepared automated scripts to collect metrics from YARN, GC, NameNodes, DataNodes, HDFS, and related services (see the sketch following this role).
Provided on-call support for daily jobs across clusters through ticketing systems.
Built data pipelines turning large volumes of data into business value, with strong working knowledge of distributed processing frameworks.
Created automated job flows using crontab and MySQL.
Installed an independent Kafka cluster with ZooKeeper, Yahoo Kafka Manager, Zoonavigator, LinkedIn Kafka Monitor, Prometheus, and Grafana.
Monitored and performance-tuned Hadoop clusters.
Performed distributed copies (DistCp) between clusters.
Created snapshot policies and recovered from node failures.
Installed Apache Atlas (2.0.0), Apache Tomcat (10.0.16), Apache Impala (3.0.0), Apache Solr (8.11.1), Apache Ranger (2.1.0), Apache Storm (2.0.0), Anaconda (2020.11), and Python (3.6) packages; additionally installed NoSQL databases such as MongoDB, Cassandra, and PostgreSQL end to end on bare-metal servers as well as in the cloud.
Commissioned and decommissioned nodes and ran the HDFS balancer utility.
Managed and reviewed Hadoop log files.
Set up Kerberos and troubleshot TGT-related issues.
Set up SSL/TLS and renewed certificates.
Installed and configured Hadoop security using Apache Ranger.
Resolved incidents raised through Zendesk and by users within SLA.
Prepared monthly incident reports and identified recurring issues.
Automated time-consuming manual processes.
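
A minimal sketch of the metrics-collection scripts referenced above, polling the YARN ResourceManager REST API; the ResourceManager host is a hypothetical placeholder.

import json
import time

import requests

RM_METRICS_URL = "http://resourcemanager.example.com:8088/ws/v1/cluster/metrics"

def collect_yarn_metrics():
    """Poll the YARN ResourceManager REST API for cluster health metrics."""
    resp = requests.get(RM_METRICS_URL, timeout=10)
    resp.raise_for_status()
    metrics = resp.json()["clusterMetrics"]
    # Emit a compact record suitable for shipping to a time-series store.
    return {
        "ts": int(time.time()),
        "apps_running": metrics["appsRunning"],
        "memory_available_mb": metrics["availableMB"],
        "active_nodes": metrics["activeNodes"],
        "unhealthy_nodes": metrics["unhealthyNodes"],
    }

if __name__ == "__main__":
    print(json.dumps(collect_yarn_metrics()))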