TARUN KUMAR GRANDHI

Summary

Experienced Data Engineer with 5+ years of experience across AWS and Azure cloud services, using Big Data technologies such as Databricks/Spark and the Hadoop ecosystem. Proficient in Unified Data Analytics with Databricks, including the Databricks Workspace user interface and management of Databricks notebooks. Skilled in working with data lakes using Python and Spark SQL, with a strong understanding of Spark architecture on Databricks and Structured Streaming. Experienced in setting up AWS with Databricks for business analytics and managing Databricks clusters. Hands-on expertise in data extraction, transformation, and loading, and in optimizing and automating ETL processes. Proficient in creating and loading Hive tables with appropriate partitions for efficiency, and familiar with Hadoop file formats such as Delta, Parquet, ORC, and Avro. Proven ability to optimize Hive SQL queries and Spark jobs. Knowledgeable in business process analysis and design, re-engineering, cost control, capacity planning, performance measurement, and quality. Skilled in writing technical documents for functional requirements, impact analysis, and technical design. Experienced in delivering highly complex projects using Agile and Scrum methodologies, with a proven record of designing and implementing complex data pipelines, optimizing data architecture, and improving data processing efficiency using Python and SQL.
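
As a brief illustration of the partitioned Hive table loading mentioned above, a minimal Spark SQL sketch; the table, columns, and staging source are hypothetical placeholders:

```python
from pyspark.sql import SparkSession

# Minimal sketch: load data into a Hive table partitioned by date so that
# queries filtering on the partition column prune unneeded directories.
# Table and column names here are illustrative only.
spark = (SparkSession.builder
         .appName("partitioned-load")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("""
    CREATE TABLE IF NOT EXISTS claims (
        claim_id STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
""")

# Dynamic partition insert: each distinct event_date lands in its own
# partition directory. 'staging_claims' is a hypothetical staging table.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE claims PARTITION (event_date)
    SELECT claim_id, amount, event_date FROM staging_claims
""")
```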

Overview

  • 6 years of professional experience
  • 3 certifications

Work History

Senior Data Engineer

Elevance Health
07.2023 - Current
  • Design and maintain scalable, end-to-end data pipelines on Microsoft Azure for the healthcare insurance domain, storing raw and processed data in Azure Data Lake Storage (ADLS) and ingesting real-time data with Kafka so it is immediately available for downstream analytics (see the sketch after this list)
  • Integrate Azure Data Factory to manage batch processing that complements real-time ingestion, scheduling ETL jobs that transform and load large volumes of historical and transactional data into the analytics layer
  • Leverage Azure Databricks for advanced data transformations and Delta Live Tables to automate continuous ingestion and processing, ensuring efficient cluster management and high-performance Spark jobs tailored to healthcare data requirements
  • Orchestrate the entire pipeline with Apache Airflow, connecting real-time streams, batch jobs, and transformation steps, while provisioning infrastructure with Terraform and managing containerized microservices on Kubernetes for consistency and scalability across the environment
  • Consolidate transformed data into Azure Synapse Analytics, providing a robust data warehouse that powers interactive Power BI visualizations for risk assessment, claims management, and regulatory compliance
  • Enforce rigorous data governance, quality, and integrity protocols by pairing Grafana with native Azure monitoring services for continuous oversight and proactive alerts; also support data science initiatives in model training and predictive analytics (roughly 15% of the role) to improve decision-making and operational efficiency
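
A minimal sketch of the Kafka-to-ADLS streaming ingestion described above, using PySpark Structured Streaming to write a Delta table; the broker address, topic, event schema, and storage paths are hypothetical, and the Kafka and Delta Lake connector packages are assumed to be available on the cluster:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("claims-stream").getOrCreate()

# Hypothetical event schema for incoming claim messages.
schema = (StructType()
          .add("claim_id", StringType())
          .add("member_id", StringType())
          .add("amount", DoubleType()))

# Read a continuous stream from Kafka (broker and topic are placeholders).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "claims-events")
       .load())

# Kafka values arrive as bytes; cast to string and parse the JSON payload.
parsed = (raw
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Continuously append to a Delta table on ADLS for downstream analytics;
# the checkpoint makes the stream restartable with exactly-once sinks.
(parsed.writeStream
 .format("delta")
 .option("checkpointLocation",
         "abfss://lake@acct.dfs.core.windows.net/_chk/claims")
 .start("abfss://lake@acct.dfs.core.windows.net/bronze/claims"))
```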

Senior Data Engineer

Payoneer
08.2022 - 06.2023
  • Architected end-to-end data pipelines on AWS using a lakehouse approach: Amazon S3 for the data lake, Snowflake for data warehousing, and AWS EMR (Spark, Hive, and Presto) for transforming raw financial data into actionable insights
  • Used Apache Airflow extensively to orchestrate complex ETL workflows: scheduling ingestion from sources such as Amazon Kinesis Data Streams and AWS Database Migration Service, coordinating transformations on EMR, and managing the load into Snowflake for high-performance analytics (see the sketch after this list)
  • Implemented real-time ingestion and processing by integrating Kinesis for streaming financial transactions and Lambda for serverless data manipulation, ensuring low-latency processing for immediate analytics and risk monitoring
  • Enforced robust data governance, security, and quality using AWS Glue for automated data cataloging, IAM and KMS for secure access and encryption, and CloudTrail with CloudWatch for comprehensive auditing and monitoring across the pipeline
  • Drove infrastructure automation and scalability with AWS CloudFormation and Terraform, keeping data pipelines reproducible, scalable, and aligned with compliance standards critical to the fintech industry
  • Collaborated closely with cross-functional teams, including data scientists using Amazon SageMaker for predictive analytics and fraud detection, to integrate machine learning workflows into the lakehouse architecture, while monitoring performance and cost with CloudWatch dashboards and refining processes based on real-time insights
  • Applied loss functions and variance-explained analysis to compare model performance metrics
  • Collaborated with cross-functional teams to define requirements and deliver end-to-end solutions for complex data engineering projects
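
A minimal sketch of how such an Airflow orchestration might look, assuming Airflow 2.x with the Amazon and Snowflake provider packages installed; the DAG id, EMR cluster id, job script path, connection id, and COPY statement are hypothetical placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

# One Spark step submitted to a running EMR cluster (cluster id is a placeholder).
SPARK_STEP = [{
    "Name": "transform-transactions",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": ["spark-submit", "s3://bucket/jobs/transform.py"],
    },
}]

with DAG("emr_to_snowflake", start_date=datetime(2023, 1, 1),
         schedule="@daily", catchup=False) as dag:
    # Submit the transformation step to EMR.
    add_step = EmrAddStepsOperator(
        task_id="run_spark_transform",
        job_flow_id="j-XXXXXXXX",
        steps=SPARK_STEP,
    )
    # Block until the step finishes; the step id comes back via XCom.
    wait = EmrStepSensor(
        task_id="wait_for_transform",
        job_flow_id="j-XXXXXXXX",
        step_id="{{ task_instance.xcom_pull('run_spark_transform')[0] }}",
    )
    # Load the transformed output into Snowflake from an external stage.
    load = SnowflakeOperator(
        task_id="load_to_snowflake",
        snowflake_conn_id="snowflake_default",
        sql="COPY INTO analytics.transactions FROM @s3_stage/transactions/",
    )
    add_step >> wait >> load
```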

Data Engineer

Ekincare
10.2019 - 07.2022
  • Evaluated and documented the on-premises data architecture, spanning legacy Oracle databases, SQL Server environments, Hadoop clusters, and ETL tools such as Informatica, to design a comprehensive migration strategy to AWS
  • Developed and executed end-to-end migration plans, architecting pipelines that extracted data from on-prem systems and loaded it into Amazon S3 for storage, Snowflake for data warehousing, and AWS Glue for transformation workflows
  • Built and optimized transformation and data cleansing processes to prepare healthcare data for analytical processing and reporting, maintaining quality and integrity throughout the migration
  • Designed robust CI/CD pipelines in Jenkins to automate testing, deployment, and integration of the migrated data processes, minimizing downtime and streamlining the transition
  • Leveraged AWS compute services such as EC2 for scalable processing and Lambda for serverless operations, reducing operational overhead and increasing system responsiveness
  • Implemented real-time monitoring and alerting with AWS CloudWatch and CloudTrail, ensuring continuous visibility into system performance, data quality, and security events
  • Created interactive Tableau dashboards to deliver actionable insights to business stakeholders, enabling data-driven decision-making within the healthcare organization
  • Ensured secure connectivity during migration by configuring AWS Direct Connect and VPN tunnels, establishing a high-speed, secure link between the on-prem environment and AWS
  • Managed continuous data replication with AWS Database Migration Service (DMS) to minimize downtime and guarantee data consistency during the transition (see the sketch after this list)
  • Conducted comprehensive data profiling and cleansing on legacy data sets before migration, ensuring data quality standards were met post-migration
  • Containerized microservices with Docker and orchestrated them on Amazon EKS to support scalable, modular data processing components in the new cloud environment
  • Optimized operational costs by auto-scaling EC2 instances, using reserved instances where applicable, and monitoring expenditures with AWS Cost Explorer
  • Enforced data governance and security best practices by integrating role-based access controls via AWS IAM with legacy on-prem security protocols, safeguarding sensitive healthcare data
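
A minimal sketch of driving a DMS continuous-replication task from Python with boto3, in the spirit of the migration bullets above; the task ARN and region are hypothetical placeholders:

```python
import boto3

dms = boto3.client("dms", region_name="us-east-1")

TASK_ARN = "arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"  # placeholder

# 'start-replication' performs a full load, then keeps the target in sync
# via ongoing change data capture (CDC), which is what minimizes cutover
# downtime during a migration.
dms.start_replication_task(
    ReplicationTaskArn=TASK_ARN,
    StartReplicationTaskType="start-replication",
)

# Poll the task status (e.g., 'running', 'stopped') for monitoring/alerting.
status = dms.describe_replication_tasks(
    Filters=[{"Name": "replication-task-arn", "Values": [TASK_ARN]}]
)["ReplicationTasks"][0]["Status"]
print(f"DMS task status: {status}")
```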

Data Engineer

Flipkart
01.2019 - 10.2019
  • Ingested and acquired data from RDBMS sources into Hive using Sqoop
  • Performed data transformation and processing (aggregation, filtering, grouping) using Hive for batch data and Apache Phoenix on HBase for real-time data access, aggregation, and reporting
  • Optimized Hive query execution and data storage formats for efficient processing
  • Designed and maintained robust ingestion pipelines with Apache Flume and Kafka to stream diverse e-commerce data (web logs, clickstreams, transactional records) into HDFS, ensuring efficient and reliable capture from multiple sources
  • Developed and optimized batch workflows with both traditional MapReduce and Apache Spark on YARN, enabling fast transformation of terabytes of customer and sales data (see the sketch after this list)
  • Implemented end-to-end ETL with Apache Pig scripts and Spark jobs, orchestrated by Apache Oozie, transforming raw data into structured formats for downstream analytics, reporting, and business intelligence
  • Enforced rigorous data governance and quality standards by integrating Apache Griffin for automated data quality monitoring, and Kerberos with Apache Ranger for secure access and data integrity across the Hadoop ecosystem
  • Collaborated closely with data scientists and business analysts, using Apache Zeppelin for interactive analysis and feeding processed data to visualization tools such as Tableau
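
A minimal sketch of the kind of Spark-on-YARN batch rollup described above; the HDFS paths and column names are illustrative only:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, sum as _sum

# Typically submitted with: spark-submit --master yarn daily_sales_rollup.py
spark = (SparkSession.builder
         .appName("daily-sales-rollup")
         .getOrCreate())

# Read raw order records from HDFS (placeholder path).
orders = spark.read.parquet("hdfs:///data/raw/orders")

# Aggregate order counts and revenue per day and category.
daily = (orders
         .groupBy("order_date", "category")
         .agg(count("*").alias("orders"),
              _sum(col("amount")).alias("revenue")))

# Partition the curated output by date so Hive/reporting queries can prune.
(daily.write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet("hdfs:///data/curated/daily_sales"))
```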

Education

Master of Science - Big Data Analytics and Information Technology

University of Central Missouri
Warrensburg, MO
05.2024

Skills

  • ETL development
  • Data integration
  • Continuous integration
  • Data migration
  • Advanced analytics
  • Data pipeline design
  • API development
  • Performance tuning
  • Data modeling
  • Data governance
  • Machine learning
  • Data security
  • Real-time analytics

Certifications

  • Microsoft Certified: Azure Data Fundamentals (DP-900)
  • Microsoft Certified: Azure Data Engineer Associate (DP-203)
  • AWS Certified Developer - Associate (DVA-C02)

Languages

  • English: Full Professional
  • Hindi: Limited Working
  • Telugu: Native or Bilingual
  • Tamil: Limited Working
