HARINI REDDY

Farmington Hills, MI

Summary

Data Engineer with 7+ years of experience designing, building, and optimizing scalable data pipelines across AWS, Google Cloud Platform (GCP), and Hadoop ecosystems. Proficient in orchestrating cloud-native ETL workflows using Airflow, AWS Glue, Lambda, Step Functions, Cloud Composer, Dataflow, and Dataproc. Hands-on expertise in BigQuery, Cloud Storage, Amazon S3, Snowflake, and modern data lake architectures. Strong coding skills in Python and SQL, with deep knowledge of Apache Spark, Hive, and HDFS for distributed data processing and analytics. Experienced in production support, incident management, and performance tuning to ensure data reliability at scale. Actively expanding capabilities in Generative AI (GenAI), with hands-on experience integrating LLM-powered automation into cloud-based data workflows to enable intelligent analytics and next-gen data solutions.

Overview

9 years of professional experience

2 certifications

Work History

Data Engineer

JPMorgan Chase
01.2024 - Current
  • Designed and automated scalable data pipelines in AWS using Glue, Lambda, and Step Functions to extract data from S3, transform with PySpark, and load into Redshift and Snowflake, improving data processing speed by 40%; a minimal sketch of this pattern follows the environment list below.
  • Led the migration of critical ETL workflows from on-prem Hadoop to AWS (S3, EMR, Redshift, Snowflake), enabling cloud-native architecture and reducing infrastructure cost by 35%.
  • Built and maintained end-to-end ETL pipelines using Hadoop ecosystem tools like HDFS, Hive, Spark, and Sqoop to ingest and transform large-scale enterprise data.
  • Developed and managed workflow orchestration using Apache Airflow and Control-M to ensure smooth scheduling, retries, and alerting across complex multi-source data pipelines.
  • Built and optimized SQL queries and ETL operations for large-scale enterprise data, enhancing overall pipeline efficiency and reliability.
  • Monitored and maintained Hadoop clusters with Cloudera Manager (including Name Node HA and tuning). Provided 24/7 production support for Informatica, Snowflake, AWS, and Hadoop workflows using tools like Dynatrace, Grafana, and Splunk for proactive monitoring, issue resolution, and root cause analysis.
  • Enforced data security via IAM roles, KMS encryption, and Kerberos authentication.

Environment: AWS (Amazon S3, Lambda, Glue, Step Functions, CloudWatch, Athena, DynamoDB), Hadoop, Cloudera Manager, Informatica, Snowflake, Hive, Apache Ambari, SQL, GitHub, Python, Bitbucket, shell scripting, Unix/Linux, Splunk, Jira, ServiceNow.
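
For illustration, a minimal AWS Glue (PySpark) sketch of the S3-extract, PySpark-transform, Redshift-load pattern described above; the bucket, table, and connection names are placeholders rather than actual production resources:

```python
# Hypothetical Glue job: read raw files from S3, clean them with PySpark,
# and load the result into Redshift. All names below are placeholders.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw CSV files landed in S3 (placeholder bucket/prefix).
raw = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-raw-bucket/transactions/"]},
    format="csv",
    format_options={"withHeader": True},
)

# Basic cleanup with Spark: drop duplicates and filter out bad rows.
df = raw.toDF().dropDuplicates().filter("amount IS NOT NULL")
cleaned = DynamicFrame.fromDF(df, glue_context, "cleaned")

# Load into Redshift through a pre-configured Glue connection (placeholder names).
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=cleaned,
    catalog_connection="example-redshift-connection",
    connection_options={"dbtable": "analytics.transactions", "database": "dev"},
    redshift_tmp_dir="s3://example-temp-bucket/redshift-staging/",
)
job.commit()
```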

Cloud Data Engineer

CVS Health
06.2021 - 12.2023
  • Built ETL pipelines on GCP using Dataflow, Cloud Composer (Apache Airflow), and Cloud Functions to ingest, process, and transform large volumes of healthcare data efficiently.
  • Automated data ingestion workflows into FHIR Store from GCS and BigQuery, ensuring near real-time updates and high data availability for clinical and operational analytics.
  • Built robust Apache Airflow pipelines using diverse operators to orchestrate complex data workflows with error handling, retries, and dynamic branching.
  • Developed data validation processes with Apache Beam on Cloud Dataflow to ensure data accuracy and consistency between source systems and BigQuery datasets; a minimal sketch follows the environment list below.
  • Implemented data security and governance by configuring BigQuery authorized views, IAM roles, and encryption to enforce compliance with HIPAA and enterprise standards.
  • Created and maintained dashboards and Looker reports to provide actionable insights for business and clinical stakeholders, accelerating data-driven decision-making.

Environment: GCP (Google Cloud Storage, BigQuery, Dataproc, Dataflow, Cloud Composer), SQL, GitHub, Airflow, FHIR Store, Cloud Healthcare API, Apache Beam, Cloud Shell, Python, Looker.
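
For illustration, a minimal Apache Beam sketch in the spirit of the Dataflow validation work above: it reads newline-delimited JSON from GCS, keeps records that pass basic checks, and appends them to BigQuery. The project, bucket, dataset, and field names are placeholders:

```python
# Hypothetical Beam pipeline: parse, validate, and load records into BigQuery.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

REQUIRED_FIELDS = ("patient_id", "encounter_ts", "source_system")  # placeholder fields

def parse_and_validate(line):
    """Yield the record only if it parses and has all required fields."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return
    if all(record.get(f) for f in REQUIRED_FIELDS):
        yield record

def run():
    # e.g. --runner=DataflowRunner --project=example-project --region=us-central1
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromGCS" >> beam.io.ReadFromText("gs://example-bucket/encounters/*.json")
            | "Validate" >> beam.FlatMap(parse_and_validate)
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "example-project:clinical.encounters",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )

if __name__ == "__main__":
    run()
```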

Data Engineer

Walmart
01.2019 - 05.2021
    • Built ETL pipelines (Extract, Transform, Load) from the data lake to downstream databases and front-end applications, creating integration patterns that refine raw data using Hadoop, HDFS, Hive, Spark, Python, Sqoop, DB2, SQL Server, and GCP.
    • Migrated data from Hadoop to GCS buckets, developed BigQuery scripts to load data from the buckets into BigQuery, and scheduled the jobs in Cloud Composer; a minimal sketch follows the environment list below.
    • Processed and loaded bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python, and built Airflow pipelines in GCP for ETL jobs using a variety of Airflow operators.
    • Applied Hive partitioning and bucketing, created external and internal Hive tables with partitions, and troubleshot data pipeline failures and slowness in MapReduce, Tez, Hive, and Spark jobs to ensure SLA adherence.
    • Improved performance and optimized existing Hadoop workloads using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
    • Built real-time Tableau dashboards reporting store-level and region-level sales for Walmart US and global data.
    • Worked in Agile environment with JIRA; handled incidents via ServiceNow, joined bridge calls, coordinated with vendors, and ensured SLA compliance.

Environment: Hadoop, HDFS, Spark, Python, Teradata, Hive, Aorta, Sqoop, APIs, GCP (Google Cloud Storage, BigQuery, Dataproc, Dataflow, Cloud Composer, Pub/Sub), SQL, DB2, UDP, GitHub, Tableau, Data Studio, Looker.
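
For illustration, a minimal Cloud Composer (Airflow) DAG sketch of the GCS-to-BigQuery load described above; the bucket, dataset, table, and schedule are placeholders:

```python
# Hypothetical Composer DAG: load daily Parquet files from GCS into BigQuery.
from datetime import datetime
from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="load_sales_to_bigquery",
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 6 * * *",  # daily at 06:00 UTC
    catchup=False,
) as dag:
    load_sales = GCSToBigQueryOperator(
        task_id="gcs_to_bq_sales",
        bucket="example-migration-bucket",
        source_objects=["sales/dt={{ ds }}/*.parquet"],
        source_format="PARQUET",
        destination_project_dataset_table="example-project.retail.daily_sales",
        write_disposition="WRITE_TRUNCATE",
    )
```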

Software Engineer

Thought pulse Software Technology
04.2016 - 07.2017
    • Applied Software Configuration Management (SCM) practices across Agile, Scrum, and Waterfall methodologies, including version control with Git (branching, tagging, merging).
    • Designed and managed Oozie workflows for job scheduling and batch processing in big data environments.
    • Developed and automated data workflows using Shell, Python, and PowerShell, improving deployment and integration across environments.
    • Performed advanced text analytics using Apache Spark with Scala, enhancing data processing through in-memory computing; a Python sketch of the idea follows the environment list below.
    • Collaborated with cross-functional teams to gather requirements, review BRD/Test Plans, and deliver high-quality solutions on time within Agile sprints.

Environment: Git, Oozie, Apache Spark, Scala, Python, Shell Scripting (Bash), PowerShell, Hadoop Ecosystem, Cloudera, Ambari, Jenkins, Agile, Scrum, JIRA, Linux, Windows, SQL
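
For illustration, a short PySpark word-frequency sketch of the in-memory text-analytics idea above; the original work used Spark with Scala, and the input path here is a placeholder:

```python
# Hypothetical PySpark job: tokenize text files and count word frequencies in memory.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("text_analytics_sketch").getOrCreate()

# Read raw text files from a placeholder HDFS path.
lines = spark.read.text("hdfs:///example/input/docs/*.txt")

# Lowercase, split on whitespace, and explode into one word per row.
words = (
    lines.select(F.explode(F.split(F.lower(F.col("value")), r"\s+")).alias("word"))
    .filter(F.col("word") != "")
)

# Count and rank word frequencies.
counts = words.groupBy("word").count().orderBy(F.desc("count"))
counts.show(20, truncate=False)

spark.stop()
```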

Education

Master of Science - Computer Science

University of Illinois at Springfield
Springfield, IL
12.2018

Bachelor of Science - Information Technology

Jawaharlal Nehru Technological University
India
04.2016

Skills

  • Large-Scale Data Processing (Batch & Streaming)
  • Data Management, Data Migration & Integration
  • Documentation & Knowledge Transfer
  • Data Pipeline Development & Workflow Orchestration
  • Cloud Data Engineering (AWS & GCP)
  • Production Support & Incident Management
  • Data Warehousing & Data Visualization
  • Generative AI (GenAI) Integration
  • Performance Tuning & Optimization

Certification

  • AWS Certified Cloud Practitioner
  • Google Cloud Certified - Professional Data Engineer



Technical Skills

  • Cloud Platforms: AWS (S3, Lambda, Glue, Step Functions, CloudWatch), GCP (Cloud Storage, BigQuery, Dataflow, Dataproc, Cloud Composer), Snowflake
  • Programming Languages: Python, SQL, Scala, and Shell Scripting
  • Big Data Technologies: Apache Spark, Hive, HDFS, YARN, Kafka, Hadoop Ecosystem
  • Databases: Hive, BigQuery, Snowflake, MySQL, SQL Server, DB2, Teradata, Toad
  • Data Engineering: Airflow, ETL, BigQuery, Control-M, Redshift, Monitoring & Logging (CloudWatch, Splunk, Datadog, Dynatrace, Grafana), Data Warehousing, Data Analytics, Pandas, NumPy
  • DevOps / SRE: CI/CD, Git, Version Control, Jenkins, Terraform, Automation, Docker, Kubernetes, AWS (S3, EC2, VPC), Google Cloud (GCS, Data Lake)
  • Data Visualization: Looker, Power BI, Tableau