Neha Gatla

Kansas City, MO

Summary

Dynamic Data Engineer with a proven track record at PwC, specializing in AWS Glue and Spark to optimize data pipelines and improve operational efficiency by 20%. Adept at automating workflows and implementing CI/CD processes, and skilled at collaborating across teams to deliver data solutions that support strategic decision-making.

Overview

7 years of professional experience
2 Certifications

Work History

Data Engineer

PwC
Kansas, USA
02.2024 - Current
  • Responsible for designing, developing, and automating data pipelines to support the firm's diverse financial services operations.
  • Designed and developed AWS Glue workflows and Step Functions for efficient ETL operations across Amazon RDS, S3, and Redshift.
  • Implemented data lineage tracking in AWS Glue Data Catalog, ensuring auditability and compliance.
  • Optimized data ingestion by integrating RDBMS sources (Oracle, Teradata) with NoSQL databases (DynamoDB) using Boto3.
  • Automated workflows using Apache Airflow on Amazon MWAA, managing DAGs across EMR and EC2 clusters (see the illustrative DAG sketch after this list).
  • Deployed real-time data streaming solutions with Apache Kafka and Amazon Kinesis, improving event-driven processing.
  • Managed containerized applications with Amazon EKS and AWS Fargate, ensuring high availability and scalability.
  • Built and maintained CI/CD pipelines (AWS CodePipeline, CodeBuild, CodeDeploy) using Terraform, Jenkins, and Ansible.
  • Enhanced query performance with Spark SQL on AWS Glue and EMR, reducing execution times.
  • Developed dbt (data build tool) models for AWS Redshift and Glue, incorporating SQL transformations and data quality checks.
  • Designed and deployed Python-based APIs and microservices with Amazon ECS/Fargate for backend integration.
  • Conducted thorough REST API and SOAP testing using Pytest, UnitTest, and Postman.
  • Developed Linux shell scripts for secure database connectivity and optimized parallel query execution.
  • PwC is a global professional services firm offering audit, tax, consulting, and advisory services to help businesses address challenges and drive growth.
  • Environment: AWS Glue, Redshift, S3, DynamoDB, EMR, Kinesis, Athena, QuickSight, Spark SQL, Apache Kafka, Amazon EKS, Fargate, Docker, Jenkins, Python, Scala, Terraform, Ansible, MongoDB, Oracle, Teradata, Linux.
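
As an illustration of the Airflow-on-MWAA orchestration above, a minimal DAG wiring an RDS extract into a Redshift load might look like the following sketch; the DAG id, schedule, and task bodies are placeholder assumptions, not the actual PwC workflow.

# Hypothetical Airflow 2.x DAG illustrating the MWAA orchestration above;
# dag_id, schedule, and task logic are placeholder assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_from_rds(**context):
    # Placeholder: pull the day's slice from Amazon RDS.
    print("extracting partition", context["ds"])

def load_to_redshift(**context):
    # Placeholder: COPY the transformed slice into Redshift.
    print("loading partition", context["ds"])

with DAG(
    dag_id="rds_to_redshift_daily",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # Airflow 2.4+ keyword
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_rds)
    load = PythonOperator(task_id="load", python_callable=load_to_redshift)
    extract >> load  # extract must finish before load starts

On MWAA, a file like this is uploaded to the environment's S3 dags/ prefix, where the scheduler picks it up automatically.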

Data Engineer

CareVet
Missouri, USA
07.2023 - 01.2024
  • Developed data solutions to enhance operational efficiency and decision-making.
  • Built scalable data pipelines with Azure Synapse Analytics and Azure Data Explorer, enabling cross-database queries.
  • Developed Azure Functions and Databricks (PySpark, Spark) for automated raw data ingestion into Azure Data Lake Storage (ADLS) (see the PySpark sketch after this list).
  • Implemented real-time data processing using Azure Event Hubs and Azure Service Bus.
  • Deployed containerized applications on Azure Kubernetes Service (AKS), increasing data processing efficiency by 20%.
  • Designed ETL pipelines using Azure Data Factory (ADF) and Python for seamless data ingestion and transformation.
  • Applied Informatica Data Quality (IDQ) for data profiling, cleansing, and validation before ingestion into Azure Synapse Analytics.
  • Automated infrastructure provisioning with Azure CLI, PowerShell scripts, and ARM templates.
  • Implemented Azure Stream Analytics for real-time transformations and data storage in Azure Data Lake.
  • Developed machine learning models (Azure ML, Python) for predictive analytics and data-driven decision-making.
  • Integrated external data sources into the data lake using Beautiful Soup-based web scraping.
  • Created advanced Excel dashboards by integrating data from Azure Synapse Analytics and SQL Database, leveraging Power Query and automation.
  • Leveraged Jupyter Notebook for Exploratory Data Analysis (EDA), machine learning model development, and visualization.
  • CareVet is a network of veterinary hospitals dedicated to providing high-quality care and services that improve animal health.
  • Environment: Azure Synapse Analytics, Data Factory, Data Explorer, Cosmos DB, AKS, Azure ML, Event Hubs, Databricks, ARM Templates, PowerShell, Python, SQLAlchemy, Informatica Data Quality (IDQ), Jupyter Notebook, Excel.
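
To sketch the Databricks/PySpark ingestion pattern above: raw files land in one ADLS container and are persisted as Parquet in a curated zone. The storage account, container names, and paths below are illustrative placeholders, not CareVet's actual resources.

# Hypothetical PySpark job sketching raw-to-curated ingestion in ADLS;
# account, containers, and paths are placeholder assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("raw_ingest").getOrCreate()

# Read raw CSVs from the landing container (path is illustrative).
raw = (
    spark.read
    .option("header", "true")
    .csv("abfss://raw@exampleaccount.dfs.core.windows.net/landing/")
)

# Light cleanup: drop duplicates and rows missing the key column.
cleaned = raw.dropDuplicates().na.drop(subset=["id"])

# Persist as Parquet in the curated zone for downstream consumption.
(
    cleaned.write
    .mode("overwrite")
    .parquet("abfss://curated@exampleaccount.dfs.core.windows.net/events/")
)

Writing Parquet in the curated zone keeps downstream Synapse queries efficient, since the format is columnar and supports predicate pushdown.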

Application Developer/Data Engineer

Roche
Hyderabad, India
06.2021 - 11.2022
  • Contributed to data integration, workflow automation, and analytics initiatives to enhance data quality and deliver insights for personalized healthcare and risk management.
  • Built Big Data pipelines using Hadoop, PySpark, Kafka, MapReduce, and Storm, enhancing large-scale data integration.
  • Migrated on-prem SQL Server data to AWS Cloud using AWS Glue, AWS Data Pipeline, S3, and Redshift.
  • Automated ETL processes using Sqoop for data transfer from Oracle and Teradata into HDFS.
  • Developed PostgreSQL stored procedures, indexing strategies, and query optimizations for enhanced analytics workflows.
  • Automated data modifications, account management, and trading operations with Python-based XML SOAP handlers and SQLAlchemy (see the SQLAlchemy sketch after this list).
  • Designed test automation frameworks using Selenium and JIRA for functional and regression testing.
  • Created dynamic dashboards and reports with Tableau, Python, and Google Analytics, improving real-time business intelligence.
  • Optimized Spark EMR clusters by tuning Spark SQL queries and adjusting cluster resources.
  • Implemented OLAP cubes in AWS Redshift for efficient multidimensional analysis.
  • Developed data automation workflows with AWS Glue and Alteryx, streamlining data transformation.
  • Applied IAM roles and policies in AWS IAM to manage secure access for data pipelines in S3, EMR, and Redshift.
  • Collaborated with cross-functional teams to implement workflow automation strategies using Python, JIRA, and Selenium.
  • Roche is a global healthcare company focused on discovering new medicines and diagnostics while leveraging data-driven insights to improve healthcare practices.
  • Environment: Hadoop, Spark, PySpark, Kafka, Sqoop, MapReduce, Oracle, Teradata, AWS Glue, AWS Data Pipeline, S3, AWS Redshift, EMR, IAM, OLAP, Tableau, Python, Java, Selenium, Alteryx, PostgreSQL.
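
To illustrate the SQLAlchemy-based automation above, here is a minimal transactional data-modification routine; the connection string, table, and columns are hypothetical placeholders, not Roche's schema.

# Hypothetical SQLAlchemy routine sketching automated data modifications;
# DSN, table, and columns are placeholder assumptions.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@host:5432/analytics")

def deactivate_stale_accounts(cutoff_date: str) -> int:
    """Flag accounts with no activity since cutoff_date; return rows updated."""
    with engine.begin() as conn:  # transaction: commit on success, rollback on error
        result = conn.execute(
            text("UPDATE accounts SET active = FALSE WHERE last_activity < :cutoff"),
            {"cutoff": cutoff_date},
        )
        return result.rowcount

print(deactivate_stale_accounts("2022-01-01"))

Bound parameters (:cutoff) keep the update safe from SQL injection, and engine.begin() guarantees the change is atomic.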

Data Engineer

Hindustan Unilever
Hyderabad, India
07.2019 - 05.2021
  • Optimized data warehouse performance and implemented automated CI/CD for AWS deployments.
  • Optimized Snowflake data warehouse performance, improving query execution times and reducing operational costs.
  • Developed a fully automated CI/CD system using Git, Jenkins, MySQL, and custom Python and Bash scripts.
  • Designed and built scalable data pipelines for ingestion, transformation, and analysis of large datasets.
  • Managed SAP Basis operations, including installing SAP components, monitoring system performance, and troubleshooting technical issues to ensure seamless enterprise data processing.
  • Automated data extraction from MySQL using Python for customer usage reports (see the extraction sketch after this list).
  • Deployed code artifacts into Azure environments and managed infrastructure.
  • Spearheaded HBase setup and utilized Spark/SparkSQL, reducing data processing time by 60% and improving accuracy.
  • Configured Kafka clusters, including partitioning and replication factors.
  • Processed large datasets using Hive and Hadoop Distributed File System (HDFS).
  • Hindustan Unilever is a leading consumer goods company in India, offering a diverse range of brands across home care, personal care, and food categories.
  • Environment: SQL Server, Informatica Cloud, Talend, Jenkins, Azure (Azure Virtual Machines, Azure Blob Storage, Azure SQL Database, Azure Functions), MySQL, Python, Bash, Spark, Snowflake, Kafka, Hive, Hadoop.
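
A minimal sketch of the MySQL report-extraction automation above; the driver choice (pymysql), schema, credentials, and output file are assumptions for illustration only.

# Hypothetical extraction script for the MySQL usage reports above;
# table, columns, credentials, and output path are placeholder assumptions.
import csv

import pymysql  # assumed driver; any DB-API MySQL client works similarly

conn = pymysql.connect(
    host="localhost", user="report", password="secret", database="usage_db"
)
try:
    with conn.cursor() as cur:
        # Aggregate daily event counts per customer (schema is illustrative).
        cur.execute(
            "SELECT customer_id, DATE(event_ts), COUNT(*) "
            "FROM usage_events GROUP BY customer_id, DATE(event_ts)"
        )
        rows = cur.fetchall()
finally:
    conn.close()

# Write the report as CSV for distribution.
with open("customer_usage_report.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["customer_id", "day", "events"])
    writer.writerows(rows)

A script like this is typically scheduled through cron or Jenkins so reports refresh automatically.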

Education

Master’s in Electrical and Computer Engineering

University of Missouri-Kansas City (UMKC)
Kansas City, MO

Skills

  • HDFS
  • MapReduce
  • Spark
  • Kafka
  • Storm
  • Hive
  • Sqoop
  • Presto
  • Snowflake
  • AWS Redshift
  • Azure Synapse
  • Oracle
  • MySQL
  • MongoDB
  • Cassandra
  • HBase
  • OLAP
  • Python
  • SQL
  • Java
  • Scala
  • XML
  • AWS S3
  • AWS Lambda
  • AWS Glue
  • AWS EMR
  • AWS RDS
  • Azure Data Factory
  • Azure Synapse Analytics
  • Azure Blob Storage
  • Azure Data Lake Storage (ADLS)
  • Azure Databricks
  • Informatica
  • Talend
  • SSIS
  • Alteryx
  • SAS
  • Apache Airflow
  • Kubernetes
  • Terraform
  • Ansible
  • Jenkins
  • Git
  • GitHub
  • GitLab
  • Bitbucket
  • REST APIs
  • SOAP APIs
  • Spark Streaming
  • Power BI
  • Tableau
  • Looker
  • Microsoft Fabric
  • Selenium
  • Pytest
  • Postman
  • JIRA
  • Confluence
  • Scrum
  • Apache Flink
  • AWS SageMaker
  • Azure ML

Certification

• AWS Certified Data Engineer – Associate

Timeline

Data Engineer

PwC
02.2024 - Current

Data Engineer

CareVet
07.2023 - 01.2024

Application Developer/Data Engineer

Roche
06.2021 - 11.2022

Data Engineer

Hindustan Unilever
07.2019 - 05.2021

Master’s in Electrical and Computer Engineering

University of Missouri-Kansas City (UMKC)