Niharika Uppalapati

Waterbury, CT

Summary

  • Over 5 years of experience as a Data Engineer with expertise in building scalable data pipelines, data lakes, and ETL processes using Python, SQL, and Big Data technologies on AWS, Azure, and GCP platforms.
  • Proficient in designing robust data architectures, implementing data integration strategies, and optimizing data workflows to support advanced analytics and data-driven decision-making.
  • Hands-on experience with cloud platforms including AWS (EC2, S3, RDS, Lambda), Azure (Data Factory, Data Lake, Databricks), and GCP (BigQuery, Dataflow).
  • Skilled in CI/CD pipeline development using Jenkins, Docker, Kubernetes, and Terraform to ensure efficient deployment and automation.
  • Strong knowledge of data modeling, data warehousing, and data visualization using Power BI and Tableau.
  • Proven ability to work in Agile and Waterfall environments, delivering high-quality solutions within project timelines.

Overview

7 years of professional experience

Work History

Azure Data Engineer

ESPN
Bristol, USA
07.2024 - Current
  • Designed and managed data warehouses and data lakes using Azure Data Lake, Azure Data Factory, and Blob Storage
  • Developed and optimized ETL pipelines using Python libraries such as Pandas and PySpark, along with Azure Databricks, for seamless data integration
  • Built automated data ingestion workflows using Spark, Sqoop, Oozie, and Control-M
  • Implemented real-time data streaming solutions using Kafka and Spark Streaming
  • Deployed Infrastructure as Code (IaC) using Terraform for cloud resource provisioning and scaling
  • Created analytical dashboards and reports using Power BI and Azure Analysis Services
  • Managed CI/CD pipelines in Agile environments using Jenkins, Bitbucket, and Bamboo
  • Implemented automated data pipelines for data migration, ensuring a smooth and reliable transition to the cloud environment
  • Developed database triggers and stored procedures using T-SQL cursors and tables
  • Used Python's unittest library to test Python programs and other code

AWS Data Engineer

Nuvance Health
Danbury, USA
08.2023 - 06.2024
  • Developed ETL jobs for data extraction and integration into Redshift data marts
  • Implemented AWS S3 security frameworks using Lambda and DynamoDB
  • Worked on Kafka-based real-time streaming and Spark data processing
  • Automated AWS infrastructure provisioning with Terraform
  • Built data models and visualized reports using Power BI
  • Developed and maintained Analysis Services models to support business intelligence and data analytics requirements, creating measures, dimensions, and hierarchies for reporting and visualization
  • Worked extensively with PySpark/Spark SQL for data cleansing and generating DataFrames and RDDs
  • Used AWS to create storage resources and define resource attributes, such as disk type or redundancy type, at the service level
  • Used T-SQL extensively on MS SQL Server and ANSI SQL on disparate databases

GCP Data Engineer

Kotak General Insurance
Mumbai, India
05.2020 - 12.2021
  • Developed Sqoop scripts to migrate data from Oracle to Big Data environments, facilitating seamless data integration
  • Created T-SQL scripts to manage Azure SQL Database objects, including tables, indexes, and views, for optimized data storage and retrieval
  • Designed and maintained GCP cloud solutions using Dataproc and BigQuery for scalable data processing and analytics
  • Utilized Spark SQL with Scala and Python interfaces to convert RDD case classes to schema RDDs for advanced data processing
  • Authored custom PySpark UDFs for data manipulation, aggregation, labeling, and cleaning tasks
  • Managed large datasets using Pandas DataFrames and SQL, ensuring efficient data analysis and reporting
  • Ingested data into Azure Data Lakes using Azure Data Factory, processing it in Databricks for day-to-day business requirements
  • Implemented end-to-end data pipelines using Apache Beam and Cloud Dataflow, validating data between raw source files and BigQuery tables

Data Engineer

UltraTech Cement
Mumbai, India
03.2018 - 04.2020
  • Developed PySpark applications for ETL operations, optimizing data pipelines for efficient data processing and integration
  • Built scalable distributed data solutions using Hadoop, enhancing data storage and retrieval capabilities
  • Designed and developed dashboards and reports using Power BI, enabling data visualization and business intelligence
  • Configured Spark Streaming with Kafka to capture real-time data and store it in DBFS for continuous processing
  • Created Databricks Spark jobs using PySpark for complex table operations and data transformations
  • Processed data from multiple file formats, including XML, CSV, and JSON, using Spark jobs for analytics and reporting
  • Automated AWS infrastructure management using Terraform, achieving high availability and reducing management time by 90%
  • Utilized Azure Data Factory to ingest and process data in Databricks, loading results into Azure Data Lakes
  • Migrated data from on-premises SQL Server to cloud databases, including Azure Synapse Analytics and Azure SQL DB
  • Implemented a CI/CD system using Git, Jenkins, MySQL, and custom Python/Bash tools, automating deployment and integration

Education

Master's - Business Analytics

University of New Haven
05.2024

Skills

  • AWS
  • Azure
  • GCP
  • Spark
  • Hadoop
  • Kafka
  • Snowflake
  • Airflow
  • Data Factory
  • Oracle
  • Teradata
  • MySQL
  • NoSQL
  • HBase
  • MongoDB
  • Python
  • SQL
  • Scala
  • Java
  • Jenkins
  • Docker
  • Kubernetes
  • Terraform
  • Power BI
  • Tableau
  • Agile
  • Waterfall
  • SDLC

Timeline

Azure Data Engineer

ESPN
07.2024 - Current

AWS Data Engineer

Nuvance Health
08.2023 - 06.2024

GCP Data Engineer

Kotak General Insurance
05.2020 - 12.2021

Data Engineer

UltraTech Cement
03.2018 - 04.2020

Master's - Business Analytics

University of New Haven