Naresh Nalla

Senior Azure Data Engineer

USA

Summary

Accomplished Data Engineer with expertise in Azure Data Factory and Snowflake, currently with United Health Care. Designed scalable data pipelines that enhanced analytics capabilities and reduced data refresh time by 60%. Adept at collaborating with cross-functional teams and implementing data governance strategies, driving impactful business insights through advanced data solutions.

Overview

12 years of professional experience

Work History

Data Engineer

United Health Care
Kingston, NY
07.2023 - Current
  • Designed and implemented end-to-end data pipelines using Azure Data Factory (ADF), Databricks, and Snowflake, ensuring seamless data ingestion, transformation, and loading (ETL/ELT) from diverse sources.
  • Optimized large-scale data processing with Apache Spark, Delta Lake, Apache Flink, and DBT, reducing processing time and enhancing analytics capabilities (see the sketch after this list).
  • Developed real-time streaming pipelines integrating Azure Event Hubs, Azure Functions, Apache Flink, and Snowpipe, enabling low-latency data ingestion into Snowflake.
  • Built and managed data lakes on Azure Data Lake Storage (ADLS) and Blob Storage, implementing partitioning, compression, and encryption for efficiency.
  • Engineered scalable DBT transformation pipelines in Snowflake, optimizing ELT workflows with incremental processing, materialized views, and clustering for improved performance.
  • Designed and deployed fact and dimension models in DBT to enhance analytical reporting and business intelligence insights.
  • Designed and implemented data models for large-scale data pipelines using ERwin Data Modeler, ensuring data integrity and consistency across ETL/ELT workflows.
  • Automated CI/CD workflows using Azure DevOps for version-controlled DBT deployments with validation, testing, and monitoring.
  • Leveraged Databricks Delta Live Tables (DLT) and Snowflake Streams & Tasks to support real-time and batch processing for advanced analytics.
  • Utilized Presto for interactive query processing across heterogeneous data sources, significantly reducing query times and improving data retrieval efficiency.
  • Implemented data governance and security using Databricks Unity Catalog, Snowflake RBAC, Microsoft Purview, and schema validation tests, ensuring compliance and traceability.
  • Developed financial reporting and e-commerce analytics pipelines using DBT in Snowflake, reducing data refresh time by 60% and improving decision-making.
  • Established data replication strategies between Snowflake and other platforms using Change Data Capture (CDC) techniques, ensuring 98% data consistency.
  • Designed customer 360 data pipelines, integrating sales, marketing, and support data for real-time analytics and actionable insights.
  • Acted as SME for Microsoft Fabric implementation, supporting post-deployment validation, security configurations, and optimizing platform setup aligned with Azure best practices.
  • Collaborated with client security teams to assess configuration gaps, perform a Fabric-specific security risk review, and provide hands-on remediation guidance in coordination with Microsoft support.
  • Led integration efforts for hybrid environments using Microsoft On-Premises Data Gateway and Azure services, including SharePoint and Oracle data sources, for centralized Fabric analytics.
  • Engineered real-time data enrichment using Apache Flink to join, filter, and transform streaming data from Kafka and Event Hubs, improving real-time insights for business ops.
  • Built scalable Flink jobs to support fraud detection and alerting pipelines, ensuring minimal latency and high throughput for time-sensitive analytics.
  • Integrated Microsoft Fabric with Snowflake for unified data engineering, warehousing, and BI, leveraging Power BI, Synapse Analytics, and OneLake for interactive dashboards.
  • Orchestrated complex Azure data workflows using ADF, Logic Apps, Apache Airflow, and Azure Functions, improving automation and monitoring.
  • Designed and optimized Snowflake data warehouses, implementing partition pruning, clustering, and materialized views for cost-efficient, high-performance query execution.
  • Developed log analytics and monitoring pipelines using DBT, Snowflake, Fivetran, Azure Monitor, and Power BI (DAX, M Query), ensuring proactive troubleshooting and performance optimization while enabling interactive and real-time analytics reporting.
  • Implemented Databricks Unity Catalog to enable centralized data governance, metadata management, and fine-grained access control across multiple workspaces, integrating with Power BI and Tableau for enhanced visualization and reporting.
  • Designed and developed Data Mesh-based decentralized data platforms using Databricks Delta Lake, Snowflake, Fivetran, and Power BI (Data Modeling, Real-Time Dashboards), enabling domain-driven data ownership and governance while providing actionable insights through live dashboards.
  • Implemented Fivetran connectors for automated data ingestion from SaaS platforms (e.g., Salesforce, Netsuite), reducing manual ETL development effort by 70%.
  • Configured and monitored Fivetran sync schedules and transformations to ensure near real-time updates and high data reliability across BI and analytics layers.
  • Established data contracts and schema enforcement policies to maintain data consistency and interoperability across different teams and business units, leveraging Power BI and Tableau Prep for streamlined data transformation and visualization.
  • Automated governance rule enforcement using Databricks Workflows, Snowflake Stored Procedures, and integrated Power BI for ongoing monitoring and compliance reporting.
  • Developed advanced Power BI visualizations and reports with real-time data integration, improving decision-making processes and overall data-driven strategy.
  • Optimized Power BI performance by utilizing DAX, M Query, and custom Data Models to handle large-scale data sets efficiently for interactive dashboards and reports.
  • Environment: Azure Data Factory, Azure Databricks, Snowflake, Azure Event Hubs, Azure Functions, Azure Data Lake Storage, Azure Blob Storage, Azure Logic Apps, Power BI (DAX, M Query, Data Modeling, Real-Time Dashboards), Tableau (Prep, Dashboard Design, Advanced Visualizations), Presto, Apache Flink, Fivetran, Azure Machine Learning, Azure Monitor, Azure Analysis Services, Microsoft Purview, Apache Airflow, Apache Atlas, Microsoft Fabric, OneLake, Synapse Analytics, Synapse Real-Time Analytics.
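
Below is a minimal PySpark sketch of the Spark-to-Delta batch pattern referenced in the bullets above. It assumes a Delta-enabled Spark session (e.g., Databricks); the ADLS paths, dataset, and column names are hypothetical illustrations, not the production pipeline.

```python
# Illustrative PySpark job: cleanse raw CSVs from ADLS and write a
# partitioned Delta table. Paths and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sales-delta-etl").getOrCreate()

# Raw CSVs landed by upstream ingestion (path is illustrative).
raw = (
    spark.read.option("header", True)
    .csv("abfss://landing@examplelake.dfs.core.windows.net/sales_raw/")
)

# Type, deduplicate, and derive the partition column.
cleaned = (
    raw.withColumn("sale_ts", F.to_timestamp("sale_ts"))
    .withColumn("amount", F.col("amount").cast("double"))
    .dropDuplicates(["order_id"])
    .withColumn("sale_date", F.to_date("sale_ts"))
)

# Partitioning by date enables pruning on the most common filters.
(
    cleaned.write.format("delta")
    .mode("append")
    .partitionBy("sale_date")
    .save("abfss://curated@examplelake.dfs.core.windows.net/sales/")
)
```

Partitioning on the derived date column is what enables the pruning and storage efficiencies noted above.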

Data Engineer (Azure, Snowflake)

Cloud wave Inc.
Washington, DC
09.2020 - 06.2023
  • Designed and implemented scalable data ingestion pipelines using Azure Data Factory and Snowflake’s Snowpipe, efficiently ingesting data from diverse sources such as SQL databases, CSV files, and REST APIs.
  • Developed robust data processing workflows leveraging Azure Databricks and Spark for distributed data processing and transformation tasks, integrating with Snowflake for seamless data loading and querying.
  • Ensured data quality and integrity through comprehensive data validation, cleansing, and transformation operations performed using Azure Data Factory, Databricks, and Snowflake’s native capabilities like streams and tasks.
  • Leveraged Azure Synapse Analytics and Snowflake to seamlessly integrate big data processing and analytics capabilities, empowering data exploration and insights generation across cloud platforms.
  • Automated data pipelines and workflows by configuring event-based triggers and scheduling mechanisms in Azure Data Factory and Snowflake Tasks, streamlining data processing and delivery, which resulted in a 48% reduction in manual intervention.
  • Implemented comprehensive data lineage and metadata management solutions using Azure Purview and Snowflake’s INFORMATION_SCHEMA views, ensuring end-to-end visibility and governance over data flow and transformations.
  • Identified and resolved bottlenecks within data processing and storage layers, optimizing query execution in both Spark and Snowflake, thereby reducing data latency and enhancing overall performance.
  • Enforced advanced techniques such as partitioning, clustering keys, indexing, and result caching in Snowflake and Azure services to enhance query performance and reduce processing time.
  • Conducted meticulous performance tuning and capacity planning exercises on Snowflake’s virtual warehouses and Azure compute resources, ensuring scalability and maximizing efficiency within the data infrastructure.
  • Demonstrated proficiency in scripting languages like Python and Scala, enabling efficient data manipulation and integration of custom functionalities across Databricks and Snowflake environments.
  • Developed and fine-tuned high-performance Spark jobs to handle complex data transformations, aggregations, and machine learning tasks on large-scale datasets, storing results in Snowflake for downstream analytics (see the load sketch after this list).
  • Developed end-to-end data pipelines using Kafka, Spark, and Hive, integrating with Snowflake for unified data storage and query access, enabling seamless data ingestion, transformation, and analysis.
  • Orchestrated complex ETL workflows using Apache Airflow, ensuring efficient scheduling, monitoring, and automation of data pipelines. Developed and maintained automated workflows using Automic for job scheduling and monitoring, improving operational efficiency by 15%.
  • Leveraged Kafka and Spark Streaming to process and analyze streaming data, contributing to real-time data processing and insights generation in Snowflake, improving real-time analytics capabilities by 30%.
  • Utilized Spark core and Spark SQL scripts using Scala to expedite data processing and enhance performance, with Snowflake as a target destination for high-performance analytics workloads.
  • Architected and implemented a cloud-based data warehousing solution utilizing Snowflake on Azure, harnessing its exceptional scalability, elasticity, and native support for semi-structured data.
  • Created and optimized Snowflake schemas, tables, views, and materialized views to facilitate efficient data storage, deduplication, and retrieval, catering to advanced analytics and reporting requirements.
  • Collaborated closely with data analysts and business stakeholders to deeply understand their needs and implement well-aligned data models and structures within Snowflake, ensuring self-service analytics capabilities.
  • Executed Hive scripts through Hive on Spark and SparkSQL, effectively supporting ETL tasks, maintaining data integrity, and ensuring pipeline stability, with downstream data warehousing in Snowflake.
  • Proficiently worked within Agile methodologies, actively participating in daily stand-ups and coordinated planning sessions, delivering Snowflake-based solutions iteratively and collaboratively.
  • Environment: Azure (Data Factory, Databricks, Synapse Analytics, Logic Apps, Function Apps, Purview), Snowflake (Snowpipe, Tasks, Streams, Clustering), Spark (Core, SQL, Streaming), Hive, Kafka, Oracle, HDFS, MapReduce, YARN, Python, Scala, PySpark, SQL, Apache Airflow, Automic, Jenkins, Power BI.
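
A minimal sketch of the Spark-to-Snowflake loading pattern described above, using the Snowflake Spark connector's `net.snowflake.spark.snowflake` format. All connection values and object names are placeholders, and the connector and its JDBC dependency are assumed to be installed on the cluster.

```python
# Illustrative PySpark job: aggregate curated data and load it into
# Snowflake via the Snowflake Spark connector. Names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-to-snowflake").getOrCreate()

orders = spark.read.parquet(
    "abfss://curated@examplelake.dfs.core.windows.net/orders/"
)

daily = orders.groupBy("order_date").agg(
    F.sum("amount").alias("total_amount"),
    F.countDistinct("customer_id").alias("distinct_customers"),
)

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",  # placeholder account
    "sfUser": "<user>",
    "sfPassword": "<password>",  # in practice, fetched from a secret store
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "TRANSFORM_WH",
}

(
    daily.write.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "DAILY_ORDER_SUMMARY")
    .mode("overwrite")
    .save()
)
```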

Big Data Developer

Kaiser Permanente
Atlanta, GA
06.2018 - 08.2020
  • Designed and implemented a scalable ETL framework using Sqoop, Pig, Presto, and Hive to efficiently extract, transform, and load data from various sources, ensuring seamless data availability for consumption.
  • Processed data stored in Hadoop Distributed File System (HDFS), leveraging Hive to create external tables and developing reusable scripts for efficient table ingestion and repair across the project (see the sketch after this list).
  • Developed robust ETL jobs using Spark, Presto, and Scala to migrate data from Oracle to new MySQL tables, ensuring smooth data transfer and maintaining data integrity.
  • Leveraged the capabilities of Spark, including RDDs, DataFrames, and Spark SQL, along with Spark-Cassandra Connector APIs, for diverse data tasks such as data migration and generating comprehensive business reports.
  • Engineered a high-performance Spark Streaming application for real-time sales analytics, enabling timely insights and decision-making.
  • Conducted comprehensive analysis of source data, effectively handled data type modifications, and utilized Excel sheets, flat files, and CSV files to generate on-demand Power BI reports.
  • Analyzed SQL scripts and devised optimal solutions using PySpark, ensuring efficient data processing and transformation.
  • Leveraged Sqoop to efficiently extract data from multiple data sources into HDFS, facilitating seamless data integration.
  • Orchestrated data imports from various sources, executed transformations using Hive and MapReduce, and loaded processed data into HDFS.
  • Successfully extracted data from MySQL databases into HDFS using Sqoop, enabling seamless data transfer and integration.
  • Implemented streamlined automation for deployments using YAML scripts, resulting in accelerated and efficient build and release processes.
  • Expertly utilized Apache Hive, Apache Pig, HBase, Apache Spark, Zookeeper, Flume, Kafka, and Sqoop, leveraging their capabilities to optimize data processing and management.
  • Developed data classification algorithms using MapReduce design patterns, enhancing data processing efficiency and accuracy.
  • Employed advanced techniques, including combiners, partitioning, and distributed cache, to optimize the performance of MapReduce jobs.
  • Effectively utilized Git and GitHub repositories for comprehensive source code management and version control, fostering efficient collaboration and ensuring traceability of code changes.
  • Environment: Sqoop, Pig, HDFS, Power BI, GitHub, Apache Cassandra, Presto, ZooKeeper, Flume, Kafka, Apache Spark, Scala, Hive, Hadoop, Cloudera, HBase, MySQL, YAML, JIRA, Git, GitHub.
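
A minimal PySpark sketch of the Hive external-table ingestion and repair scripts mentioned above. The table name, schema, and HDFS path are hypothetical.

```python
# Illustrative PySpark script: register a Hive external table over HDFS
# data and repair partitions so newly landed files become queryable.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hive-external-ingest")
    .enableHiveSupport()
    .getOrCreate()
)

# External table over a partitioned HDFS directory (schema is assumed).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS claims_ext (
        claim_id  STRING,
        member_id STRING,
        amount    DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/claims/'
""")

# Pick up partitions written by upstream Sqoop/Spark jobs.
spark.sql("MSCK REPAIR TABLE claims_ext")

# Quick sanity check per partition.
spark.table("claims_ext").groupBy("load_date").count().show()
```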

Data Warehouse Developer

Chevin Fleet Solutions
Fitchburg, MA
07.2015 - 05.2018
  • Conducted comprehensive requirement analysis to identify data extraction needs from various source systems, including Netezza, DB2, Oracle, and flat files, for seamless integration into the Salesforce application.
  • Designed and developed robust ETL processes using Informatica PowerCenter to efficiently extract data from diverse sources and load it into the target data warehouse.
  • Implemented advanced performance tuning techniques to optimize data mappings and address bottlenecks in the data transfer process, resulting in improved efficiency and faster data processing.
  • Utilized Informatica PowerCenter Tools, such as Designer, Workflow Manager, Workflow Monitor, and Repository Manager, to streamline the development, monitoring, and management of ETL workflows, ensuring smooth execution and enhanced productivity.
  • Created intricate data mappings from scratch, leveraging a wide range of Informatica Designer Tools, including Source Qualifier, Aggregator, Lookup, Expression, Normalizer, Filter, Router, Rank, Sequence Generator, Update Strategy, and Joiner transformations, to ensure accurate data transformation and seamless integration.
  • Implemented efficient Incremental Loading mappings using Mapping Variables and Parameter Files, enabling incremental data transfer and optimizing the overall ETL process for efficient data synchronization.
  • Developed reusable Transformations and Mapplets to promote code reusability, reduce development efforts, and enhance the maintainability of the ETL workflows.
  • Identified and resolved performance bottlenecks by leveraging the capabilities of the Netezza Database, optimizing Index Cache and Data Cache, and utilizing Rank, Lookup, Joiner, and Aggregator transformations for efficient data processing.
  • Created and executed Netezza SQL scripts to ensure accurate table loading and developed SQL scripts for validating row counts and verifying data integrity, ensuring data accuracy and reliability (see the reconciliation sketch after this list).
  • Conducted comprehensive debugging and troubleshooting of Informatica Sessions using the Debugger and Workflow Monitor, enabling timely issue resolution and ensuring the smooth execution of ETL workflows.
  • Utilized Session Logs and Workflow Logs for effective error handling and troubleshooting in the development (DEV) environment, ensuring the stability and integrity of the ETL processes.
  • Prepared detailed ETL design documents and Unit Test plans for Mappings, ensuring comprehensive documentation and adherence to rigorous testing procedures to deliver high-quality solutions.
  • Compiled meticulous code migration documents and collaborated closely with the release team to facilitate the seamless migration of Informatica Objects and Unix Scripts across development, test, and production environments, ensuring successful deployment and minimizing downtime.
  • Successfully deployed ETL component code into multiple environments, strictly following the necessary approvals and adhering to established release procedures, ensuring seamless integration and minimizing disruption.
  • Provided dedicated production support by executing sessions, diagnosing problems, and making necessary adjustments to mappings based on changes in business logic, ensuring the uninterrupted flow of data and smooth operation of the ETL workflows.
  • Conducted rigorous Unit testing and Integration testing of mappings and workflows to validate their functionality and reliability, ensuring the accuracy and integrity of data throughout the ETL process.
  • Ensured strict adherence to client security policies and obtained all required approvals for code migration between environments, safeguarding data privacy and maintaining compliance with regulatory standards.
  • Actively participated in daily status calls with internal teams and provided comprehensive weekly updates to clients through detailed status reports, fostering effective communication, transparency, and project alignment.
  • Environment: Informatica Power Center 9.5/9.5.1 Repository Manager, Designer, Workflow Manager, Workflow Monitor, Repository Administration Console, Netezza, Oracle Developer, Oracle 11g, SQL Server 2016, T-SQL, TOAD, UNIX, HP Quality Center, Autosys, MS Office Suite.
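
A minimal Python sketch of the row-count validation scripts described above, reconciling a source table against its warehouse target over ODBC. The DSNs and table names are hypothetical placeholders.

```python
# Illustrative reconciliation script: compare row counts between a source
# system and the warehouse target. DSNs and table names are placeholders.
import pyodbc

def row_count(conn_str: str, table: str) -> int:
    """Return COUNT(*) for a table reachable through the given ODBC DSN."""
    with pyodbc.connect(conn_str) as conn:
        cur = conn.cursor()
        cur.execute(f"SELECT COUNT(*) FROM {table}")  # table name is trusted
        return cur.fetchone()[0]

SOURCE = "DSN=oracle_src"   # hypothetical ODBC data sources
TARGET = "DSN=netezza_dw"

src = row_count(SOURCE, "SALES.ORDERS")
tgt = row_count(TARGET, "DW.FACT_ORDERS")

if src != tgt:
    raise SystemExit(f"Row count mismatch: source={src}, target={tgt}")
print(f"Row counts reconciled: {src}")
```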

Data Warehouse Developer

Wellington Management
Boston, MA
01.2014 - 06.2015
  • Created, manipulated, and supported SQL Server databases.
  • Involved in data modeling and the physical and logical design of databases.
  • Contributed to the front end's integration with the SQL Server backend.
  • Created stored procedures, triggers, indexes, user-defined functions, and constraints on various database objects to obtain the required results.
  • Imported and exported data between servers using tools such as Data Transformation Services (DTS).
  • Wrote T-SQL statements for data retrieval and was involved in performance tuning of T-SQL queries.
  • Transferred data from various sources and business systems, including MS Excel, MS Access, and flat files, to SQL Server using SSIS/DTS, applying features such as data conversion, and created derived columns from existing columns to meet requirements.
  • Involved in the design of ETL (Extract, Transform, Load) processes and building dimensional models (star schema, snowflake schema) in ERwin for Data Warehouse solutions.
  • Designed OLAP cubes and optimized data flows in ERwin to meet business intelligence needs.
  • Supported the team in resolving T-SQL and SQL Server Reporting Services issues, and designed and formatted a variety of reports, including cross-tab, conditional, drill-down, Top N, summary, and sub-reports.
  • Performed routine maintenance procedures, such as backups, index rebuilds, and statistics updates, to maintain the data warehouse's health and performance (see the maintenance sketch after this list).
  • Provided application support by phone, and developed and tested Windows command files and SQL Server queries for production database monitoring in a 24/7 support environment.
  • Environment: IBM WebSphere DataStage EE/7.0/6.0 (Manager, Designer, Director, Administrator), Ascential Profile Stage 6.0, Ascential Quality Stage 6.0, Erwin, TOAD, Autosys, Oracle 9i, PL/SQL, SQL, UNIX Shell Scripts.
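
A minimal Python sketch of the routine maintenance described above, issuing T-SQL index rebuilds and statistics updates through pyodbc. The connection string, driver, and object names are placeholders, not the actual production setup.

```python
# Illustrative maintenance runner: rebuild indexes and refresh statistics
# on SQL Server. Connection details and object names are placeholders.
import pyodbc

CONN = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=warehouse-db;DATABASE=DW;Trusted_Connection=yes;"
)

MAINTENANCE = [
    "ALTER INDEX ALL ON dbo.FactSales REBUILD",  # hypothetical fact table
    "EXEC sp_updatestats",                       # refresh optimizer stats
]

with pyodbc.connect(CONN, autocommit=True) as conn:
    cur = conn.cursor()
    for stmt in MAINTENANCE:
        cur.execute(stmt)
        print(f"Completed: {stmt}")
```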

Education

Master of Science - Data Science

Georgia Institute of Technology
Atlanta, GA
05.2016

Bachelor of Science - Computer Science

University of Georgia
Athens, GA
05.2014

High School Diploma

Eagles Landing High School
McDonough, GA

Skills

  • Azure Services: Azure Data Factory, Event Hubs, IoT Hub, Databricks, Blob Storage, Data Lake Storage, SQL Database, Synapse Analytics, Stream Analytics, Microsoft Fabric, Power BI, Analysis Services, Key Vault, Active Directory, ERwin, Monitor, Blob Storage Tiers, Purview, Data Catalog, Azure Policy
  • Big Data Technologies: MapReduce, Hive, Tez, Python, PySpark, Scala, Presto, Snowflake, Kafka, Spark Streaming, Oozie, Sqoop, Zookeeper, Apache Airflow, DBT, Flink, Fivetran
  • Hadoop Distributions: Cloudera, Hortonworks
  • Languages: SQL, PL/SQL, HiveQL
  • Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP
  • Operating Systems: Windows, UNIX, Linux, Ubuntu, CentOS
  • Build Automation Tools: Ant, Maven
  • Version Control: Git, GitHub
  • IDE & Build Tools: Eclipse, Visual Studio, Jupyter Notebook
  • Business Intelligence & Visualization Tools: Power BI (DAX, M Query, Data Modeling, Real-Time Dashboards), Tableau (Tableau Prep, Dashboard Design, Advanced Visualizations), Excel (Pivot Tables, Advanced Formulas, Data Analysis)
  • Databases: MS SQL Server, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle, Cosmos DB
