Saikiran Reddy Durgareddygari

Senior Azure Data Engineer

Summary

Seasoned Azure Data Engineer with 11+ years of experience designing and implementing scalable, secure, and high-performance data solutions in cloud and hybrid environments. Specializes in Azure Data Factory, Databricks, Synapse Analytics, ADLS Gen2, and Delta Lake. Proficient in PySpark, SQL, and Python for building robust ETL/ELT pipelines and real-time analytics workflows. Skilled in medallion architecture implementation, DataOps/MLOps pipelines, and CI/CD automation using Azure DevOps, Jenkins, and Terraform. Experienced in integrating solutions with platforms such as Power BI, Tableau, Snowflake, Cosmos DB, MongoDB, PostgreSQL, and Oracle Cloud, and in ensuring data governance and metadata management through Microsoft Purview and dbt. Expert in event-driven architectures, streaming ingestion with Event Hubs and Azure Stream Analytics, supporting ML models in Azure ML, and developing scalable APIs secured via OAuth2 and Azure AD B2C. Proficient in Looker, GCP Dataflow, Airflow, DataStage, and OIC integrations for enterprise-wide interoperability. Strong background in designing and maintaining enterprise data warehouses with star/snowflake schema modeling and performance tuning strategies. Recognized for a collaborative Agile delivery approach and for delivering actionable insights through dynamic dashboards and executive reporting.

Overview

12 years of professional experience
6 years of post-secondary education
3 Certifications

Work History

Senior Azure Data Engineer

Morgan Stanley
Atlanta, GA
10.2021 - Current
  • Implemented advanced Azure Data Factory (ADF) features, including data flows, mapping data flows, triggers, parallel processing, batch execution, and dependency management, to optimize ETL pipeline performance and scalability while maintaining data lineage, data quality, and metadata management.
  • Mastered Azure Data Factory (ADF) activities including Lookups, Stored Procedures, Conditional Activities (IF condition), Iterative processing (ForEach), Variable management (Set Variable, Append Variable), Metadata retrieval (Get Metadata), and Data filtering, ensuring robust data orchestration and ETL workflow automation.
  • Orchestrated ETL data workflows using Azure Data Factory (ADF) and Copy Activity to ensure seamless data migration and movement between Azure Blob Storage and Azure Data Lake Storage (ADLS Gen2), optimizing data integration processes for enhanced operational efficiency.
  • Led the migration of ETL jobs and workflows from version 9.1 to version 11.5, ensuring a seamless transition, backward compatibility, and data integrity throughout the upgrade.
  • Built Auto Loader pipelines in Azure Databricks notebooks using PySpark, optimizing batch processing and implementing structured streaming for continuous data ingestion; leveraged file notifications to handle both historical and incremental updates seamlessly in ETL processes (see the sketch after this list).
  • Developed modular data transformation pipelines using dbt, enabling version-controlled, test-driven SQL workflows for building curated datasets from raw layers in cloud data warehouses, ensuring data consistency and reusability.
  • Designed dbt models following medallion architecture principles (bronze → silver → gold), enabling clear transformation stages, incremental logic using is_incremental(), and dependency resolution through ref() functions to ensure maintainability and performance.
  • Optimized the OneLake architecture to enhance data ingestion, transformation, and retrieval, ensuring robust performance and scalability.
  • Utilized Stackdriver (Cloud Monitoring & Logging) to monitor pipeline health, track SLAs, and proactively address failures or latencies.
  • Architected a multi-cluster Azure Databricks environment for isolated development, testing, and production workloads, implementing PySpark-based data quality checks across all stages.
  • Developed and optimized data processing workflows in Azure Databricks using PySpark, implementing Spark DataFrames and Datasets for complex transformations, aggregations, and joins. Created custom PySpark functions and UDFs, significantly improving data processing efficiency.
  • Created Python-based data transformation scripts with PySpark and Beam SDK for complex data cleansing, enrichment, and schema standardization.
  • Automated GCP IAM policy audits and access reviews using Python and GCP SDK to ensure continuous compliance with enterprise security standards.
  • Integrated Google Cloud Build and Artifact Registry to implement CI/CD pipelines for PySpark jobs and Airflow DAGs.
  • Integrated data from heterogeneous sources such as relational databases, flat files, and cloud platforms using DataStage's connectors and stages, enhancing the overall data pipeline architecture and minimizing data discrepancies.
  • Developed a custom Delta Lake implementation on Azure Data Lake Storage (ADLS Gen2) using PySpark in Azure Databricks, enabling ACID transactions and time travel capabilities for critical datasets.
  • Built Azure Logic Apps workflows with conditional statements, loops, and error handling, integrating with Azure Function Apps for custom code and data transformations to improve data quality.
  • Designed a scalable data engineering pipeline on Azure Databricks, leveraging Apache Spark 3.0 for advanced data processing, Delta Lake for data reliability, and Optimized Auto Scaling, achieving 10x faster data processing.
  • Implemented Azure Purview to establish comprehensive data governance across hybrid and multi-cloud environments, encompassing data cataloging, lineage tracking, classification, and policy enforcement.
  • Developed and deployed Delta Live Tables (DLT) pipelines in Azure Databricks to enable automated, reliable data processing with built-in monitoring and data quality enforcement. Streamlined ETL processes using declarative data pipeline management and ensured real-time data processing for critical business applications.
  • Integrated Microsoft Fabric pipelines with Delta Live Tables and Azure Databricks to enable real-time analytics and automated data quality enforcement across enterprise datasets.
  • Architected scalable ETL pipelines using Azure Data Factory (ADF) and SQL Server Integration Services (SSIS), optimizing data flows with custom Python scripts for complex transformations.
  • Engineered a high-performance data warehouse using Azure Synapse Analytics, optimizing T-SQL queries, configuring SQL pools for parallel processing, and developing ETL pipelines with Data Flow. Implemented advanced security measures, achieving sub-second query times for 10TB+ datasets.
  • Integrated Microsoft Fabric for unified data management and analytics, leveraging OneLake for centralized data storage and Data Factory for seamless data movement. Built interactive reports and dashboards using Power BI within Fabric, providing real-time insights and improving decision-making across departments.
  • Developed a robust data integration and orchestration framework using Azure Synapse Analytics, taking advantage of its pipelines, activities, and triggers to automate and manage complex data workflows.
  • Designed a robust OAuth 2.0 authentication flow using Azure Active Directory B2C, implementing custom policies for social identity providers and integrating with Azure Key Vault for secure token storage.
  • Engineered real-time data ingestion solution using Azure Event Hubs, Stream Analytics, and Python, transforming and loading streaming data into Azure SQL Database for immediate analysis.
  • Implemented Azure DevOps for complete CI/CD pipeline automation, utilizing Git for version control, Azure Pipelines for continuous integration, and Terraform for infrastructure as code, with Terraform modules managing environment-specific configurations; ensured rigorous testing and validation across dev, test, and prod environments to streamline software delivery in data engineering projects.
  • Optimized data pipeline performance using Azure Monitor, Azure Databricks tools, Apache Spark, and Apache Flink, achieving a 20% improvement in processing efficiency; after root-cause analysis, implemented indexing strategies in Azure SQL Database, reducing query response times and enhancing data reliability across integration processes with data lineage tracking.
  • Designed and implemented data ingestion pipelines using Apache NiFi, automating data movement from diverse sources into Azure Data Lake Storage (ADLS Gen2) and Snowflake.
  • Designed and implemented robust data models, optimizing data types, keys, and constraints; delivered data warehouse solutions using star and snowflake schemas for enhanced analytics and storage efficiency.
  • Implemented ETL pipelines using Python (Pandas, NumPy) and optimized SQL stored procedures for data warehousing, ensuring efficient data processing and management in an RDBMS.
  • Built interactive reports and dashboards using Power BI within Microsoft Fabric, providing real-time insights and driving informed business decision-making.
  • Implemented advanced data governance and quality checks within OneLake, improving data consistency and enabling seamless analytics.
  • Automated documentation and lineage tracking using dbt Docs and dbt tests, improving data transparency, stakeholder understanding, and compliance with data governance standards across ETL pipelines.
  • Automated batch data ingestion, transformation, and scheduling tasks using shell scripting, optimizing system efficiency, enhancing data manipulation capabilities, and enabling seamless system integration.
  • Implemented Change Data Capture (CDC) mechanisms with Azure Data Factory (ADF) and Azure Databricks for real-time data synchronization, while managing Slowly Changing Dimensions (SCD) using Azure SQL Data Warehouse and Azure Data Factory (ADF) to ensure accurate historical data maintenance and consistency in dimension tables.
  • Architected serverless functions in Azure, automating tasks and integrating seamlessly with other Azure services to optimize scalability and cost efficiency in application development.
  • Implemented Agile methodologies using Jira for backlog management, sprint tracking, and issue resolution in cross-functional data teams.
  • Optimized data processing by implementing efficient ETL pipelines and streamlining database design.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
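
For illustration, a minimal PySpark sketch of the Auto Loader ingestion pattern referenced in the Databricks bullets above, assuming a Databricks runtime where the cloudFiles source is available; the storage paths, schema location, and table name are hypothetical placeholders, not details from the engagement itself.

```python
# Hypothetical Auto Loader sketch (illustrative paths and table names).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("autoloader-sketch").getOrCreate()

# "cloudFiles" is the Databricks Auto Loader source; file-notification
# mode picks up newly arrived blobs without relisting the landing folder.
events = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.schemaLocation",
            "abfss://bronze@account.dfs.core.windows.net/_schemas/events")
    .load("abfss://landing@account.dfs.core.windows.net/events/")
)

# Land the stream in a bronze Delta table; the checkpoint guarantees each
# historical or incremental file is processed exactly once.
(
    events.writeStream.format("delta")
    .option("checkpointLocation",
            "abfss://bronze@account.dfs.core.windows.net/_checkpoints/events")
    .trigger(availableNow=True)  # drain the backlog, then stop (batch-style run)
    .toTable("bronze.events")
)
```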

Azure Data Engineer

PECO
Philadelphia, PA
06.2018 - 09.2021
  • Accountable for gathering requirements, conducting system analysis, designing, developing, testing, and deploying scalable and secure solutions on Azure, leveraging key services such as Azure Functions, Azure API Management, Azure Cosmos DB, and Azure Kubernetes Service (AKS), to meet business needs and drive digital transformation.
  • Designed and implemented scalable ETL pipelines using Azure Data Factory (ADF) to extract, transform, and load data from SQL Server, Oracle, and MySQL into Snowflake and Azure Synapse. Integrated Azure Event Grid, Azure Stream Analytics, and Azure Databricks for near-real-time data processing and large-scale analytics.
  • Implemented Azure Data Lake Storage connector for multi-format (Avro, JSON, ORC, Parquet) I/O operations, leveraging Azure Data Lake Storage (ADLS Gen2) objects in mappings for efficient hierarchical data management.
  • Executed multi-faceted trigger strategies in Azure Databricks using PySpark, implementing time-based, fixed interval, and one-time triggers for complex ETL orchestration. Developed custom PySpark functions and integrated Databricks Jobs API for dynamic, dependency-based workflow initiation.
  • Developed Python-based Azure Functions to process real-time IoT data streams, leveraging the high-throughput capabilities of Azure Data Lake Storage (ADLS Gen2) to handle 100,000 events per second.
  • Created scalable ETL framework using Azure Data Factory (ADF), Python, and dynamic pipeline generation. Implemented CDC mechanisms and optimized performance for 50+ data sources through parallel execution and partitioning.
  • Built a high-performance data warehouse using Azure Synapse Analytics, integrating diverse data sources and leveraging PolyBase, partitioning, and indexing to optimize query performance and support advanced analytics.
  • Employed Azure Synapse Analytics, incorporating Synapse Pipelines built on Azure Data Factory (ADF), to create, schedule, and orchestrate hybrid data integration workflows, enhancing data pipeline management and automation.
  • Implemented Azure Event Hubs to ingest streaming data and stored it in Azure Blob Storage and Azure Data Lake Storage Gen2 (ADLS Gen2) using Azure Data Factory (ADF) with trigger-based automation and linked services, ensuring robust and scalable real-time data storage (see the sketch after this list).
  • Resolved complex Azure Data Factory (ADF) and Azure Synapse pipeline issues, minimizing downtime and ensuring optimal performance through expert troubleshooting (data flow debugging, trigger optimization, log analysis) and technical solutions (mapping data flows, data validation, error handling).
  • Implemented a multi-model NoSQL solution with Azure Cosmos DB, using SQL API for querying and designing a custom partitioning strategy that improved read performance by 50%.
  • Implemented Role-Based Access Control (RBAC) in Azure Synapse and Azure Data Lake Storage (ADLS Gen2), ensuring fine-grained access management and compliance with security best practices.
  • Designed and implemented real-time streaming solutions on Azure using Azure Stream Analytics, Azure Event Hubs, and Azure Databricks, ensuring efficient data processing and data integration.
  • Extensively utilized Azure Databricks notebooks with PySpark, Pandas, and NumPy to automate data cleaning and preprocessing tasks, correcting inconsistencies and standardizing formats to significantly enhance data quality and operational efficiency.
  • Optimized Azure Databricks and Spark SQL performance by implementing DataFrame and RDD caching, lazy evaluation, in-memory computing, and query optimization, significantly enhancing data retrieval speeds.
  • Established real-time monitoring and alert management using Azure Monitor and implemented end-to-end security for streaming data with Azure Key Vault and Azure Active Directory, ensuring encryption and access controls.
  • Implemented serverless functions in Azure Function Apps to handle HTTP requests, timer events, and Azure service messages. Configured Logic Apps for automated email notifications and built dynamic data pipelines for multi-source extraction.
  • Delivered actionable insights and analytics on data usage, access patterns, and compliance status through Azure Purview, enabling informed decision-making and optimized data management strategies.
  • Developed and optimized ETL processes using PySpark in Azure Databricks and SQL in Azure SQL Database, enhancing data retrieval, parallel processing, and query execution for improved performance and scalability.
  • Optimized complex SQL queries in Azure Synapse Analytics, utilizing window functions and materialized views to reduce processing time of daily reports from hours to minutes.
  • Designed complex ETL data mappings including transformation, aggregation, and lookup functions in Informatica to streamline data flows across diverse sources and targets, ensuring seamless data integration and data consistency.
  • Optimized Azure DevOps CI/CD by implementing a matrix strategy in YAML pipelines, parallelizing builds and tests across environments, and integrating Terraform for automated, consistent infrastructure provisioning, reducing deployment errors.
  • Diagnosed and debugged SQL query performance issues in Azure SQL Database, improving query execution times by 30%.
  • Managed large datasets and pipelines using Git, enabling reproducibility, auditing, and collaboration through Data Asset Versioning, Lineage Tracking, Version Control, and Change Logging for efficient data management.
  • Utilized SQL Server Management Studio (SSMS) for query performance analysis, execution plan optimization, and troubleshooting database performance bottlenecks in SQL Server.
  • Developed interactive Power BI data visualizations and dashboards with DAX measures, custom visuals, and row-level security. Implemented real-time data refresh using DirectQuery and created paginated reports (SSRS) for operational reporting, enhancing data-driven decision-making across the organization.
  • Designed and implemented real-time data streaming pipelines using Apache Kafka, enabling high-throughput and low-latency data ingestion into Azure Data Lake and Synapse Analytics.
  • Integrated Apache Kafka with Azure Data Factory (ADF) and Azure Stream Analytics to process, transform, and load real-time data into cloud-based storage and analytics platforms.
  • Utilized Apache Spark for Azure Synapse Analytics with Serverless Apache Spark pools, managing Spark applications and sessions via notebooks to execute and optimize Spark jobs for advanced analytics within the Synapse workspace.
  • Implemented Agile Methodologies, utilizing JIRA to streamline project management, facilitate team collaboration, and accelerate delivery through Sprint Planning, Backlog Management, Issue Tracking, and Velocity Monitoring.
  • Designed and optimized data warehouse architecture using Star and Snowflake schemas, improving data retrieval and storage efficiency. Implemented Snowflake’s multi-cluster architecture with time travel and zero-copy cloning for high-performance analytics, versioning, and recovery.
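
For illustration, a minimal sketch of consuming the Event Hubs stream described above with the azure-eventhub Python SDK (pip install azure-eventhub); the connection string, hub name, and handler logic are illustrative assumptions rather than project specifics.

```python
# Hypothetical Event Hubs consumer sketch (placeholder names throughout).
from azure.eventhub import EventHubConsumerClient

CONN_STR = "<event-hubs-connection-string>"  # in practice, retrieved from Azure Key Vault

def on_event(partition_context, event):
    # Each event carries one telemetry record from the upstream producers.
    record = event.body_as_str()
    print(f"partition {partition_context.partition_id}: {record}")
    # With a checkpoint store configured, this persists consumer progress.
    partition_context.update_checkpoint(event)

client = EventHubConsumerClient.from_connection_string(
    CONN_STR,
    consumer_group="$Default",
    eventhub_name="telemetry",
)

with client:
    # starting_position="-1" replays the stream from the earliest event.
    client.receive(on_event=on_event, starting_position="-1")
```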

Big Data Developer

Abbott Laboratories
Illinois City, IL
04.2016 - 05.2018
  • Developed and optimized data pipelines using Hadoop ecosystem tools (HDFS, MapReduce, YARN, Spark, Hive, Pig) and PySpark, enhancing data processing throughput for large-scale analytics projects.
  • Integrated Apache Zookeeper with Apache Kafka and Hadoop ecosystem tools for distributed coordination, cluster state management, resource allocation, and load balancing, ensuring fault tolerance and high availability in data pipelines.
  • Implemented real-time data streaming solutions using Apache Kafka and Spark Streaming with Python, improving system uptime and reducing data processing latency for mission-critical applications (see the sketch after this list).
  • Utilized Apache NiFi for data flow management and Apache Oozie for workflow scheduling, integrating with Kafka for seamless data streaming and processing across the big data ecosystem.
  • Designed and optimized NoSQL database solutions using HBase, implementing efficient schema and data models that improved query response times, while using SQL for data extraction and transformation.
  • Developed Spark applications using Scala and PySpark, leveraging RDDs and DataFrames to process large volumes of daily data, reducing processing time compared to traditional MapReduce jobs.
  • Implemented data warehousing solutions using Hive and wrote complex SQL queries, designing external tables with dynamic partitioning and bucketing that enhanced query performance for analytical workloads.
  • Utilized Azure HDInsight to manage and process large-scale data workloads, successfully migrating on-premises Hadoop clusters to the cloud, optimizing infrastructure costs using Python automation scripts, and ensuring seamless data migration during the process.
  • Developed and maintained data integration workflows using Apache Airflow, Apache Flink, and Python, automating end-to-end data pipelines that processed high volumes of records daily from various sources, including Kafka streams.
  • Designed and implemented ETL processes using Apache Sqoop, Flink SQL, and Python scripts, optimizing data ingestion from SQL databases to HDFS through efficient data partitioning and performance tuning, while utilizing Flink's event-time processing capabilities.
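
For illustration, a minimal PySpark sketch of the Kafka-to-Spark streaming flow described above, assuming the spark-sql-kafka connector is on the classpath; broker addresses, topic names, and HDFS paths are illustrative assumptions.

```python
# Hypothetical Kafka -> Spark Structured Streaming sketch.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Subscribe to the topic and project the raw key/value bytes as strings.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "events")
    .option("startingOffsets", "latest")
    .load()
    .select(col("key").cast("string"), col("value").cast("string"))
)

# Land micro-batches in HDFS as Parquet for downstream Hive queries;
# the checkpoint directory makes the sink fault tolerant.
query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs:///data/raw/events")
    .option("checkpointLocation", "hdfs:///checkpoints/raw_events")
    .start()
)
query.awaitTermination()
```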

Data Analyst

Kroger
Ohio, OH
01.2014 - 03.2016
  • Managed Windows Server environments and Oracle Database 11g, optimizing performance with SQL Profiler, indexing, and query tuning.
  • Developed solutions using MS SQL Server 2014, resolving locking issues and improving query efficiency.
  • Designed and developed ETL workflows using SSIS and Informatica 9.1 to automate data extraction, transformation, and loading.
  • Developed and maintained data transformation pipelines using dbt for accurate and timely reporting.
  • Wrote optimized SQL queries and Unix scripts for data analysis, data modeling, and system administration tasks.
  • Designed OLAP cubes with SSAS and created automated reports and data visualizations using SSRS and Power BI for actionable insights.
  • Utilized Erwin Data Modeler to design and maintain conceptual, logical, and physical data models for effective data analysis.
  • Implemented PerformancePoint Server and SharePoint for unified business intelligence and collaborative workspaces.
  • Implemented data governance frameworks and security protocols to ensure data quality, consistency, and protection of sensitive information.
  • Designed scalable data warehouses using dimensional data modeling and star schemas, optimizing performance and business requirements.
  • Leveraged MS Office suite for project documentation, reporting, and automation using VBA.

Education

Master of Science - Computer Science

Saint Peter's University
Jersey City, NJ
2013

Bachelor of Science - Computer Science

Osmania University
Hyderabad
2011

Skills

  • Azure Data Factory (ADF)
  • Azure Synapse Analytics
  • Event management
  • PolyBase implementation
  • Microsoft Azure SQL Server
  • Azure Data Lake Storage (ADLS Gen2)
  • Azure Logic Apps
  • Azure Function Apps
  • Cloud storage management
  • Azure Databricks
  • Azure DevOps
  • Azure Virtual Machines
  • Azure IoT Hub
  • Azure Data Lake Analytics
  • Azure Active Directory
  • Azure Monitor
  • Azure Purview
  • Azure Cosmos DB
  • Azure Key Vault
  • Azure Machine Learning
  • HDFS
  • MapReduce
  • YARN
  • Hive
  • Sqoop
  • PySpark
  • Scala
  • Kafka
  • Performance tuning
  • Oozie
  • Data quality management

Certification

Microsoft Certified: Azure Data Engineer Associate (DP-203)
