
Data Engineer with 8+ years of expertise in ELT/ETL development and Big Data processing. Highly proficient in the Azure stack, including Azure Data Factory, Synapse, and Blob Storage, with extensive hands-on experience in Python, Spark, and Unix shell scripting. Demonstrated success in optimizing data infrastructure, migrating major legacy ETL systems to the Azure cloud, and implementing DevOps CI/CD processes. Skilled in building complex data flows to ingest massive volumes of relational and non-relational data for enterprise-scale analytics. Expertise extends to using Databricks for advanced data transformations and managing Snowflake migrations to improve query performance. Adept at defining secure cloud architectures with Azure Key Vault and Managed Identity to ensure strict data governance and compliance. Specializes in designing reusable data frameworks and generic ingestion patterns that minimize code redundancy and accelerate the delivery of high-quality data to downstream consumers.
Roles & Responsibilities:
• Designed and implemented data pipelines to enhance data accessibility and reliability.
• Created pipelines, data flows, and complex data transformations and manipulations using Azure Data Factory (ADF) and PySpark with Databricks.
• Created and provisioned multiple Databricks clusters for batch and continuous streaming data processing, and installed the required libraries on the clusters.
• Designed and developed Azure Data Factory (ADF) pipelines to extract data from relational sources such as Teradata, Oracle, SQL Server, and DB2, and from non-relational sources such as flat files, JSON files, XML files, and shared folders.
• Developed streaming pipelines using Apache Spark with Python (see the streaming sketch after this list).
• Developed Azure Databricks notebooks to apply business transformations and perform data cleansing operations.
• Developed Databricks Python notebooks to join, filter, pre-aggregate, and process files stored in Azure Data Lake Storage (see the transformation sketch after this list).
• Ingested a huge volume and variety of data from disparate source systems into Azure Data Lake Storage Gen2 using Azure Data Factory V2.
• Created reusable pipelines in Data Factory to extract, transform, and load data into Azure SQL Database and SQL Data Warehouse.
• Implemented both ETL and ELT architectures in Azure using Data Factory, Databricks, Azure SQL Database, SQL Data Warehouse, and Azure Synapse.
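Streaming sketch referenced in the Spark streaming bullet above — a minimal, illustrative Spark Structured Streaming example in Python. The landing path, event schema, checkpoint location, and output path are hypothetical placeholders under assumed names, not the actual project configuration.

    # Minimal Structured Streaming sketch (hypothetical paths and schema).
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks

    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_time", TimestampType()),
        StructField("amount", DoubleType()),
    ])

    # Read newly arriving JSON files from an assumed landing zone in ADLS Gen2.
    events = (
        spark.readStream
        .schema(event_schema)
        .json("abfss://landing@examplelake.dfs.core.windows.net/events/")
    )

    # Windowed aggregation with a watermark to bound streaming state.
    hourly = (
        events
        .withWatermark("event_time", "1 hour")
        .groupBy(F.window("event_time", "1 hour"))
        .agg(F.sum("amount").alias("hourly_amount"))
    )

    # Write the results to a Delta table, tracking progress via a checkpoint.
    query = (
        hourly.writeStream
        .outputMode("append")
        .format("delta")
        .option("checkpointLocation", "abfss://checkpoints@examplelake.dfs.core.windows.net/hourly/")
        .start("abfss://silver@examplelake.dfs.core.windows.net/hourly_amounts/")
    )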
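Transformation sketch referenced in the Databricks notebook bullet above — a minimal, illustrative PySpark example of joining, filtering, and pre-aggregating files stored in Azure Data Lake Storage. The container names, file paths, and column names are hypothetical placeholders, not the actual project schema.

    # Minimal Databricks PySpark sketch (hypothetical paths and columns).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks

    orders = spark.read.parquet("abfss://bronze@examplelake.dfs.core.windows.net/orders/")
    customers = spark.read.parquet("abfss://bronze@examplelake.dfs.core.windows.net/customers/")

    daily_sales = (
        orders
        .filter(F.col("order_status") == "COMPLETED")     # cleansing filter
        .join(customers, on="customer_id", how="inner")   # enrich with customer attributes
        .groupBy("customer_region", "order_date")         # pre-aggregate before loading
        .agg(
            F.sum("order_amount").alias("total_sales"),
            F.countDistinct("order_id").alias("order_count"),
        )
    )

    # Persist the curated result as a Delta table for downstream analytics.
    (
        daily_sales.write
        .format("delta")
        .mode("overwrite")
        .save("abfss://gold@examplelake.dfs.core.windows.net/daily_sales/")
    )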
Key Accomplishments:
• Designed and implemented a scalable Databricks Medallion Architecture (Lakehouse), organizing data into Bronze (Raw), Silver (Cleansed/Enriched), and Gold (Curated) layers. This structure streamlined the transformation of raw ingestions into reliable Fact and Dimension Delta tables for high-performance downstream analytics.
• Transformed raw data from relational, non-relational, and other storage systems and integrated it into data-driven workflows, helping the business map strategies, attain goals, and drive value from its data.
• Leveraged comprehensive PySpark transformations such as filter, join, groupBy, withColumn, window functions, union, and UDFs to replicate complex business logic. Implemented Delta Lake MERGE strategies to handle Change Data Capture (CDC) from source systems and maintain Slowly Changing Dimension (SCD) Type II history in the Data Lakehouse.
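The following is a minimal, illustrative sketch of the Delta Lake MERGE pattern described above for CDC and SCD Type II history. The table names (gold.dim_customer, silver.customer_changes), the business key (customer_id), and the row_hash, effective_date, and validity columns are hypothetical placeholders, not the actual dimension schema.

    # Minimal SCD Type II sketch using Delta Lake MERGE (hypothetical schema).
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks

    dim = DeltaTable.forName(spark, "gold.dim_customer")   # existing SCD2 dimension
    cdc = spark.table("silver.customer_changes")           # latest CDC batch; assumed to carry
                                                           # customer attributes, row_hash, effective_date

    # Keep only records that are new or whose attributes changed, by comparing
    # the CDC feed against the dimension's current rows on a row hash.
    current = dim.toDF().filter("is_current = true").select("customer_id", "row_hash")
    changed = (
        cdc.alias("s")
        .join(current.alias("t"), "customer_id", "left")
        .filter(F.col("t.row_hash").isNull() | (F.col("t.row_hash") != F.col("s.row_hash")))
        .select("s.*")
    )

    # Expire the current version of every changed key (SCD Type II close-out).
    (
        dim.alias("t")
        .merge(changed.alias("s"), "t.customer_id = s.customer_id AND t.is_current = true")
        .whenMatchedUpdate(set={"is_current": "false", "end_date": "s.effective_date"})
        .execute()
    )

    # Append the new versions as the current rows.
    (
        changed
        .withColumn("is_current", F.lit(True))
        .withColumn("start_date", F.col("effective_date"))
        .withColumn("end_date", F.lit(None).cast("date"))
        .write.format("delta").mode("append").saveAsTable("gold.dim_customer")
    )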