Over 5 years of experience as a Data Engineer with expertise in building scalable data pipelines, data lakes, and ETL processes using Python, SQL, and Big Data technologies on AWS, Azure, and GCP platforms.
Proficient in designing robust data architectures, implementing data integration strategies, and optimizing data workflows to support advanced analytics and data-driven decision-making.
Hands-on experience with cloud platforms including AWS (EC2, S3, RDS, Lambda), Azure (Data Factory, Data Lake, Databricks), and GCP (BigQuery, Dataflow).
Skilled in CI/CD pipeline development using Jenkins, Docker, Kubernetes, and Terraform to ensure efficient deployment and automation.
Strong knowledge of data modeling, data warehousing, and data visualization using Power BI and Tableau.
Proven ability to work in Agile and Waterfall environments, delivering high-quality solutions within project timelines.
Overview
7 years of professional experience
Work History
Azure Data Engineer
ESPN
Bristol, USA
07.2024 - Current
Designed and managed data warehouses and data lakes using Azure Data Lake, Azure Data Factory, and Blob Storage
Developed and optimized ETL pipelines using Python (Pandas, PySpark) and Azure Databricks for seamless data integration
Built automated data ingestion workflows using Spark, Sqoop, Oozie, and Control-M
Implemented real-time data streaming solutions using Kafka and Spark Streaming
Deployed Infrastructure as Code (IaC) using Terraform for cloud resource provisioning and scaling
Created analytical dashboards and reports using Power BI and Azure Analysis Services
Managed CI/CD pipelines in Agile environments using Jenkins, Bitbucket, and Bamboo
Implemented automated data pipelines for data migration, ensuring a smooth and reliable transition to the cloud environment
Developed database triggers and stored procedures in T-SQL, including cursors and table operations
Used Python's unittest library to test Python programs and other code
AWS Data Engineer
Nuvance Health
Danbury, USA
08.2023 - 06.2024
Developed ETL jobs for data extraction and integration into Redshift data marts
Implemented AWS S3 security frameworks using Lambda and DynamoDB
Worked on Kafka-based real-time streaming and Spark data processing
Automated AWS infrastructure provisioning with Terraform
Built data models and visualized reports using Power BI
Developed and maintained AWS Analysis Services models to support business intelligence and data analytics requirements, creating measures, dimensions, and hierarchies for reporting and visualization
Worked extensively with PySpark and Spark SQL for data cleansing and generating DataFrames and RDDs
Used AWS to create storage resources and define resource attributes, such as disk type or redundancy type, at the service level
Used T-SQL extensively for MS SQL Server, and ANSI SQL across disparate databases
GCP Data Engineer
Kotak General Insurance
Mumbai, India
05.2020 - 12.2021
Developed Sqoop scripts to migrate data from Oracle to Big Data environments, facilitating seamless data integration
Created T-SQL scripts to manage Azure SQL Database objects, including tables, indexes, and views, for optimized data storage and retrieval
Designed and maintained GCP cloud solutions using Dataproc and BigQuery for scalable data processing and analytics
Utilized Spark SQL with Scala and Python interfaces to convert RDDs of case classes into schema RDDs for advanced data processing
Authored custom PySpark UDFs for data manipulation, aggregation, labeling, and cleaning tasks
Managed large datasets using Pandas DataFrames and SQL, ensuring efficient data analysis and reporting
Ingested data into Azure Data Lakes using Azure Data Factory, processing it in Databricks for day-to-day business requirements
Implemented end-to-end data pipelines using Apache Beam and Cloud Dataflow, validating data between raw source files and BigQuery tables
Data Engineer
UltraTech Cement
Mumbai, India
03.2018 - 04.2020
Developed PySpark applications for ETL operations, optimizing data pipelines for efficient data processing and integration
Built scalable distributed data solutions using Hadoop, enhancing data storage and retrieval capabilities
Designed and developed dashboards and reports using Power BI, enabling data visualization and business intelligence
Configured Spark Streaming with Kafka to capture real-time data and store it in DBFS for continuous processing
Created Databricks Spark jobs using PySpark for complex table operations and data transformations
Processed data from multiple file formats, including XML, CSV, and JSON, using Spark jobs for analytics and reporting
Automated AWS infrastructure management using Terraform, achieving high availability and reducing management time by 90%
Utilized Azure Data Factory to ingest and process data in Databricks, loading results into Azure Data Lakes
Migrated data from on-premises SQL Server to cloud databases, including Azure Synapse Analytics and Azure SQL DB
Implemented a CI/CD system using Git, Jenkins, MySQL, and custom Python/Bash tools, automating deployment and integration