Azure Data Engineer with a proven track record at Westlake Corporation, adept at architecting complex ETL workflows and optimizing data models. Skilled in Azure Data Factory and SQL, experienced in delivering actionable insights, collaborating with BI teams, and driving data-driven decision-making and performance improvements for large-scale data workloads.
Overview
5 years of professional experience
Work History
Azure Data Engineer
Westlake Corporation
Houston, USA
08.2024 - Current
Architected and managed complex ETL workflows in Azure Data Factory, integrating on-premises and cloud data sources for seamless data movement.
Administered Azure SQL Database and Azure Synapse Analytics, applying indexing and query optimization techniques for large-scale workloads.
Designed advanced data models for analytics in Azure Synapse Analytics, applying dimensional modeling within dedicated SQL pools (formerly Azure SQL Data Warehouse).
Built real-time streaming data pipelines with Azure Stream Analytics and Azure Databricks for immediate data processing and analysis.
Optimized Azure Blob Storage and Azure Data Lake Storage Gen2 for scalable data storage, ensuring efficient access for downstream analytics.
Tuned performance of Azure SQL Database and Cosmos DB, including query optimization and indexing strategies for high-throughput workloads.
Integrated multiple data sources via Azure Data Factory, automating workflows with Azure Logic Apps and Azure Functions.
Collaborated with BI teams to prepare data for reporting using Power BI and SSRS, generating actionable insights.
AWS Data Engineer
Comerica Bank
Dallas, USA
02.2023 - 07.2024
Managed complex ETL workflows using AWS Glue, Lambda, and Kinesis for robust data processing.
Administered Amazon RDS and Redshift databases, applying indexing and distribution strategies to optimize query performance.
Designed data models and pipelines with Redshift to enhance analytical capabilities.
Built streaming data pipelines with Amazon Kinesis to enable near real-time analysis.
Streamlined data storage with Amazon S3, S3 Glacier, and data lake solutions for efficient dataset access.
Executed performance tuning on RDS and Redshift to support high-throughput workloads.
Implemented automated workflows with AWS Glue and Lambda to enhance data movement efficiency.
Collaborated with BI teams using Amazon QuickSight to generate comprehensive reports.
GCP Data Engineer
Novartis Pharmaceuticals
Hyderabad, India
06.2021 - 07.2022
Architected and managed complex ETL workflows with Google Cloud Dataflow and Dataproc, integrating data from BigQuery, Cloud Storage, and on-premises systems.
Administered Google BigQuery, Cloud SQL, and Cloud Spanner, employing techniques such as partitioning and query optimization for efficient data processing.
Designed advanced data models and warehousing solutions in Google BigQuery using dimensional modeling to support business intelligence.
Built real-time ingestion and processing pipelines with Google Cloud Pub/Sub and Dataflow to enable event-driven architectures.
Optimized Google Cloud Storage and BigQuery for large-scale data storage, ensuring efficient access and seamless GCP service integration.
Tuned performance of BigQuery, Cloud Spanner, and Cloud SQL through query optimization and indexing strategies for low-latency operations.
Integrated diverse data sources into unified workflows using Dataflow and Apache Beam, automating processing pipelines for scalability.
Collaborated with BI teams to prepare datasets for reporting using Data Studio and Looker, delivering actionable insights to stakeholders.
Data Engineer
Star Health and Allied Insurance
Hyderabad, India
06.2020 - 05.2021
Architected and developed efficient ETL pipelines, ensuring reliable data movement across diverse platforms using Apache Kafka, Spark, and Airflow.
Managed and optimized PostgreSQL, MySQL, and NoSQL databases to enhance query performance and ensure high availability.
Created cloud-based data warehouses with Amazon Redshift and Snowflake, designing scalable schema structures for seamless analysis.
Built real-time data processing pipelines with Apache Kafka and Flink for live ingestion and operational intelligence.
Automated data engineering workflows using Apache Airflow and AWS Lambda to reduce manual intervention in complex processes.
Leveraged monitoring tools such as Prometheus and Grafana to track health metrics of data pipelines for proactive management.
Integrated curated datasets with BI tools like Power BI and Tableau, enabling stakeholders to generate real-time insights.
Refined data workflows using Apache Spark, implementing parallel processing strategies that significantly reduced execution times.