Experienced Data Engineer with 15+ years of experience designing and optimizing large-scale data pipelines and architectures. Expert in Python, SQL, and data platforms such as Databricks, Apache Spark, Kafka, and Snowflake. Proven track record of driving business growth and improving operational efficiency. Skilled in leading cross-functional teams to deliver data solutions that support real-time decision-making and advanced analytics.
Work History
Senior Data Engineer
Marsh and McLennan
Extracted data from Oracle databases using PySpark, staged it in AWS S3, transformed it into JSON with AWS Glue, and loaded it into MongoDB via Python scripts, with workflows orchestrated by Apache Airflow. Incorporated Kafka consumers to read and load JSON data into MongoDB, and built data pipelines supporting real-time APIs and the Snowflake data warehouse for enhanced business decision-making.
Improved job efficiency by 3x with a multithreaded Python script that updates historical records across nested JSON layers in MongoDB collections.
Helped deliver a 20% sales increase for the marketing team by building a dedicated data pipeline for relevant insurance product placements.
Senior Data Engineer
Commonwealth Bank of Australia
Led a team extracting data from heterogeneous banking sources using Azure Data Factory and storing it in Azure Blob Storage.
Executed data transformation tasks in Databricks using Python and Spark SQL.
Implemented Delta Lake to standardize and templatize data pipeline creation, enhancing developer experience.
Applied Spark optimization techniques such as data-skew handling and Z-Ordering to improve query performance and storage efficiency.
Increased the bank's top-line revenue by 4% through targeted marketing on packaged products and improved bottom-line savings with a better consumer risk assessment system.
Data Engineer
Kohl's Department Store
Implemented the Hadoop ecosystem in production at Kohl's, setting up the architecture and technologies: data ingestion with Sqoop, optimized ORC files in Hive, UDTFs for XML extraction, HiveQL for business rules, and shell scripting for automation.
Engineered Scala XML libraries to automate extraction from complex, nested XML, streamlining data processing.
Improved the accuracy of product recommendations in retail search results using geohashing-based location mapping.
Senior Specialist - Financial Controls at Marsh And McLennan Global Services Limited