Engineered a comprehensive ingestion framework that pulls data from on-prem databases, files, and APIs, consolidating data from Macy's systems into a BigQuery data lake.
Participated in Macy's Technology Hackathon, developing an agentic AI solution to help the Legal team process CCPA requests and track current legal requirements.
Supervised 15 contract engineers resolving complex break-fixes and improving ETL data pipelines, raising efficiency, cutting costs, and ensuring uninterrupted data flow for business operations and analytics.
Optimized over 50 high-usage tables on Google Cloud Platform by applying advanced data modeling techniques and restructuring table schemas, reducing storage and compute costs by approximately 30%.
Ensured adherence to internal coding standards, identified technical debt, and mentored engineers, enhancing the efficiency of data processing automation.
Built a Google Cloud Functions-based ingestion framework that moves data from Google Cloud Storage into BigQuery, providing customization, fault tolerance, and flexibility for ad-hoc data loads (illustrative sketch below).
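A minimal sketch of that Cloud Functions load path, assuming a GCS object-finalize trigger, CSV input, and a placeholder destination table (raw_landing.ad_hoc_loads); the production framework's schema handling, auditing, and retry logic are omitted.

```python
# Hypothetical sketch only: file format and destination table are placeholders.
from google.cloud import bigquery

def gcs_to_bigquery(event, context):
    """Cloud Function entry point, triggered when a file lands in a GCS bucket."""
    uri = f"gs://{event['bucket']}/{event['name']}"

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,  # infer schema for ad-hoc loads
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    # Load the new object into the landing table and wait for the job,
    # so failures surface in Cloud Functions logs and can be retried.
    load_job = client.load_table_from_uri(
        uri, "raw_landing.ad_hoc_loads", job_config=job_config
    )
    load_job.result()
```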
Data Engineer
General Motors
Roswell, GA
06.2021 - 06.2024
Created a standardized Databricks notebook to streamline the curation of Delta shared tables, implementing column-level encryption to safeguard personally identifiable information (PII); an illustrative encryption sketch follows these bullets.
Led a cross-functional team in partnership with an external vendor to develop an ETL architecture, establishing a unique customer identification system that facilitated seamless customer recognition across systems.
Co-managed the Global Customer Recognition Azure subscription, providing access to crucial data assets for marketing and lead campaigns, handling approximately 100,000 queries daily.
Contributed to a critical application migration, updating business logic to integrate data from private APIs, and converting PL/SQL queries to ANSI SQL for enhanced data curation.
Collaborated with 15 multidisciplinary teams to define a PySpark data extraction process, integrating 46 data frames from 31 different tables.
Led the migration of 12 enterprise-level applications from the Hive Data Warehouse to the company's on-premise Kubernetes platform.
Enhanced PySpark scripts and SQL queries for data curation focused on LRF tables, achieving a 35% reduction in processing time to meet business SLAs.
Implemented automation tools that dynamically retrieved configuration settings for workloads, reducing development time for outbound data extracts by over 66%.
Implemented comprehensive data quality checks to ensure data integrity after cleansing and curation, delivering accurate information to key stakeholders and supporting downstream ingestion by other teams and services.
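A minimal sketch of the column-level encryption step from the Databricks notebook bullet above, assuming Spark's built-in aes_encrypt expression (Spark 3.3+ / Databricks Runtime) and placeholder table, column, and secret-scope names; the real notebook's Delta Sharing setup and key management are not shown.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# In Databricks the key would come from a secret scope, e.g.:
#   key = dbutils.secrets.get(scope="pii", key="aes_key")
key = "0123456789abcdef"  # placeholder 16-byte AES key

df = spark.table("raw.customers")  # placeholder source table

# Encrypt only the PII columns before publishing the Delta shared table;
# aes_encrypt returns binary, so base64 keeps the column readable as text.
curated = (
    df.withColumn("email", F.expr(f"base64(aes_encrypt(email, '{key}'))"))
      .withColumn("phone", F.expr(f"base64(aes_encrypt(phone, '{key}'))"))
)

curated.write.format("delta").mode("overwrite").saveAsTable("curated.customers_shared")
```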