Architected and led the integration of third-party vendor data (ObserveAI) into enterprise data systems, designing scalable pipelines and governance frameworks to ensure high data quality, reliability, and compliance.
Identified and remediated critical data quality gaps across the AML (Anti-Money Laundering) organization by designing robust validation frameworks and automated monitoring, ensuring alignment with stringent compliance and regulatory standards.
Partnered cross-functionally with data analysts, compliance teams, and platform engineers to streamline data ingestion workflows, improve data observability, and reduce incidents related to data trust.
Data Engineer II
Amazon Inc.
08.2022 - Current
Implemented an end-to-end architecture for ingesting near-real-time equipment data, enabling predictive alerting systems to escalate high-severity sorter breakdown events. Any stoppage directly impacts customer deliveries, since a conveyor processes about 20k to 50k packages every hour on average.
Developed an automated alerting mechanism to notify owners of the data collection process about missing event data. This reduced missing data by ~26% and increased the accuracy of measured KPIs.
Developed a data pipeline to process over 100 GB of data by consolidating multiple disparate sources into a single destination, enabling quick data analysis for reliable business intelligence and data science use cases.
Business Intelligence Engineer II
Amazon Inc.
02.2019 - 08.2022
Performed extract, transform, and load (ETL) operations to onboard and ingest critical data sources, which were used to reduce customer frustration for Alexa Shopping customers by ~2%.
Built a semi-automated tool using Python scripts for performing A/B tests on Amazon Locker locations to enable Operations/Product teams to measure the results of various experiments with ~15% higher accuracy
Developed an automated data pipeline to consolidate data from multiple disparate sources, enabling the reporting process for weekly/monthly business review reports for the leadership team.
Data Scientist
Lennox International
05.2017 - 01.2019
Developed and tested the performance of various classification models in a distributed cloud environment using PySpark (Databricks) to predict failures in heating and ventilation systems from sensor data with 95% accuracy.
Education
M.S. - Business Analytics
The University of Texas at Dallas
05.2018
B.E. - Computer Science and Engineering
Anna University
05.2016
Skills
Big Data Tools: Snowflake, S3, Redshift, Glue, Athena, AWS (cloud platform)
Data Engineering: dbt, Airflow, Kafka, ETL
Programming: Python, Advanced SQL
Certifications: AWS Big Data, AWS Data Analytics, Deep Learning Specialization, Google Analytics