Driving innovation at the intersection of AI/ML and Data Engineering to deliver scalable and impactful solutions
Deployed several data preprocessing pipelines using AWS CodePipeline and ECR repositories, improving manageability and reducing deployment time by 40%.
Migrated on-premises Kafka connectors to AWS MSK, ensuring reliable data delivery with zero downtime while improving throughput by 35%.
Migrated and transformed 18+ legacy ETL and ELT pipelines from on-premises Airflow to AWS Glue, reducing monthly infrastructure costs by 26%.
Designed and implemented a real-time data processing architecture using AWS Kinesis and Lambda, processing over 1 TB of data daily with sub-second latency.
Led the implementation of a data governance framework using AWS Lake Formation and AWS Glue Data Catalog, enhancing data security and compliance.
Data Engineer Intern
University of Phoenix
Phoenix, AZ
05.2023 - 08.2023
Constructed scalable data pipelines processing over 1.5 PB of data monthly using AWS Glue, S3, and Redshift.
Developed automated data quality monitoring solutions, achieving a 65% reduction in data inconsistencies.
Implemented stream processing solutions with Kafka and AWS Lambda for real-time analytics.
Collaborated with data science teams to enhance ML model data pipelines, decreasing model training time by 30%.
Data Systems and Strategy Intern
University of Phoenix
Phoenix, AZ
06.2022 - 01.2023
Developed an API for publishing custom metrics, reducing data loss by 18%.
Collaborated with the team in an Agile setting to create and deploy ETL jobs that stream data from disparate sources into AWS Redshift (data warehouse).
Implemented Apache Spark ETL jobs to backfill historical data from a Neptune graph database.
Assisted in designing the architecture for ML model deployment on AWS SageMaker, enabling real-time inference.
Data Engineering & Processing:
Apache Kafka, Apache Airflow, Apache Spark, ETL/ELT Pipelines, Data Lake Architecture, Real-Time & Batch Data Processing, Data Warehousing, Streaming Analytics, Data Governance & Cataloging
Programming Languages & Frameworks:
Python, Java, Scala, PySpark, Shell Scripting
RA-KT - Metacognition and Cognition Tracing: A novel tool that uses Bayesian Knowledge Tracing (BKT), along with other adaptive learning methods, to trace a student's performance after solving a set of problems.
tinyGPTSQL - A pre-trained GPT model for SQL text generation, built on a GPT-like decoder-only architecture with 229,466 parameters (about 0.23 million).
Student's Confidence Tracing using BERT - A novel BERT-based architecture to predict a student's confidence during problem solving.
For more, please refer to https://dibyanshuchatterjee.com/projects