Dibyanshu Chatterjee

Austin, Texas

Overview

3 years of professional experience
1 Certification

Work History

Data Engineer 2

University of Phoenix
Phoenix, AZ
01.2024 - Current
  • Driving innovation at the intersection of AI/ML and Data Engineering to deliver scalable and impactful solutions
  • Deployed several data preprocessing pipelines using AWS CodePipeline and ECR repositories, improving manageability and reducing deployment time by 40%
  • Migrated on-premises Kafka connectors to AWS MSK, ensuring reliable data delivery with zero downtime while improving throughput by 35%
  • Migrated and transformed 18+ legacy ETL and ELT pipelines from on-premises Airflow to AWS Glue, reducing monthly infrastructure costs by 26%
  • Designed and implemented a real-time data processing architecture using AWS Kinesis and Lambda, processing over 1TB of data daily with sub-second latency
  • Led the implementation of a data governance framework using AWS Lake Formation and AWS Glue Data Catalog, enhancing data security and compliance

Data Engineer Intern

University of Phoenix
Phoenix, AZ
05.2023 - 08.2023
  • Constructed scalable data pipelines processing over 1.5 PB of data monthly using AWS Glue, S3, and Redshift.
  • Developed automated data quality monitoring solutions, achieving a 65% reduction in data inconsistencies.
  • Implemented stream processing solutions with Kafka and AWS Lambda for real-time analytics.
  • Collaborated with data science teams to enhance ML model data pipelines, decreasing model training time by 30%.

Data Systems and Strategy Intern

University of Phoenix
Phoenix, AZ
06.2022 - 01.2023
  • Developed an API for publishing custom metrics, reducing data loss by 18%.
  • Collaborated with the team in an agile setting to create and deploy ETL jobs that stream data from disparate data sources to AWS Redshift (SQL data warehouse).
  • Implemented Apache Spark ETL jobs for historical data backfilling from a Neptune graph database.
  • Assisted in designing the architecture for ML model deployment on AWS SageMaker, enabling real-time inference.

Education

Master of Science - Computer Science

Rochester Institute of Technology
Rochester, NY

Bachelor of Technology - Computer Science

Ramaiah University of Applied Sciences
Bengaluru, India

Skills

  • Cloud & AWS Ecosystem:
    Amazon S3, Lambda, Glue ETL, Amazon MSK (Kafka), MWAA (Airflow), Lake Formation, DynamoDB, Redshift, Athena, EMR, ECS, ECR, CloudWatch, IAM, Step Functions, Kinesis, CloudFormation
  • Data Engineering & Processing:
    Apache Kafka, Apache Airflow, Apache Spark, ETL/ELT Pipelines, Data Lake Architecture, Real-Time & Batch Data Processing, Data Warehousing, Streaming Analytics, Data Governance & Cataloging
  • Programming Languages & Frameworks:
    Python, Java, Scala, PySpark, Shell Scripting
  • DevOps, Infrastructure & Automation:
    Docker, Kubernetes, Git, CI/CD Pipelines, Terraform, Infrastructure as Code (IaC), Observability & Monitoring

Certification

  • AWS Certified AI Practitioner

Projects

  • RA-KT - Metacognition and Cognition Tracing: A novel tool that uses Bayesian Knowledge Tracing (BKT), along with other adaptive learning methods, to trace a student's performance after solving a set of problems.
  • tinyGPTSQL - A pre-trained GPT model for SQL text generation: a GPT-like decoder-only architecture with 229,466 parameters.
  • Student's Confidence Tracing using BERT - A novel BERT-based architecture to predict a student's confidence during problem solving.
  • For more, please refer to https://dibyanshuchatterjee.com/projects
