Dibyanshu Chatterjee

Austin, TX

Overview

3 years of professional experience
1 Certification

Work History

Data Systems and Strategy Engineer 2

University of Phoenix
01.2024 - Current
  • Driving innovation at the intersection of AI/ML and Data Engineering to deliver scalable and impactful solutions.
  • Deployed and maintained scalable data pipelines and cloud-based AI/ML solutions on AWS, leveraging DevOps and MLOps best practices for efficient model training and deployment.
  • Led a team of 4 to build an efficient data pre-processing pipeline for online training and inference of a novel BERT-based channel affinity model, increasing successful student engagement by 22%.
  • Migrated and transformed over 18 legacy ETL and ELT pipelines from an on-prem Airflow server to AWS Glue, reducing monthly cost by 26%.
  • Built a robust CI/CD pipeline using AWS CodeBuild and CodePipeline to efficiently upload large files (over 1 GB) to data lakes.
  • Collaborated with stakeholders to create a cloud framework for LLM fine-tuning using RLHF, ensuring the LLaMA-7B model's compliance with company messaging policies.

Data Engineer Intern

University of Phoenix
05.2023 - 08.2023
  • Ingested, processed, and transformed data from many sources into centralized data repositories
  • Scaled ETL operations for processing extensive student data, contributing to improved data manipulation efficiency
  • Integrated LLM and Transformer pipelines to explore potential learning aids for students
  • Conducted load tests on custom APIs handling data streaming, improving performance by 30%
  • Collaborated on the design of a graph database and initiated an ETL process using Apache Spark, successfully migrating 1 TB of data from an SQL database

Data Systems and Strategy Intern

University of Phoenix
06.2022 - 01.2023
  • Collaborated with the team in an agile environment to develop and deploy ETL jobs that stream data from disparate sources to AWS Redshift (SQL database)
  • Developed an API to publish custom data-ingestion metrics to CloudWatch for better analytical insight, helping reduce data loss by 18%
  • Implemented and maintained Apache Spark ETL jobs for backfilling historical data from the Neptune graph database
  • Assisted with the ML model deployment architecture on AWS SageMaker and created an API endpoint for real-time inference, predicting student risk factors
  • Optimized model inference using Numba, reducing inference time by 46%

Education

Master of Science - Computer Science

Rochester Institute of Technology
Rochester, NY

Bachelor of Technology - Computer Science and Engineering

Ramaiah University of Applied Sciences

Skills

  • Data Engineering Tools: Apache Spark, Apache Kafka, AWS Glue, AWS Lambda, AWS EC2, Hadoop, Airflow
  • ML Engineering Tools: AWS SageMaker AI
  • Databases: AWS RDS, Redshift, Athena, S3, DynamoDB, Neptune, Glue Data Catalog, MySQL
  • Programming Languages: Python, Java, PySpark, Scala, Shell
  • Other Tools: Docker, Postman, Spacelift (Terraform), Git

Certification

  • AWS Certified AI Practitioner
