Jingyeong Yu

Chicago, IL

Summary

Data Engineer with 4 years of experience designing and optimizing scalable data pipelines, managing big data infrastructure, and supporting AI/ML model development. Experienced in data ingestion, transformation, and storage across cloud platforms, with a strong foundation in Python, PyTorch, Spark, and distributed computing, as well as model development and AI integration.

Overview

4 years of professional experience

Work History

Azure Data Engineer

BP
Chicago, IL
04.2021 - Current
  • Supported ML engineers and data scientists in building and optimizing ML models for anomaly detection in oil wells using decision trees, LightGBM, XGBoost, PyTorch, and clustering techniques. Assisted in designing regression and classification models for failure prediction.
  • Preprocessed streaming sensor data for predictive maintenance, leveraging time-series forecasting and feature engineering, reducing downtime by 30%.
  • Built real-time ML pipelines connecting Azure ML Studio to Databricks, SQLDB, and Power BI, enabling seamless visualization of AI model health metrics and streaming predictions.
  • Designed and automated data pipelines for AI models in Palantir Foundry, preprocessing large-scale time-series sensor data for anomaly detection models.
  • Led the migration of Databricks workspaces to Unity Catalog, reducing storage costs by 30% and improving query performance.
  • Integrated Azure DevOps CI/CD pipelines to automate the deployment of AI models, reducing manual intervention and enhancing model reproducibility.
  • Conducted vibration data analysis in JupyterLab within Palantir, incorporating velocity metrics and fault labels for predictive modeling.
  • Contributed to sprint planning, daily scrums, and retrospectives, supporting effective team collaboration and timely project delivery.

Data Analyst Intern

Forkaia
Irvine, CA
02.2021 - 06.2021
  • Conducted data profiling and ETL processes for datasets of 500,000+ records, improving financial reporting accuracy by 25%.
  • Implemented clustering and regression models in Python and R, optimizing customer segmentation and forecasting accuracy.
  • Designed and automated Power BI and Tableau dashboards, reducing report generation time from 6 to 4 hours.

Data Analyst Intern

The Advance Group
New York, NY
10.2020 - 01.2021
  • Calculated the difference in funds raised across 8 campaigns for clients and visualized it through graphs and dashboards to identify effective fundraising strategies, using Excel and Tableau.
  • Processed and examined 10,000+ records from NYCCFB and The Advance Group datasets to identify target voters and donors, utilizing Python for data cleaning, filtering, clustering, and regression analysis to drive campaign insights.
  • Evaluated 100,000+ records for operational reporting and ad hoc queries using Excel PivotTables, resolving data quality issues and reducing reporting errors by 40%.

Education

Master of Science - Computer Science

University of Chicago
Chicago, IL
02-2025

BBA - Computer Information Systems

Baruch College
New York, NY
12-2020

Bachelor of Science - Computer Science

Stony Brook University
Stony Brook, NY
01-2018

Skills

Python, SQL, R, Scala, Spark, C, Azure Machine Learning Studio, Databricks, Azure Data Factory, Power BI, DevOps, Kubernetes, ADLS, Blob Storage, Delta Lake, Event Hubs, Synapse Analytics, Logic Apps, SSMS, Oracle, Azure Data Explorer, PyTorch

Timeline

Azure Data Engineer

BP
04.2021 - Current

Data Analyst Intern

Forkaia
02.2021 - 06.2021

Data Analyst Intern

The Advance Group
10.2020 - 01.2021

Master of Science - Computer Science

University of Chicago

BBA - Computer Information Systems

Baruch College

Bachelor of Science - Computer Science

Stony Brook University