Hongling Yang

Data Scientist
Temecula, CA

Summary

Data science professional with a PhD in statistics, more than 10 years of experience, and a strong statistical and analytical background. Expertise in data mining, Python, R, SAS, C++, machine learning, and statistics. Passionate about solving problems with data and presenting insights to business audiences.

Overview

16 years of professional experience
10 years of post-secondary education
3 certifications
2 languages

Work History

Data Science Trainee

Springboard
San Francisco, CA
02.2023 - 07.2023

PROJECT 1: Fashion Product Image Classification with Convolutional Neural Networks

GitHub - hyang78227/CapstoneProjectTwo

  • Data Augmentation: Used ImageDataGenerator to augment the minority classes in the dataset.
  • Deep Learning: Achieved >95% accuracy with a 5-layer CNN using transfer learning; used VGG16 to extract 20 features and built a CNN-based recommendation system for 4,000+ fashion products.
  • Optimization: Applied the Hyperband algorithm for hyperparameter optimization, raising predictive accuracy from 95% to 97.5%.

PROJECT 2: A Google App Store Educational Apps Rating Analysis

GitHub - hyang78227/capstone-project3

  • Data Exploration: Conducted exploratory data analysis (EDA), including data wrangling, preprocessing, and visualization.
  • Classification: Applied Decision Tree (85% accuracy), Random Forest (92% accuracy), and Gradient Boosting (94% accuracy) classifiers to predict the rating tiers of educational apps.
  • Imbalance Resolution: Used the imbalanced-learn library to address class imbalance, raising minority-class representation from 10% to 45%.
  • Optimization: Applied the Hyperband algorithm to tune the hyperparameters of the classification models.

PROJECT 3: Big Mountain Ski Resort Ticket Pricing Study

GitHub - hyang78227/DataScienceG

  • Modeling: Applied multivariate linear regression and random forest regression, which identified value increases of $11 and $19 per ticket, respectively.
  • Pipeline Architecture: Built a pipeline for data preprocessing, regression, tuning, and model selection, contained in a single Python notebook.
  • Optimization: Used GridSearchCV for hyperparameter tuning, raising model training accuracy from 92% to 96%.

OTHER PROJECTS

GitHub - hyang78227/Springboard

  • COVID-19 Patient State Classification: Applied a Random Forest classifier to the South Korean COVID-19 dataset to classify patient states ('isolated', 'released', and 'deceased'), achieving 92% accuracy.
  • Flight Departure Delay Prediction: Used LightGBM to predict flight delays of more than 15 minutes, achieving 94% accuracy; Bayesian optimization for hyperparameter tuning and targeted feature engineering improved model performance.
  • Cigarette Sales Time Series Analysis: Analyzed Cowboy Cigarettes' historical sales data and forecast sales trends using time series methods, achieving a mean absolute percentage error (MAPE) of 3.5%.
  • Wine Customer Segmentation: Applied K-means clustering to wine customer data, segmenting customers by their responses to wine offers; the clustering achieved a silhouette score of 0.75, indicating well-defined segments.

Statistician

School of Medicine, University of California
San Diego, CA
03.2016 - 08.2017
  • District Study: Analyzed associations between alcohol use and intimate partner violence (IPV); explored the relationships among alcohol use, IPV, and HIV infection using structural equation modeling.
  • Alcohol & IPV Intervention: Tested a two-arm pilot intervention for men and estimated the effect sizes required for broader application.
  • Baltimore HIV Risk Study: Used mixed methods to study how neighborhood factors affect rates of forced sex among African American women and, in turn, their HIV risk.
  • Stress & HIV Risk Analysis: Used a multilevel design to examine stress-related pathways between forced sex and HIV risk behaviors.

Statistical Consultant

Texas Tech Health Center
El Paso, TX
09.2010 - 12.2014
  • Led medical research projects, mentoring residents throughout the process.
  • Developed research concepts and designed sampling strategies.
  • Strengthened the residency program with statistical computing expertise.
  • Partnered with physicians to conduct clinical trials.

Statistician

College of Engineering, University of Texas
El Paso, TX
01.2008 - 01.2014
  • Geographic Information System (GIS) Research: Played a key role in the 'Ride8 Project', which focused on ozone pollution analysis in El Paso, TX.
  • Collaborative Research: Collaborated with fellow researchers, resulting in multiple published papers.
  • Academic Endeavors: Taught mathematics and statistics courses.

Lecturer

University of Texas
El Paso, TX
08.2008 - 01.2016
  • Academic Endeavors: Taught mathematics and statistics courses.

Education

PhD - Statistics

Arizona State University
Tempe, AZ
01.2005 - 05.2008

M.S. - Statistics

University of Texas at El Paso
El Paso, TX
01.2003 - 05.2005

B.S. - Finance

Peking University
01.1998 - 05.2002

Skills

    DBMS: MS SQL Server, MySQL, Postgres

Certification

SAS Certified Advanced Programmer for SAS 9 (Certified Serial Number: AP011585v9)
