PROJECT 1: Fashion Product Image Classification with a Convolutional Neural Network
GitHub - hyang78227/CapstoneProjectTwo
- Data Augmentation: Leveraged ImageDataGenerator to bolster the representation of the minority classes in the dataset
- Deep Learning: Achieved >95% accuracy with a 5-layer CNN via Transfer Learning. Leveraged VGG16 to extract 20 features and built a CNN-based recommendation system for 4000+ fashion products.
- Optimization: Applied the Hyperband algorithm for hyperparameter optimization, raising predictive accuracy from 95% to 97.5%
PROJECT 2: A Google App Store Educational Apps Rating Analysis
GitHub - hyang78227/capstone-project3
- Data Exploration: Conducted exploratory data analysis (EDA) with visualization techniques after data wrangling and pre-processing
- Classification: Employed Decision Tree (85% accuracy), Random Forest (92% accuracy), and Gradient Boosting (94% accuracy) classifiers to predict the rating tiers of educational apps
- Imbalance Resolution: Used the imbalanced-learn module to mitigate class imbalance, raising minority class representation from an initial 10% to a balanced 45%
- Optimization: Employed the Hyperband algorithm to systematically optimize hyperparameters in classification models
PROJECT 3: Big Mountain Ski Resort Ticket Pricing Study
GitHub - hyang78227/DataScienceG
- Modeling: Used Multivariate Linear Regression and Random Forest Regression, resulting in estimated value increases of $11 and $19 per ticket, respectively
- Pipeline Architecture: Built an optimized pipeline for data preprocessing, regression, tuning, and model selection, encapsulated in a single Python notebook
- Optimization: Used GridSearchCV for hyperparameter tuning, improving model training accuracy from 92% to 96%
OTHER PROJECTS
GitHub - hyang78227/Springboard
- COVID-19 Patient State Classification: Applied a Random Forest Classifier to the South Korean COVID-19 dataset to classify patient states ('isolated', 'released', and 'deceased'), achieving a classification accuracy of 92%
- Flight Departure Delay Prediction: Employed LightGBM to predict flight delays exceeding 15 minutes, attaining a model accuracy of 94%. Improved model performance through Bayesian Optimization for hyperparameter tuning and targeted feature engineering
- Cigarette Sales Time Series Analysis: Performed a detailed analysis of Cowboy Cigarettes' historical sales datasets. Through time series forecasting, projected sales trends with a Mean Absolute Percentage Error (MAPE) of 3.5%
- Wine Customer Segmentation: Employed K-means clustering on wine customer datasets, segmenting customers by their responses to wine offers. The clustering exhibited a silhouette score of 0.75, indicating well-defined customer segments