
Ruochen Xiao

Jersey City, New Jersey

Summary

Dynamic Data Scientist with a proven track record at Datalynn, enhancing subscription sales by 15% through advanced ML models and strategic pricing. Expert in data analysis, A/B testing, and collaborative project management. Excels in leveraging statistical modeling and Python for impactful business solutions, with a keen focus on user satisfaction and revenue optimization.

Overview

2 years of professional experience

Work History

Data Scientist Intern

Datalynn
New York City, NY
06.2024 - 09.2024

- Subscription Prediction and Price Model Development
• Boosted Interview Copilot subscription sales by 15% with revised pricing strategies and brought in 550+ members through advanced ML model deployment and cross-functional teamwork.
• Collaborated with PMs to identify 15 key features from the behavior of 2K+ users, segmented customers with KMeans clustering, and developed a separate model for each segment.
• Built subscription and churn prediction models with Random Forest and XGBoost over behavioral, career (e.g., Target Company Tier), and market features, achieving an AUC-ROC of 0.83.

• Designed a SaaS revenue formula and applied Bayesian Optimization to balance user retention and revenue, deriving optimal pricing and discount strategies for diverse customer groups from subscription rates, churn rates, and market trends.
- Interview Copilot Model Refinement
• Achieved a 20% increase in user re-subscription and a 45% improvement in user satisfaction over ChatGPT through LLM optimization by A/B testing, fine-tuning, and data mining, such as tokenization and classification.
• Designed A/B testing to select the best-performing LLM, leveraging t-tests, Chi-squared tests, and causal inference techniques to validate insights, including training strategies and output strategies.

• Fine-tuned LLMs and applied prompt engineering for Advanced Interview CopilotGPT, automating the classification of 750+ entries with logistic regression and CNN models, boosting topic-specific accuracy across eight specialized roles.

Machine Learning Intern

CCB FinTech Co., Ltd
Shanghai, China
01.2023 - 08.2023

- Fraud Detection
• Enhanced crime coverage by more than 5 cases per month and reduced labor costs by 10% through implementing fraud analytical models and result visualization within the CBIRC Market Supervision Department.
• Created real-time Tableau dashboards to visualize model evaluations, clearly presenting key metrics such as loan defaults and bad debt rates to aid stakeholder decision-making.
• Owned end-to-end Credit Card Risk and Anti-Money Laundering models using graph (DFS, Floyd-Warshall, and BFS) and machine learning algorithms (SGDClassifier and RandomForestClassifier), identifying suspicious accounts across 10 banks and reducing average response time by 90% to 0.005 seconds.

Education

Master of Science - Computer Science

New York University
New York, NY
05-2025

Bachelor of Science - Intelligence Science & Technology, School of Compute

Shanghai University
Shanghai, China
06-2023

Skills

Data Science: Statistical Modeling, Data Extraction and Analysis, Root Cause Analysis, Experimentation, A/B Testing, Data Mining, Jupyter, Python, R, NumPy, pandas, SciPy

AI/ML: Machine Learning, Computer Vision, Optimization, Scikit-learn, XGBoost, TensorFlow, MLflow

Programming & Tools: SQL, Python, AWS, Git, Simulation Engines, SAS, Matlab

Data Visualization: Tableau, Power BI, Matplotlib, Seaborn

Database: SQL, MySQL, PostgreSQL

Selected Projects

Medical Insurance Recommendation System | New York, United States | Nov 2023 - Dec 2023
• Developed a Flask-based web application that customizes insurance plans using 10K+ historical health records in PostgreSQL and predicts disease risk with Random Forest, Logistic Regression, and Gradient Boosting.

FedCF: Federated Learning-Based Car-Following Model | Shanghai, China | Jan 2023 - May 2023

• Developed a Social-LSTM Car-Following Model in PyTorch with 799 NGSIM vehicle trajectories to improve autonomous driving safety, refined using Savitzky-Golay filtering and K-means clustering.

Timeline

Data Scientist Intern

Datalynn
06.2024 - 09.2024

Machine Learning Intern

CCB FinTech Co., Ltd
01.2023 - 08.2023

Master of Science - Computer Science

New York University

Bachelor of Science - Intelligence Science & Technology, School of Compute

Shanghai University