Data Scientist at Datalynn with a proven track record of increasing subscription sales by 15% through ML models and strategic pricing. Skilled in data analysis, A/B testing, and cross-functional project management. Applies statistical modeling and Python to deliver business impact, with a focus on user satisfaction and revenue optimization.
- Subscription Prediction and Pricing Model Development
• Boosted Interview Copilot subscription sales by 15% with refined pricing strategies and brought in 550+ new members through ML model deployment, pricing optimization, and cross-functional collaboration.
• Collaborated with PMs to identify 15 key features from the behavior of 2K+ users, applied K-means clustering for customer segmentation, and developed a dedicated model for each segment.
• Built subscription and churn prediction models with Random Forest and XGBoost using behavioral, career (e.g., target company tier), and market features, achieving an AUC-ROC of 0.83.
• Designed a SaaS revenue formula based on subscription rates, churn rates, and market trends, and applied Bayesian optimization to find pricing and discount strategies that balance user retention and revenue across diverse customer groups.
- Interview Copilot Model Refinement
• Achieved a 20% increase in user re-subscriptions and a 45% improvement in user satisfaction over ChatGPT by optimizing LLMs through A/B testing, fine-tuning, and data mining (e.g., tokenization and classification).
• Designed A/B tests to select the best-performing LLM, using t-tests, chi-squared tests, and causal inference techniques to validate insights on training and output strategies.
• Fine-tuned LLMs and applied prompt engineering for Advanced Interview CopilotGPT, and automated the classification of 750+ entries with logistic regression and CNN models, boosting topic-specific accuracy across eight specialized roles.
- Fraud Detection
• Expanded crime detection coverage by 5+ cases per month and reduced labor costs by 10% by implementing fraud analytics models and result visualizations within the CBIRC Market Supervision Department.
• Created real-time Tableau dashboards to visualize model evaluations, presenting key metrics such as loan default and bad debt rates to support stakeholder decision-making.
• Owned end-to-end credit card risk and anti-money laundering (AML) models built on graph algorithms (BFS, DFS, Floyd-Warshall) and machine learning classifiers (SGDClassifier, RandomForestClassifier), identifying suspicious accounts across 10 banks and cutting average response time by 90% to 0.005 seconds.
Data Science: Statistical Modeling, Data Extraction and Analysis, Root Cause Analysis, Experimentation, A/B Testing, Data Mining, Jupyter, Python, R, NumPy, pandas, SciPy
AI/ML: Machine Learning, Computer Vision, Optimization, Scikit-learn, XGBoost, TensorFlow, MLflow
Programming & Tools: SQL, Python, AWS, Git, Simulation Engines, SAS, Matlab
Data Visualization: Tableau, Power BI, Matplotlib, Seaborn
Database: SQL, MySQL, PostgreSQL
Medical Insurance Recommendation System | New York, United States | Nov 2023 - Dec 2023
• Developed a Flask-based web application that customizes insurance plans using 10K+ historical health records stored in PostgreSQL and predicts disease risk with Random Forest, Logistic Regression, and Gradient Boosting.
FedCF: Federated Learning-Based Car-Following Model | Shanghai, China | Jan 2023 - May 2023
• Developed a Social-LSTM car-following model in PyTorch, trained on 799 NGSIM vehicle trajectories to improve autonomous driving safety and refined using Savitzky-Golay filtering and K-means clustering.