Experienced Data Scientist with 5+ years in Machine Learning and a Ph.D. in Physics.
Overview
11 years of professional experience
Work History
Senior Data Scientist
Behavidence
11.2022 - 08.2023
Developed smartphone-based predictive algorithms for mental health conditions (ADHD, stress, anxiety, depression), improving model accuracy by 15%. Applied anomaly detection and data balancing techniques, impacting over 10,000 users. Used unsupervised learning to support personalized healthcare. Contributed to 5+ projects predicting disease indicators and advancing health technology. Collaborated with cross-functional teams, provided mentorship, and drove a 25% increase in innovation.
Data Scientist Fellow
Techlent
05.2022 - 10.2022
Fraud Detection System Development: Designed models to detect healthcare fraud using Logistic Regression and a Random Forest Classifier, and addressed class imbalance to improve model stability and precision, achieving PR AUC scores of 0.74 with Random Forest and 0.71 with Logistic Regression.
Postdoctoral Scientist
NYU Langone Health
04.2012 - 06.2015
Genomic Data Analysis: Developed a pipeline to process and visualize leukemia genomic data. Conducted A/B tests to link the gene TBL1XR1 with drug resistance. Shared findings to optimize steroid therapy.
Education
Ph.D. - Physics
Chinese Academy of Sciences
Shanghai, China
01.2010
Skills
Python, Machine Learning, Linear Regression, Logistic Regression, SVM, KNN, Decision Tree, Random Forest, Gradient Boosting, XGBoost, Unsupervised Learning, Data Visualization, AWS, scikit-learn, NumPy, SciPy, Pandas, SQL, NLP, Deep Learning, BERT
Additional Information
Mental Health Data Analysis using BERT
Data Preparation: Processed and labeled 6,500 social media posts using majority voting for depression classification.
Model Development: Fine-tuned BERT to detect depression in text data, using Python and PyTorch.
Performance: Achieved 64% accuracy, 66% precision, 64% recall, and a 63% F1 score.
Impact: Delivered actionable insights for customer engagement, premium mental health app features, and healthcare partnerships.
Sentiment Analysis for Marketing via NLP
Evaluated deep learning algorithms on Amazon/IMDb review datasets for sentiment analysis.
Data Preprocessing: Resampled imbalanced datasets using the imbalanced-learn package.
Analyzed reviews with DistilBERT using Python and PyTorch.
Evaluated DistilBERT's performance, achieving 60% precision, 66% recall, and a 62% F1 score.
Shop Owner/Operations Manager/Digital Marketing Director at HH & Co Boutique