Machine Learning Research Scientist and Data Scientist with a Ph.D. in Computer Science and 6+ years of experience developing and deploying scalable machine learning and deep learning systems for scientific and real-world applications. Expertise in PyTorch, NLP, transformer-based models, LLM fine-tuning, representation learning, and large-scale experimentation on HPC and cloud infrastructure. Proven track record of conducting rigorous ML research, building production-oriented AI solutions, and collaborating across multidisciplinary teams to translate advanced research into impactful products and tools.
Overview
16 years of professional experience
Work History
Research Scientist 3
University of Washington
Seattle, WA
10.2021 - 04.2026
Designed, fine-tuned, and evaluated ML models (logistic regression, random forest, neural networks, topic models) for disease diagnosis and grouping, achieving up to 90% classification accuracy on sensitive biomedical datasets.
Led end-to-end analytical workflows, including data curation, feature engineering, rigorous experimental design, model selection, validation, interpretation, and reproducible pipeline development.
Conducted robustness, feature-importance, and bias analyses using cross-validation, data augmentation, sensitivity testing, ablation studies, and SHAP-based interpretability to ensure model reliability and generalizability across cohorts.
Served as technical lead across 10+ clinical and biomedical studies, integrating NLP pipelines with statistical inference to identify data inconsistencies and contributing to successful grant applications.
Partnered with interdisciplinary research teams including clinicians, bioinformaticians, and data scientists to translate complex biomedical research questions into scalable ML solutions.
Architected and deployed scalable ML systems using AWS Lambda, API Gateway, and SLURM-based HPC job orchestration for parallel and sequential workloads, reducing end-to-end pipeline runtime by ~50%.
Graduate Researcher
La Trobe University
Melbourne, Australia
08.2016 - 06.2021
Built and optimized deep learning and NLP-based text summarization models using Transformers, LSTM/RNN, CNN-attention, Word2Vec, and BERT.
Applied supervised and unsupervised machine learning techniques (SVM, Random Forest, LDA, NMF) for classification, clustering, and pattern detection.
Improved model reliability through fine-tuning, reinforcement learning, and regularization; contributed to peer-reviewed publications including IJCNN 2021.
Performed data extraction, preprocessing, integration, and analysis for large-scale NLP datasets, and engineered end-to-end ETL and SQL-based data transformation pipelines to support feature engineering and model development.
Research Assistant
Sungkyunkwan University
South Korea
03.2013 - 02.2015
Developed graph-based dependency and similarity algorithms using ontologies and bipartite networks.
Sr. Software Engineer
Rich Business System Ltd.
Bangladesh
02.2010 - 12.2012
Built and maintained enterprise backend systems supporting 1M+ records.
Optimized SQL queries, indexing strategies, and data workflows.
Projects
LumpIt (predictive tool), AHSG Consortium 2024: end-to-end ML predictive modeling and API deployment for rare disease classification using AWS Lambda and API Gateway. Prediction website: https://depts.washington.edu/jxchong-lab/LumpIt/
Publications
Khanam, S., et al. "Concept-based Topic Attention for Convolutional Sequence Summarization." International Joint Conference on Neural Networks (IJCNN), 2021.
Awards
Winner, Best open-source ML solution for rare disease research: ETL, phenotype encoding, and NLP/embedding workflows. Repository: https://github.com/jxchong-lab/GlobalGenes_RareX_Challenge_2023