
Shuying Zhu

College Station, TX

Summary

Seeking MLE/DS roles, bringing startup data science internship experience and 5+ years of AI research. Developed unsupervised deep learning transformer models that outperformed rule-based systems and reduced costs by 90%. Proficient in managing large datasets (1,000+ GB), with expertise in Spark, AWS Databricks, and Ray. Published in a high-impact journal (IF >8), backed by extensive deep learning research.

Overview

9 years of professional experience

Work History

Machine Learning Data Scientist Intern

Cambridge Mobile Telematics
06.2024 - 12.2024
  • Conducted an extensive literature review on foundation models for time-series data and on LLM domain adaptation and alignment, analyzed and compared different models, and published the summary as company technical blog posts
  • Built an ETL pipeline to process features from millions of time series (>1,000 GB) using Redshift, SQL, Spark, Pandas, and NumPy, wrote the output in Parquet format for AI models, and used Ray Data for data streaming and batching, lowering costs by 90%
  • Designed and deployed an unsupervised (contrastive learning) pre-trained transformer-based model for driving data and an LLM-based method for aligning with the time-series domain; implemented the models in PyTorch and deployed them on AWS Databricks
  • Visualized NN kernels, activations, trip details, and features using Seaborn, Matplotlib, and Plotly

Research Assistant

Texas A&M University
01.2022 - 01.2024
  • Curated and preprocessed variant data (10,000,000+ entries) from the ClinVar database using parallel computing, Pandas, and NumPy, running 40 times faster than a single thread
  • Designed and fine-tuned context-aware and network-aware variant sequence classification methods using a pretrained protein LLM and graph attention networks, enabling transfer learning from general protein structure to variant prediction
  • Conducted 200+ experiments, tuned hyperparameters on HPC with distributed data parallel frameworks, and achieved a 10% improvement in prediction AUROC
  • Matched the performance of state-of-the-art (SOTA) variant classification methods, achieving 95% prediction AUROC

Research Assistant

The Hong Kong Polytechnic University
01.2020 - 01.2022
  • Efficiently conducted two dynamic COVID-19 research projects, adapting daily to evolving case data
  • Applied hypothesis testing (the Wilcoxon rank-sum test) to compare age distributions and age-specific incidence rates between the Hong Kong and Singapore populations, and further analyzed age-specific incidence rates with a chi-square test
  • Completed data collection, data analysis, and academic writing within an accelerated three-day timeframe, resulting in two peer-reviewed publications

Research Assistant

The University of Hong Kong
01.2019 - 01.2021
  • Extracted electronic health record (EHR) data from the MIMIC-III database
  • Proposed the contrastive predictive autoencoder (CPAE), a self-supervised contrastive learning framework for EHR representation learning
  • CPAE surpassed SOTA models by up to 10% on semi-supervised clinical prediction tasks at 1%, 5%, and 10% label rates, with CNN and RNN backbones

Product Manager

GOTCHA.com
01.2016 - 01.2019
  • Company Overview: Tech Startup
  • Conducted surveys and discussions to understand product needs and target user characteristics and needs
  • Documented and prototyped product designs, proactively discussed them with the team, and iterated on the design documentation
  • Steered product development cycles, liaising between cross-functional teams and stakeholders to deliver user-centric tech solutions

Education

MSc - Electrical Engineering

Texas A&M University
Texas
01.2024

MPhil - Biostatistics & Bioinformatics

University of Hong Kong
Hong Kong
01.2021

Bachelor of Science - Mathematics and Statistics

Xi'an Jiaotong University
Xi'an, China
01.2019

Skills

    Computer Science Basics

    Python, R, SQL, Linux, Git, Docker, C/C++/C#, MATLAB, HTML/CSS, Data Structures and Algorithms

    Visualization & Presentation

    Tableau, MS Office suite, Matplotlib, Seaborn, Plotly, ggplot2

    AI Skills & Experiences

    PyTorch, TensorFlow, Transformers (BERT, GPT), Scikit-learn, Pandas, Numpy, LLM, GNN, regularization, Contrastive Learning, MLflow, fairseq, Generative AI (GPT, VAE, GAN, diffusion models), ResNet

    Big Data Skills & Experiences

    AWS Databricks, Google Cloud, Spark, Ray, Distributed Computing, HPC

Leadership Experience

Vice President, Web Development Club

Xi'an Jiaotong University, Xi'an, China
09.2015 - 06.2019
  • Directed a team within a 200-member organization, prototyped product design, and led UI/UX designers and front-end and back-end developers across multiple projects
  • Orchestrated the end-to-end development of three web applications, enhancing the club's portfolio and the practical skills of members
  • Conceptualized and executed over 5 large-scale events, fostering community engagement

Publications

  • CPAE: Contrastive predictive autoencoder for unsupervised pre-training in health status prediction, Shuying Zhu, Weizhong Zheng, and Herbert Pang, Computer Methods and Programs in Biomedicine, 234, 2023 (IF >8)
  • Predicting preterm births using US national birth data: a deep learning approach, Shuying Zhu, Dellinger Andrew, Chan Karen, Lam Wendy, Pang Herbert, Under Review
  • Different age pattern of COVID-19 cases in Hong Kong and Singapore by March 4, 2020, Shuying Zhu, Jun Tao, Huizhi Gao, Daihai He, BMC Infectious Diseases
  • Influenza versus COVID-19 cases among influenza-illness-like patients in travelers from Wuhan to Hong Kong in January 2020, Jun Tao, Huizhi Gao, Shuying Zhu, Lin Yang, Daihai He, International Journal of Infectious Diseases
