Proteomics data scientist with expertise in mass spectrometry data analysis, bioinformatics, and computational biology, applying data-driven methods to generate actionable insights.
Overview
15 years of professional experience
Work History
Manager, Biostatistics
InterVenn Biosciences
South San Francisco, CA
07.2023 - 09.2024
Managing the clinical data science team, facilitating inter-departmental collaboration and communication, and allocating resources across projects
Analyzing data from internal or external patient samples run in targeted assays for disease conditions including CRC, ovarian cancer, NASH, HCC, pancreatic cancer, and COVID-19
Building machine learning models to distinguish diseased states from controls
Software tools used include Python, Scikit-Learn, NumPy, Pandas, Matplotlib, Google Cloud Platform (GCP), git, R (familiarity)
Building and using data analysis pipelines to convert raw experimental mass spectrometry data into a usable format
Generating reports for quality checks of experimental runs and exploratory data analysis
Building and maintaining Google Cloud SQL (MySQL) and Cloud Firestore (NoSQL) databases on Google Cloud Platform to store data generated from patient samples
Analyzing discovery glycoproteomics experiments with Proteome Discoverer/Byonic to find glycopeptides that might be indicative of diseased states
Writing manuscripts for peer-reviewed publications and presenting results at conferences.
Senior Bioinformatician
InterVenn Biosciences
South San Francisco, CA
12.2021 - 07.2023
Analyzing data from internal or external patient samples run in targeted assays for disease conditions including CRC, ovarian cancer, NASH, HCC, pancreatic cancer, and COVID-19
Building machine learning models to distinguish diseased states from controls
Software tools used include Python, Scikit-Learn, NumPy, Pandas, Matplotlib, Google Cloud Platform (GCP), git, R (familiarity)
Building and using data analysis pipelines to convert raw experimental mass spectrometry data into a usable format
Generating reports for quality checks of experimental runs and exploratory data analysis
Building and maintaining Google Cloud SQL (MySQL) and Cloud Firestore (NoSQL) databases on Google Cloud Platform to store data generated from patient samples
Analyzing discovery glycoproteomics experiments with Proteome Discoverer/Byonic to find glycopeptides that might be indicative of diseased states
Writing manuscripts for peer-reviewed publications and presenting results at conferences.
Senior Scientist
InterVenn Biosciences
South San Francisco, CA
11.2018 - 12.2021
Analyzing data from internal or external patient samples run in targeted assays for disease conditions including CRC, ovarian cancer, NASH, HCC, pancreatic cancer, and COVID-19
Building machine learning models to distinguish diseased states from controls
Software tools used include Python, Scikit-Learn, NumPy, Pandas, Matplotlib, Google Cloud Platform (GCP), git, R (familiarity)
Building and using data analysis pipelines to convert raw experimental mass spectrometry data into a usable format
Generating reports for quality checks of experimental runs and exploratory data analysis
Building and maintaining Google Cloud SQL (MySQL) and Cloud Firestore (NoSQL) databases on Google Cloud Platform to store data generated from patient samples
Analyzing discovery glycoproteomics experiments with Proteome Discoverer/Byonic to find glycopeptides that might be indicative of diseased states
Writing manuscripts for peer-reviewed publications and presenting results at conferences.
Bioinformatics Scientist
Second Genome
South San Francisco, CA
08.2016 - 05.2018
Building machine learning models using SVM, Random Forest and Logistic Regression to predict protein behavior in biological assays
Software tools used include Python, Scikit-Learn, NumPy, Pandas, Matplotlib, AWS EC2, AWS S3
Built and maintained a MySQL database of environmental genomes and metagenomes on AWS Aurora and Athena
Bioinformatics support for various internal projects.
Scientist
Eurofins Lancaster Laboratory (Genentech)
South San Francisco, CA
05.2013 - 06.2015
Early-stage molecular assessment of therapeutic antibodies to identify spots of potential modifications in the complementarity determining regions (CDRs)
Used machine learning (Scikit-Learn) to predict the outcome of experiments based on previous experimental results
Developed a Python/MySQL-based GUI/database tool to store data generated in the group and retrieve it in a meaningful manner
Developed a software tool in Python to calculate the mass of CDR-containing tryptic fragments with sites of potential post-translational modifications and generate an automated report template
Peptide mapping to identify spots of potential modifications in the complementarity determining regions (CDRs) using the Orbitrap Elite
Quantification of modified derivatives within the CDRs using Xcalibur
Intact mass measurement and characterization of therapeutic antibodies using Agilent HPLC-Chip/Q-TOF.
Post-Doctoral Fellow
Stanford University
Palo Alto, CA
08.2011 - 05.2013
Post-Doctoral Fellow
University of Washington
Seattle, WA
06.2009 - 11.2009
Education
Doctor of Philosophy (Ph.D.) - Biochemistry and Molecular Biology
University of California
Los Angeles, CA
01.2009
Bachelor of Pharmacy (B.Pharm.) - Pharmaceutical Sciences
Birla Institute of Technology
Mesra, India
01.1999
Skills
Python, Scikit-Learn, NumPy, Pandas, Matplotlib, Google Cloud Platform (GCP), AWS EC2, AWS S3, MySQL, PostgreSQL, git, R (familiarity), mass spectrometry
Awards
ASBMB 2012 Graduate/Postdoctoral Travel Award - American Society for Biochemistry and Molecular Biology 2012
FDA Commissioner's Fellowship (declined) - Food and Drug Administration 2010
UCLA Fundamental Clinical Research Training Grant (T32 DE007296) - National Institute of Dental and Craniofacial Research (NIDCR) 2007
CMB Training Grant (Ruth L. Kirschstein National Research Service Award, GM07185) - National Institutes of Health