Khanh Nguyen

Houston, TX

Summary

Driven Data Science Intern with a foundation in database structures and algorithms, data mining, data visualization, statistical analysis, machine learning, and predictive modeling. Ready to thrive in demanding digital intelligence processing environments and well informed on the latest machine learning advancements. Eager to combine a tireless hunger for new skills with a desire to apply cutting-edge data science technology.

Overview

1 year of professional experience

Work History

Research Assistant

The Trawick Lab - Baylor University
Waco, TX
08.2021 - 08.2022
  • Prepared reagents and solutions following standard laboratory formulas and procedures.
  • Recorded experimental findings in a laboratory notebook for analysis and interpretation
  • Assisted in drug discovery through molecular docking
  • Analyzed the resulting docking models of protein-ligand complexes and their associated numerical values

Education

M.S. - Biomedical Informatics

University of Texas Health Science Center Houston
Houston, TX
05.2024

B.S. - Biochemistry

Baylor University
Waco, TX
08.2022

Algorithms for DNA Sequencing

Johns Hopkins University
Coursera Certificate

Skills

  • Proficient in SQL and Python, particularly in data cleaning, manipulation, and analysis
  • Expertise in retrieving, processing, and analyzing clinical data using data mining, machine learning, and statistical analysis
  • Statistical modeling and inference, multivariate regression, classification, and pattern recognition
  • Experience in creating dashboards and visual analytical presentations using Tableau and Power BI
  • Familiarity with Jupyter Notebook and Python machine learning and data analysis libraries such as scikit-learn and Keras
  • Strong knowledge of database structures and algorithms, with experience in data visualization and statistical analysis
  • Writing clean, readable code
  • Working with unstructured data
  • Proficient with the Microsoft Office 365 suite (e.g., Word, Excel, and PowerPoint)
  • Strong written and verbal communication skills, effectively conveying complex data-driven insights to a range of audiences
  • Data quality assurance processes

Projects

Analyzing and Improving Classification Models for Breast Cancer Diagnosis

  • Preprocessed and cleaned the data for modeling
  • Applied a range of classification models, including Logistic Regression, Naive Bayes, SVM, and Random Forest
  • Used bagging classifiers to improve model accuracy, particularly for Decision Trees, Random Forest, Logistic Regression, and SVM
  • Implemented a voting ensemble that combined the outputs of the best-performing bagging classifiers and a CatBoost model, raising overall accuracy to 83%

Linear Regression Model: Using Machine Learning to Predict Taxi Fare Amount

  • Extracted sample data from the public NYC taxi ride dataset on BigQuery using SQL and stored the results in a Pandas DataFrame
  • Used Pandas to resolve missing values, remove unwanted values, and convert categorical columns
  • Created a Seaborn plot to visualize and explore the data
  • Trained a linear regression model and evaluated it by RMSE using scikit-learn

Decision Tree Diagnosis: Using Machine Learning to Identify At-Risk Patients

  • Used a Pandas DataFrame to resolve missing values, remove unwanted values, and one-hot encode categorical columns
  • Performed feature selection, model training, and evaluation to build a machine learning model that predicts readmission risk for patients with heart failure
  • Fit a decision tree classifier to the training data, achieving a training accuracy of 94.9% and a testing accuracy of 91.8%

Data Analysis of Prevalence and Medicare Payments for Chronic Conditions in Four States and at the National Level

  • Cleaned and preprocessed CMS data to remove duplicates, handle missing values, and standardize variables.
  • Analyzed CMS data to determine the prevalence of chronic conditions in the United States and four selected states.
  • Analyzed Medicare payments for chronic conditions in the United States and four selected states.
  • Stratified prevalence rates of chronic conditions by gender and race for the United States and each state.
