Analyzing Trends in Youth Tobacco Use
University of North Texas
Denton, TX
08.2024 - 12.2024
- Conducted in-depth statistical analysis of the Youth Tobacco Survey (YTS) dataset (10,600 observations, 31 variables) to track youth tobacco use trends from 1999 to 2017.
- Trained and compared machine learning models (Random Forest, Decision Tree, Linear Regression, and Support Vector Regression) to predict tobacco use patterns; Random Forest achieved the highest R² score, 0.800.
- Applied data preprocessing techniques, including missing value imputation, feature selection, and standardization, to prepare the data for model training.
- Developed data visualizations using Matplotlib and Seaborn to identify trends in tobacco use, cessation attempts, and demographic influences.
- Conducted hypothesis testing to evaluate the impact of public health policies on smoking rates among U.S. middle and high school students.
- Data Analysis & Visualization: Pandas, NumPy, Matplotlib, Seaborn.
- Machine Learning & Modeling: Scikit-learn, Regression models, PCA.
- Data Preprocessing: Handling missing values, feature scaling, categorical encoding.
- Statistical Methods: Hypothesis testing, correlation analysis.
- Survey Data Handling: Experience working with CDC’s Youth Tobacco Survey (YTS) dataset.
