Detail-oriented Data Analyst with hands-on experience in data analysis, visualization, and statistical modeling. Proficient in SQL, Python (Pandas, NumPy), Excel, and Tableau/Power BI for extracting insights and supporting data-driven decisions. Skilled in data cleaning, trend analysis, and hypothesis testing, with a strong foundation in machine learning and predictive analytics. Passionate about translating complex datasets into actionable recommendations that support business goals. Strong analytical mindset with excellent problem-solving and communication skills.
Work History
Analyzing Trends in Youth Tobacco Use
University of North Texas
08.2024 - 12.2024
Conducted in-depth statistical analysis of the Youth Tobacco Survey (YTS) dataset (10,600 observations, 31 variables) to track youth tobacco use trends from 1999 to 2017.
Utilized machine learning models (Random Forest, Decision Tree, Linear Regression, and Support Vector Regression) to predict tobacco use patterns, with Random Forest achieving the highest R² score of 0.800.
Applied data preprocessing techniques, including missing value imputation, feature selection, and standardization, to ensure accurate model training.
Developed data visualizations using Matplotlib and Seaborn to identify trends in tobacco use, cessation attempts, and demographic influences.
Conducted hypothesis testing to evaluate the impact of public health policies on smoking rates among U.S. middle and high school students.
Data Analysis & Visualization: Pandas, NumPy, Matplotlib, Seaborn.