Retrieval-Augmented Generation (RAG) for Educational Q&A – Independent AI Project
George Mason University | Jan 2023 – May 2023
Collaborated with the MITRE Corporation to support the U.S. State Department in assessing national security threats.
- Collected and analyzed state-by-state data on U.S. farmland owned by foreign entities.
- Correlated land holdings with proximity to sensitive infrastructure (e.g., military bases, research institutions).
- Identified high-risk ownership patterns possibly linked to IP theft and foreign surveillance.
Tools: SQL, R, GIS, Excel
Impact: Informed risk-monitoring and policy discussions on foreign agricultural investment.
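Correlating holdings with proximity to sensitive sites usually comes down to a distance check. The Python snippet below is a minimal sketch of that check. It assumes parcels and sites are available as latitude/longitude points and uses a hypothetical radius threshold. The function names are illustrative and not taken from the project, which used SQL, R, and GIS tools.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def flag_near_sites(parcels, sites, radius_km):
    """Return (parcel_id, nearest_site, distance_km) for parcels within radius_km of a site.

    parcels: iterable of (parcel_id, lat, lon); sites: non-empty list of (name, lat, lon).
    The radius is a hypothetical policy threshold, not a value from the project.
    """
    flagged = []
    for pid, plat, plon in parcels:
        # Find the nearest sensitive site to this parcel.
        name, dist = min(
            ((s, haversine_km(plat, plon, slat, slon)) for s, slat, slon in sites),
            key=lambda pair: pair[1],
        )
        if dist <= radius_km:
            flagged.append((pid, name, round(dist, 1)))
    return flagged
```

In practice, a spatial index or a GIS buffer-and-join would replace the nested scan for large parcel tables.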
George Mason University | Spring 2023
Analyzed historical Major League Baseball (MLB) data to predict game outcomes from engineered features.
- Cleaned and merged game logs, team stats, and player performance data.
- Engineered custom features including pitcher fatigue, batting order strength, home-field advantage, and historical win streaks.
- Trained classification models (e.g., logistic regression, random forest) to predict win/loss outcomes.
- Evaluated models on accuracy, precision, and AUC; interpreted model outputs and feature importance.
Tools: R (tidyverse, caret), SQL, Baseball Reference datasets
Impact: Demonstrated the role of engineered, domain-specific features in improving the predictive accuracy of sports-outcome models.
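The snippet below computes the three evaluation metrics named above from predicted win probabilities. It is a minimal Python sketch for illustration only, since the project itself used R (caret). AUC is computed as the probability that a randomly chosen win is scored above a randomly chosen loss, with ties counted as half. The function names and the 0.5 threshold are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random win > score of a random loss); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one win and one loss")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def classification_metrics(labels, probs, threshold=0.5):
    """Accuracy and precision at a probability threshold, plus threshold-free AUC."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(labels, preds) if y == 0 and yhat == 1)
    correct = sum(1 for y, yhat in zip(labels, preds) if y == yhat)
    return {
        "accuracy": correct / len(labels),
        # Convention assumed here: precision is 0.0 when nothing is predicted as a win.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "auc": roc_auc(labels, probs),
    }
```

The pairwise AUC loop is O(n²). This is fine for a season of games, but a rank-based formula scales better.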
George Mason University | Sep 2023 – Present
Supported PhD students in applying causal inference techniques to real-world healthcare datasets using SQL and R. Projects included:
- Propensity Score Matching
- IPTW (Inverse Probability of Treatment Weighting)
- Stratified Estimation
- Propensity-Weighted Regression
- Negative Control Simulations
Tools: PostgreSQL, R (MatchIt, survey, ggplot2), EHR data
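Of the techniques listed, IPTW is the simplest to show compactly. The snippet below is a minimal Python sketch of the normalized (Hájek) IPTW estimator of the average treatment effect. It assumes propensity scores have already been estimated, for example with a logistic regression. The function name is hypothetical, and this is not the code from those projects, which used R and PostgreSQL.

```python
def iptw_ate(treated, outcome, propensity):
    """Normalized (Hajek) IPTW estimate of the average treatment effect.

    treated: 0/1 treatment indicators; outcome: observed outcomes;
    propensity: estimated P(treated = 1 | covariates) for each unit.
    """
    if not (len(treated) == len(outcome) == len(propensity)):
        raise ValueError("inputs must have the same length")
    num_t = den_t = num_c = den_c = 0.0
    for t, y, e in zip(treated, outcome, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1) (positivity)")
        if t == 1:
            w = 1.0 / e          # treated units weighted by 1 / e(x)
            num_t += w * y
            den_t += w
        else:
            w = 1.0 / (1.0 - e)  # control units weighted by 1 / (1 - e(x))
            num_c += w * y
            den_c += w
    if den_t == 0 or den_c == 0:
        raise ValueError("need at least one treated and one control unit")
    # Difference of the weighted outcome means in the two arms.
    return num_t / den_t - num_c / den_c
```

Normalizing by the sum of weights in each arm, rather than by n, keeps the estimate stable when a few propensities are near 0 or 1.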
Tools & Languages: SQL, R, Python, Excel, GIS tools, caret, MatchIt, xgboost, randomForest, ggplot2, Tableau
Personal Project | 2024
Designed and deployed a Retrieval-Augmented Generation (RAG) system to teach general knowledge to children through natural-language Q&A.
- Integrated LLMs (OpenAI/GPT) with a retrieval pipeline to pull contextually relevant facts from a curated knowledge base.
- Iteratively refined system prompts and response filtering to ensure age-appropriate, engaging educational output.
- Evaluated the system through feedback from learners and educators.
Tools: Python, Hugging Face Transformers, FAISS, LangChain
Impact: Built a proof-of-concept AI tutor for self-guided child learning and curiosity-driven Q&A.
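The retrieve-then-generate flow described above can be sketched without any external services. In this minimal Python sketch, a bag-of-words cosine similarity stands in for the embedding model and FAISS index the project used, and the LLM call is omitted. Only the retrieval step and the age-appropriate prompt assembly are shown. The knowledge-base sentences, prompt wording, and function names are hypothetical.

```python
import math
import re
from collections import Counter

def _vectorize(text):
    # Bag-of-words term counts; a stand-in for a dense embedding.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, knowledge_base, k=2):
    """Return the k knowledge-base facts most similar to the question."""
    q = _vectorize(question)
    ranked = sorted(knowledge_base, key=lambda doc: _cosine(q, _vectorize(doc)), reverse=True)
    return ranked[:k]

def build_prompt(question, facts):
    """Ground the model in retrieved facts and constrain tone for young learners."""
    context = "\n".join(f"- {f}" for f in facts)
    return (
        "You are a friendly tutor for children. Answer in short, simple sentences "
        "using only the facts below. If the facts do not cover the question, say so.\n"
        f"Facts:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

kb = [
    "The Moon orbits the Earth about once a month.",
    "Bees make honey from flower nectar.",
    "Water freezes at 0 degrees Celsius.",
]
prompt = build_prompt("How do bees make honey?", retrieve("How do bees make honey?", kb, k=1))
```

The resulting `prompt` string would then be sent to the chat model. The "say so" instruction is one common way to discourage answers that go beyond the curated facts.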