A Data Scientist accomplished at compiling, transforming, and analyzing complex information through software. Expert in machine learning and large dataset management. Demonstrated success in identifying relationships and building solutions to business problems.
British Airways Data Science Job Simulation on Forage
● Scraped and analyzed 3,000 Skytrax reviews using Python, applying topic modeling, sentiment analysis, and word clouds to derive actionable customer experience insights.
● Analyzed 50,000 customer booking entries, engineered new features, and employed Random Forest for predictive modeling, enhancing booking prediction accuracy.
● Translated complex analyses into clear visualizations and a single PowerPoint slide, providing stakeholders with actionable recommendations for informed decision-making.
Detecting Fake News with Python and Machine Learning
● Engineered a Python-based machine learning model that achieved 92% precision in differentiating between fake and real news.
● Analyzed a dataset of over 10,000 news articles to train the model, enhancing its ability to detect key patterns and indicators of misinformation.
● Iteratively refined classification algorithms, leveraging advanced natural language processing techniques to boost detection accuracy by 15%.
Predicting Wine Quality with Statistical Models in Python
● Analyzed a comprehensive dataset of 5,000 wine samples with 12 attributes each, including key metrics like acidity levels, residual sugar, pH, and alcohol content.
● Applied advanced feature engineering techniques, such as polynomial transformations and interaction terms, combined with Lasso and Ridge regression to enhance model performance and reduce overfitting.
● Improved wine quality prediction accuracy from 70% to over 85% through iterative model refinement and strategic feature selection.
New York City Taxi Fare Prediction with Python
● Engineered a predictive model in Python to estimate taxi fares in New York City.
● Analyzed a dataset of over 1 million taxi trips, considering factors such as distance, duration, pickup/drop-off locations, and time of day.
● Applied statistical techniques including regression analysis, feature scaling, and cross-validation to optimize model performance and generalization.
● Achieved a prediction accuracy of 90% using machine learning algorithms, surpassing baseline models.
Tata Data Visualization: Empowering Business with Effective Insights Job Simulation on Forage
● Formulated key questions for CEO and CMO, guiding strategic business decisions by analyzing datasets and emphasizing the importance of quantitative and qualitative insights for diverse business needs.
● Executed meticulous data cleanup, reducing a 4.2 million-row dataset to 3.2 million rows by eliminating invalid entries, ensuring precise analysis and accurate visualizations.
● Developed advanced visualizations using Tableau and Power BI, incorporating Esri's ArcGIS to enhance geographical analysis, providing leadership with actionable insights on sales and revenue distribution.
● Delivered a comprehensive presentation of findings via video, aligning analytical insights with strategic goals, empowering CEO and CMO to make informed, data-driven decisions.
Image Similarity Detection & Classification
● Engineered a Python-based system for image retrieval and classification, integrating Support Vector Machines, Decision Trees, and PPR algorithms to achieve accurate similarity assessments.
● Leveraged dimensionality reduction and clustering techniques including SVD, PCA, LDH, and k-means to compute the top similar images, improving computational efficiency.
● Enhanced search performance by implementing hashing techniques to prune the search space, achieving a 96% accuracy rate.