Results-driven Data Engineer specializing in performance optimization and ETL pipeline integration. Proficient in Python and Apache Spark, with strong problem-solving skills and a focus on improving data quality. Achieved significant cost savings through efficient job migrations.
Analysis and Prediction of Graduation Rate and Dropout Rate in New York State (NumPy, Pandas, Matplotlib, scikit-learn)
• Collected CSV data on graduation and dropout rates for several New York State counties from 2016 to 2019, preprocessed it, and analyzed the key metrics for predicting future graduation and dropout rates
• Built an ML model that predicts which counties will have high graduation rates and which will have high dropout rates
IMDB Database Management System (SQL)
• Created an IMDB management system to store, transform, and search ratings and crew information for movies and TV shows released in or before 2017
• Built an Entity‑Relationship Model and translated it into a relational schema for the IMDB management system using SQL
Information Retrieval System over Twitter Data (Python, AWS, Solr, Flask)
• Built a complete search and analytics solution for a corpus of Twitter data crawled over a period of time, with a user interface that lets users search and analyze the corpus and filter tweets by language, country, and topic