Accomplished Data Engineer with a passion for delivering valuable insights through advanced data processing and retrieval methods. Committed to driving company growth by developing strategic data pipelines and ETL processes based on robust data architectures. Proven track record in managing complex data sets, ensuring data quality, and serving as a reliable advisor for data-driven decision-making.
Machine Learning on Amazon Customer Reviews:
Collaborated with 3 team members, applied sentiment analysis, and built a classifier that can determine a review's sentiment.
Conducted thorough exploratory data analysis of the US Traffic Accidents dataset using Pandas and NumPy, identifying key trends and patterns in the data that were previously unknown.
Analyzed a Kaggle dataset and split it into training and test sets.
Utilized Matplotlib and Scikit-Learn for graphical representation of the data.
Key Achievements:
• Classified the Amazon reviews into positive, negative, and neutral categories to aid the selection of a product.
News Classification based on Headline:
Built a cutting-edge NLP model utilizing Python and TensorFlow to classify news articles with 98.99% accuracy, surpassing industry benchmarks by 20%.
Developed a training set of over 200,000 labeled news articles using active learning techniques, enabling the model to continuously improve performance and adapt to evolving topics.
Applied an NLP approach to develop and train a model that understands, interprets, and processes natural language in order to classify news.
Utilized key techniques such as data pre-processing, model training, learning-rate and epoch tuning, layer freezing, and transfer learning.
Key Achievements:
• Analyzed the interest of readers and recommended the news to community newspaper board.
Covid-19 Analysis:
May 2022-August 2022
• Developed a data pipeline using Python to extract and load COVID-19 data into Google BigQuery, resulting in a 50% increase in data accuracy.
• Provided the analyzed data to counties and local hospitals to help them better assess the need for medical supplies.
Exploratory Data Analysis of US Traffic Accidents:
• Initiated and performed exploratory data analysis on the US Traffic Accidents dataset to find trends and patterns using Python and Jupyter.
• Explained the data trends through graphical plots built with Python data visualization libraries.