Projects:
1) Email Analysis Automation and Optimization System (EAAS Tool) (Nov 21 - May 23)
- Aimed at automating and optimizing the entry of IMS cases from Outlook into the database, enabling issues to be addressed and resolved quickly.
- Created an automated tool that reads emails and PDF medical records, extracts key information, analyzes the content using natural language processing (NLP) techniques such as sentiment analysis, topic modeling, and named entity recognition, and replies to the sender in one click (see the sketch below).
- Carried out data transformation and cleansing to capture details about each email that the conventional approach missed, consolidating large volumes of unstructured data into structured records stored in a backend database.
- The tool processes large volumes of Outlook emails and routes them to the issue-resolution team within minutes, versus a turnaround of 2 to 3 working days under the traditional process, saving about $200K per year.
Tools and Technologies: NLP, Python, SQL in Oracle, Excel.
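A minimal sketch of the email-analysis step, assuming spaCy for named entity recognition and NLTK's VADER for sentiment analysis (the source does not name the libraries); the CSV export and its column names are hypothetical.

```python
# Sketch: sentiment analysis + named entity recognition on email text.
# Assumes emails were exported from Outlook to CSV; "body" is a hypothetical column.
import pandas as pd
import spacy
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
nlp = spacy.load("en_core_web_sm")          # small English pipeline with NER
sia = SentimentIntensityAnalyzer()          # VADER sentiment scorer

def analyze(body: str) -> dict:
    doc = nlp(body)
    return {
        # compound score in [-1, 1]; strongly negative values flag urgent cases
        "sentiment": sia.polarity_scores(body)["compound"],
        # named entities (people, organizations, dates) found in the body
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
    }

emails = pd.read_csv("outlook_export.csv")  # hypothetical export file
results = emails["body"].apply(analyze)
```

Topic modeling and the one-click reply described above would sit on top of this extraction step.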
2) Collections Forecast Using Time Series Forecasting (ARIMA and RNN) (March 20 - May 23)
- Aimed at building a model that forecasts the collection amount for the month, updated each day.
- Developed a time-series forecasting model that predicts daily collection amounts (totaling $180M to $200M per month) for budgeting, using ARIMA and RNN models.
- Tested the stationarity of the data with the Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests, handled outliers and missing values, and formatted the data into uniform time intervals.
- Engineered additional features such as lag features, rolling statistics, and exponential smoothing, improving the model's accuracy and interpretability. Conducted rigorous model evaluation and validation using mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE) (see the sketch below).
- Tracked model performance and provided a root cause analysis each week for any variance above $2M; the model maintained an average accuracy of 96% per year.
Tools and Technologies: Time series - ARIMA and RNN in RStudio, SQL in Oracle, Tableau, Excel.
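A minimal sketch of the stationarity checks, feature engineering, and ARIMA evaluation described above. The original pipeline was built in RStudio; this is an illustrative Python equivalent, and the file name, column names, ARIMA order, and 30-day holdout are assumptions.

```python
# Sketch: stationarity tests, lag/rolling features, ARIMA fit, MAE/RMSE evaluation.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_absolute_error, mean_squared_error

daily = pd.read_csv("collections.csv", parse_dates=["date"], index_col="date")
y = daily["amount"].asfreq("D")      # enforce uniform daily time intervals
y = y.interpolate()                  # simple missing-value handling

# Stationarity checks: ADF (null = unit root) and KPSS (null = stationary)
adf_p = adfuller(y.dropna())[1]
kpss_p = kpss(y.dropna(), regression="c")[1]
print(f"ADF p={adf_p:.3f}, KPSS p={kpss_p:.3f}")

# Example engineered features: one-day lag and a 7-day rolling mean
feats = pd.DataFrame({"lag_1": y.shift(1), "roll_mean_7": y.rolling(7).mean()})
print(feats.tail())

# Fit ARIMA on a training split and evaluate with MAE / RMSE
train, test = y[:-30], y[-30:]
model = ARIMA(train, order=(1, 1, 1)).fit()  # (p, d, q) chosen for illustration
pred = model.forecast(steps=len(test))
mae = mean_absolute_error(test, pred)
rmse = np.sqrt(mean_squared_error(test, pred))
print(f"MAE={mae:,.0f}, RMSE={rmse:,.0f}")
```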
3) MTV to CAS Transfers (May 21 - Nov 21)
- Aimed at automating and optimizing the transfer of uncollected claims from the MTV platform to the CAS platform to improve collections.
- Designed and developed a linear optimization model in RStudio that maximized the total collection amount (see the sketch below).
- Implemented survival analysis to prioritize claims based on the time remaining before a claim expires on the MTV platform and the dollar amount to be collected. Presented the survival analysis results to stakeholders, enabling quick action on claims with a low survival probability and a high dollar amount.
- Conducted rigorous testing and validation of the claim transfer process, including data reconciliation, system integration testing, and client acceptance testing, to verify the accuracy of the transferred claims and the total collected amount.
- The model enabled the client to reduce effort by 99% and project cost by 93%, generating collections of $5M in 2021, $2M in 2022, and $1.5M in 2023.
Tools and Technologies: Linear optimization and survival analysis in RStudio, SQL in Oracle, Tableau, and Excel.
Achievements: Received the "Excellence in Execution" award for reducing client effort by 99% and cutting project cost by 93%.
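A minimal sketch of the two techniques named above. The original model was built in RStudio; this illustrative Python equivalent uses scipy for the linear program and lifelines for a Kaplan-Meier survival estimate, with hypothetical claim amounts, durations, and transfer-capacity constraint.

```python
# Sketch: linear optimization over transfer decisions + survival analysis on claims.
import numpy as np
from scipy.optimize import linprog
from lifelines import KaplanMeierFitter

# --- Linear optimization: choose which claims to transfer (x_i in [0, 1]) to
# maximize the expected collection amount, subject to a transfer-capacity cap.
amounts = np.array([12_000, 8_500, 20_000, 4_000])  # hypothetical $ per claim
c = -amounts                       # linprog minimizes, so negate to maximize
A_ub = np.ones((1, len(amounts)))  # single capacity constraint (hypothetical)
b_ub = np.array([2])               # at most 2 claims transferred
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * len(amounts),
              method="highs")
print("transfer decisions:", res.x, "expected collection:", -res.fun)

# --- Survival analysis: estimate how long claims stay collectible on MTV so
# that low-survival, high-dollar claims can be prioritized.
durations = [30, 45, 12, 60]  # days a claim survived on MTV (hypothetical)
events = [1, 0, 1, 1]         # 1 = expired uncollected, 0 = censored
kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=events)
print(kmf.survival_function_.tail())
```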
4) Write-Offs Decision Tree (June 21 - Feb 22)
- Aimed at predicting which claims might enter write-offs, helping the finance team identify and prevent them and prioritize claims by the dollar amount that can be collected.
- Developed and implemented prediction models including Random Forest, AdaBoost, XGBoost, Naïve Bayes, and SVM to predict which claims are likely to go to write-offs (see the sketch below).
- Conducted exploratory data analysis (EDA) and preprocessing to identify relevant features and patterns; cleaned the data by treating missing values, imbalanced classes, and outliers.
- Applied feature selection and feature engineering to create new features that increased model accuracy, and scaled the data using normalization.
- Evaluated the models using metrics such as accuracy, precision, recall, and F1 score. The model helps save around $45K - $50K per month, with an accuracy of 95% - 98%.
Tools and Technologies: Random Forest, AdaBoost, XGBoost, Naïve Bayes, SVM, Decision Tree in R, SQL in Oracle, Excel.
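A minimal sketch of one such pipeline, with Random Forest as the example model. The originals were built in R; this is an illustrative Python equivalent, with SMOTE standing in for the imbalance handling and hypothetical file, feature, and label names.

```python
# Sketch: normalization, oversampling, Random Forest training, and evaluation
# with accuracy/precision/recall/F1.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

claims = pd.read_csv("claims.csv")           # hypothetical input table
X = claims.drop(columns=["wrote_off"])       # "wrote_off" label is hypothetical
y = claims["wrote_off"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

scaler = MinMaxScaler()                      # normalization, as described above
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Oversample the minority (write-off) class on the training split only
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

clf = RandomForestClassifier(n_estimators=300, random_state=42)
clf.fit(X_res, y_res)

# Accuracy, precision, recall, and F1 score in one report
print(classification_report(y_test, clf.predict(X_test)))
```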
5) Vendor Letter Finding (Aug 19 - March 20)
- Aimed at classifying whether a claim was overpaid by analyzing the letters sent to vendors, reducing the cost spent on vendors.
- Developed an algorithm that extracted the data from PDFs, converted it into structured form, and stored it in a table with 130 columns (see the sketch below). Utilized web scraping techniques to gather essential data from online sources, and applied Principal Component Analysis (PCA) and EDA to extract the features relevant to the model.
- Handled the imbalanced data using techniques such as SMOTE and ensemble methods.
- Performed a detailed analysis of the data, drew critical insights, and shared them with clients and business heads, enabling them to take crucial business decisions.
- Employed machine learning algorithms such as Random Forest, XGBoost, AdaBoost, and SVM to classify the claims.
Tools and Technologies: Random Forest, AdaBoost, XGBoost, Naïve Bayes, SVM, NLP, text mining, web scraping in Python, Excel, statistical testing.
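A minimal sketch of the PDF-to-structured-data step followed by PCA. pdfplumber, the folder path, the regex, and the field names are all assumptions for illustration; the source states only that the real table held 130 columns.

```python
# Sketch: extract fields from vendor-letter PDFs into a structured table,
# then reduce the numeric feature space with PCA.
import re
from pathlib import Path
import pandas as pd
import pdfplumber
from sklearn.decomposition import PCA

rows = []
for pdf_path in Path("vendor_letters").glob("*.pdf"):  # hypothetical folder
    with pdfplumber.open(pdf_path) as pdf:
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)
    # Example field extraction; the real table held 130 such columns
    amount = re.search(r"Amount Due:\s*\$([\d,\.]+)", text)
    rows.append({
        "file": pdf_path.name,
        "amount_due": float(amount.group(1).replace(",", "")) if amount else None,
    })

letters = pd.DataFrame(rows)

# PCA to compress the numeric features before classification
numeric = letters.select_dtypes("number").fillna(0)
components = PCA(n_components=min(10, numeric.shape[1])).fit_transform(numeric)
```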