Created a time-series analysis of crop data using Python.
Created a Tableau dashboard to monitor ETL progress and activity.
Created an audit program in Python to capture file sizes and compare them against the record counts inserted into the database.
Created a Power BI dashboard to track the risk-management pillars for DOE equipment.
Created a Python program to capture generation data.
Created an optimization program to prioritize work orders.
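As an illustrative sketch only (the original prioritization criteria are not described here), a work-order prioritizer might score each order on weighted criticality, age, and estimated cost; the field names and weights below are assumptions.

```python
# Hypothetical sketch of work-order prioritization: score each order by
# weighted criticality, age, and estimated cost, then sort descending.
# Field names and weights are illustrative assumptions, not the original system.

def prioritize(work_orders, w_criticality=0.5, w_age=0.3, w_cost=0.2):
    """Return work orders sorted from highest to lowest priority score."""
    def score(wo):
        return (w_criticality * wo["criticality"]
                + w_age * wo["age_days"] / 30.0
                - w_cost * wo["est_cost"] / 1000.0)  # cheaper jobs rank higher
    return sorted(work_orders, key=score, reverse=True)

orders = [
    {"id": "WO-1", "criticality": 3, "age_days": 10, "est_cost": 500},
    {"id": "WO-2", "criticality": 5, "age_days": 2,  "est_cost": 2000},
    {"id": "WO-3", "criticality": 1, "age_days": 60, "est_cost": 100},
]
ranked = prioritize(orders)  # WO-2 scores highest here
```

A linear scoring rule like this is the simplest form such an optimizer can take; a real system might instead solve a constrained assignment problem.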
Designed a framework for an automated, efficient modeling process to predict credit risk. The framework was developed on AWS using Spark, S3, and EC2.
Used XGBoost, SQL, statistics, and machine learning in PySpark and SageMaker to predict the probability of fraud. The model's KS statistic was 20% better than that of the existing model. The model was later converted to a deep learning network using Keras, and the final model was implemented in PyTorch. Deployed the model as a REST API using Flask, tested it with Postman, and containerized the deployment with Docker.
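The KS (Kolmogorov-Smirnov) statistic mentioned above measures how well a model's scores separate fraud from non-fraud. A minimal sketch of the metric itself, on synthetic scores (the distributions are invented for illustration):

```python
import numpy as np

def ks_statistic(scores_pos, scores_neg):
    """Kolmogorov-Smirnov statistic: the maximum gap between the empirical
    CDFs of model scores for the positive (fraud) and negative classes."""
    thresholds = np.sort(np.concatenate([scores_pos, scores_neg]))
    cdf_pos = np.searchsorted(np.sort(scores_pos), thresholds, side="right") / len(scores_pos)
    cdf_neg = np.searchsorted(np.sort(scores_neg), thresholds, side="right") / len(scores_neg)
    return float(np.max(np.abs(cdf_pos - cdf_neg)))

rng = np.random.default_rng(0)
# Synthetic scores: fraud cases tend to score higher than legitimate ones.
fraud = rng.normal(0.7, 0.15, 500)
legit = rng.normal(0.4, 0.15, 5000)
ks = ks_statistic(fraud, legit)  # well-separated distributions give a high KS
```

A "20% better KS" comparison is then simply the ratio of this statistic for the new model versus the incumbent on the same holdout set.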
Created a metadata repository and metrics for data quality analysis.
Created a chatbot to answer questions about the above framework using LangChain, Pinecone, Hugging Face, and Llama.
Led the creation, deployment, and adoption of cross-sell, retention, and market basket models across the organization in Spark using PySpark, SQL, and scikit-learn on the Databricks platform. The lift from the models was over 40%.
Created reports in Tableau and Power BI to report lift from predictive campaigns. Forecasted cross-sell and retention revenue.
Designed a time-series model (ARIMA) in Python to predict substation usage from historical usage, temperature, and humidity. Later converted the model to an RNN deep learning model on Databricks.
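To illustrate the idea behind the substation model (not the original code), here is a minimal autoregressive fit with exogenous temperature/humidity regressors, done by ordinary least squares; a production version would use a proper ARIMA implementation such as statsmodels. The data below is synthetic.

```python
import numpy as np

def fit_arx(y, exog, p=2):
    """Fit y[t] = c + sum_i a_i*y[t-i] + b . exog[t] by least squares.
    A minimal stand-in for an ARIMAX fit; exog holds the
    temperature and humidity columns."""
    rows = []
    for t in range(p, len(y)):
        rows.append([1.0] + [y[t - i] for i in range(1, p + 1)] + list(exog[t]))
    X = np.array(rows)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [intercept, a_1..a_p, b_temp, b_humid]

rng = np.random.default_rng(1)
n = 400
temp = rng.normal(20, 5, n)      # temperature (deg C), synthetic
humid = rng.normal(50, 10, n)    # relative humidity (%), synthetic
y = np.zeros(n)
for t in range(2, n):            # simulate usage with known dynamics
    y[t] = (5 + 0.6 * y[t - 1] - 0.2 * y[t - 2]
            + 0.3 * temp[t] + 0.05 * humid[t] + rng.normal(0, 0.1))
coef = fit_arx(y, np.column_stack([temp, humid]), p=2)
```

On this simulated series the fit recovers the generating coefficients, which is the sanity check one would run before trusting the model on real substation data.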
Designed a deep learning model using Redshift, Glue, and SageMaker to predict commercial accounts with a high probability of committing fraud in the electric and gas domain. The data was captured using Kinesis streams. Created a FastAPI service and deployed it using Docker and Kubernetes.
Created a computer vision model on SageMaker using a CNN to identify LED lighting in Google Images.
Designed a model in Python to determine whether a website visitor will ultimately purchase the tax product based on activity in the first session. Created a Tableau dashboard to display trends and metrics for this model.
Designed a model in Python's scikit-learn to predict fraudulent credit card charges.
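A minimal scikit-learn sketch of this kind of fraud classifier, on synthetic data (the real features and model choice are not specified above; logistic regression with class rebalancing stands in here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in features (e.g. amount, hour, distance from home);
# fraud is rare and driven mainly by the first two features here.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
logits = 2.0 * X[:, 0] + 1.5 * X[:, 1] - 3.0
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(class_weight="balanced").fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])  # threshold-free metric
```

`class_weight="balanced"` compensates for the rarity of fraud; AUC is used rather than accuracy because accuracy is misleading on heavily imbalanced classes.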
Performed A/B testing for promotions.
Directly interacted with senior management at British Petroleum. Led a team of 35 analysts and programmers. Successfully sold data science projects to British Petroleum and other clients in California, USA.
Designed a naïve Bayes model in Python to predict the UNSPSC category of spend data. The model was trained on five years of data (100 million records). The predicted category was correct with high confidence 92% of the time (vs. 93% for manual classification) and 98% of the time when the second most probable outcome was also considered, reducing manual classification effort by 90%. The system also detected incorrect classification rules and eliminated redundant ones. It was implemented using Hadoop Streaming on a Cloudera cluster.
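A toy sketch of naïve Bayes spend classification in scikit-learn; the descriptions and category labels below are invented (the real system used UNSPSC codes and ran at far larger scale via Hadoop Streaming):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented spend-line descriptions with illustrative category labels.
train_text = [
    "laptop computer 15 inch", "desktop workstation tower",
    "ballpoint pens box", "copier paper a4 ream",
    "forklift rental monthly", "pallet jack hydraulic",
]
train_labels = ["IT", "IT", "Office", "Office", "Equipment", "Equipment"]

# Bag-of-words counts feeding a multinomial naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_text, train_labels)
pred = model.predict(["wireless laptop mouse"])[0]  # "laptop" dominates -> "IT"
```

The "second probable outcome" figure in the bullet corresponds to taking the top two classes from `predict_proba` instead of only the argmax.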
Designed a Python application (using NumPy, NLTK, and Matplotlib) to extract the document type (contract, change request, call-off, etc.) from 10,000 PDF documents. The application accurately classified 98% of the documents and detected that 2% of the original classifications were incorrect. The supplier name was extracted from each agreement using n-grams and mapped to existing suppliers.
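The n-gram mapping step can be sketched as follows, using character n-gram Jaccard similarity to match an extracted supplier string to a known supplier list; the supplier names are invented and the original implementation may have differed.

```python
def char_ngrams(s, n=3):
    """Set of lowercase character n-grams of a string."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def best_supplier_match(extracted, known_suppliers, n=3):
    """Map an extracted supplier string to the known supplier whose
    character n-gram set overlaps most (Jaccard similarity)."""
    grams = char_ngrams(extracted, n)
    def jaccard(other):
        g = char_ngrams(other, n)
        return len(grams & g) / len(grams | g) if grams | g else 0.0
    return max(known_suppliers, key=jaccard)

suppliers = ["Acme Industrial Supply", "Globex Corporation", "Initech LLC"]
match = best_supplier_match("ACME Industrial Suply Ltd", suppliers)
```

Character n-grams are robust to OCR noise and misspellings ("Suply"), which is why they suit names extracted from scanned agreements.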
Created a regression model in Python to estimate FICO score from revolving credit, number of credit card rejections, annual income, housing loans, and auto loans.
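A minimal version of such a regression, on synthetic data; the feature weights used to generate the data are assumptions for illustration, not the original model's coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data; true weights below are illustrative assumptions.
rng = np.random.default_rng(7)
n = 1000
revolving_util = rng.uniform(0, 1, n)    # revolving credit utilization
rejections = rng.poisson(1, n)           # number of credit card rejections
income = rng.normal(60, 20, n)           # annual income ($k)
housing = rng.integers(0, 2, n)          # has a housing loan
auto = rng.integers(0, 2, n)             # has an auto loan
X = np.column_stack([revolving_util, rejections, income, housing, auto])
fico = (700 - 80 * revolving_util - 15 * rejections + 0.5 * income
        + 5 * housing - 3 * auto + rng.normal(0, 10, n))

model = LinearRegression().fit(X, fico)
r2 = model.score(X, fico)  # fit recovers the generating weights
```

A plain linear fit keeps every coefficient interpretable (e.g. points lost per additional rejection), which matters in credit contexts.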
Reduced the number of redundant suppliers by comparing supplier names using text analytics (string distance). Suppliers differing by small distances were grouped together, and the resulting list was verified using the Google API to obtain correct supplier names. Used Python and the NLTK toolkit.
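A simplified sketch of the de-duplication idea using Levenshtein edit distance and greedy grouping (the names are invented; the original likely used NLTK's distance utilities and an external verification step):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def group_suppliers(names, max_dist=2):
    """Greedily group names whose edit distance to a group's first
    member (its representative) is small."""
    groups = []
    for name in names:
        key = name.lower().strip()
        for g in groups:
            if levenshtein(key, g[0].lower().strip()) <= max_dist:
                g.append(name)
                break
        else:
            groups.append([name])
    return groups

names = ["Acme Corp", "ACME Corp.", "Acme Crop", "Globex Inc"]
groups = group_suppliers(names)  # the three Acme variants cluster together
```

Greedy grouping against a fixed representative is O(n * groups) and order-dependent; a production system might cluster on the full pairwise distance matrix instead.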
Created a Twitter sentiment analysis model using an RNN with LSTM in Python for a large corporation.
Worked with users to determine the key RA KPIs for wireless business units using COSA's Risk and Control framework. Implemented the KPIs on a DB2 database of over 1,000 TB. Designed the physical and logical star-schema data warehouse and migrated the data structures from DB2 to Oracle.
Implemented fraud alarms that enabled the operator to catch a major fraud operation. Architected the complete fraud solution.