Arindam Guptaray

San Diego, CA

Summary

  • Expert in creating and deploying predictive models using machine learning algorithms, including neural networks, k-means clustering, logistic regression, and decision trees
  • Proven track record of successfully implementing AI-driven solutions across various industry segments
  • Experience with deep learning frameworks such as TensorFlow, Keras, and PyTorch
  • Experience with Large Language Models (LLMs)
  • Knowledge of natural language processing (NLP), computer vision, and image processing algorithms
  • Expert in data governance and data quality

Overview

20 years of professional experience

Work History

Sr. Data Scientist

USDA
01.2024 - Current

Created a time series analysis of crop data using Python.

Created a Tableau dashboard to monitor ETL progress and activity.

Created an audit program in Python to capture file sizes and compare them with the records inserted into the database.


Data Scientist

Dept of Energy
01.2023 - 12.2023

Created a Power BI dashboard to capture the different risk management pillars for DOE's equipment.

Created a Python program to capture generation data.

Created an optimization program to prioritize work orders.

Sr. Data Scientist

Experian
06.2018 - 01.2023

Designed a framework for an automated and efficient modeling process to predict credit risk. The framework was developed on AWS using Spark, S3, and EC2.

Used XGBoost, SQL, statistics, and machine learning to predict the probability of fraud using PySpark and SageMaker. The model's KS was 20% better than that of the existing model. The model was later converted to a deep learning network using Keras. The final model was implemented using PyTorch. Deployed the model as a REST API using Flask and tested it with Postman. Used Docker to containerize the deployment.

Created a metadata repository and metrics for data quality analysis.

Created a chatbot to answer questions about the above framework using LangChain, Pinecone, HuggingFace and Llama.

Sr. Data Scientist

Thermo Fisher Scientific
05.2017 - 06.2018

Led the creation, deployment, and adoption of cross-sell, retention, and market basket models across the organization in Spark using PySpark, SQL, and scikit-learn on the Databricks platform. The lift from the model was over 40%.

Created reports in Tableau and Power BI to report the lift from predictive campaigns. Forecasted cross-sell and retention revenue.

Sr. Data Scientist

Silver Spring Networks
06.2015 - 05.2017

Designed a time series model (ARIMA) in Python to predict substation usage based on historical usage, temperature, and humidity. Converted the model to an RNN deep learning model using Databricks.

Designed a deep learning model using Redshift, Glue, and SageMaker to predict commercial accounts with a high probability of committing fraud in the electric and gas domain. The data was captured using Kinesis streams. Created a FastAPI service and deployed it using Docker and Kubernetes.

Created a computer vision model on SageMaker using a CNN to identify LED lighting from Google Images.

Sr. Data Scientist

Intuit
07.2014 - 06.2015

Designed a model in Python to determine whether a visitor to the website would ultimately purchase the tax product, based on the visitor's activity in the first session. Created a Tableau dashboard to predict trends and display metrics related to this model.

Designed a model in Python (scikit-learn) to predict fraudulent credit card charges.

Conducted A/B testing for promotions.

CTO

Moody's India (ICRA)
06.2011 - 06.2014

Directly interacted with senior management of British Petroleum. Led a team of 35 analysts and programmers. Successfully sold data science projects to British Petroleum and to other clients in California, USA. 

Designed a Naive Bayes model in Python to predict the UNSPSC category of spend data. The model was trained on 5 years of data (100 million records). The predicted category was correct 92% of the time (vs. 93% for manual classification), rising to 98% when the second most probable category was also considered. This reduced manual classification effort by 90%. The system also detected incorrect rules and reduced redundant rules. It was implemented using Hadoop streaming on a Cloudera cluster.

Designed a Python-based application (using NumPy, NLTK, and Matplotlib) to extract the document type (Contract, Change Request, Call Offs, etc.) from 10,000 PDF documents. The application accurately classified 98% of the documents and also detected that 2% of the original classifications were incorrect. The supplier name was extracted from each agreement using n-grams and mapped to existing suppliers.

Created a regression model in Python to estimate the FICO score based on revolving credit, number of credit card rejections, annual income, housing loans, and auto loans.

Reduced the number of redundant suppliers by comparing supplier names using text analytics (string distance). Suppliers whose names differed by small distances were grouped together. The resulting list was verified using the Google API to get the correct supplier name. Used Python and NLTK.

Created a Twitter sentiment analysis model using RNN and LSTM in Python for a large corporation.

Chief Architect

Connectiva Systems
06.2004 - 06.2011

Worked with users to determine the key RA KPIs for wireless business units using COSA’s Risk and Control. Implemented the KPIs on a DB2 database of over 1000 TB. Designed the physical and logical star schema for the data warehouse. Migrated the data structure from DB2 to Oracle.

Implemented fraud alarms that enabled the operator to catch a large fraud operation. Architected the complete fraud solution.

Education

MBA - Finance

University of Minnesota
Minneapolis, MN

MS - Computer Science

Wayne State University
Detroit, MI

BS - Computer Engineering

IIT
Kharagpur

Skills

  • Languages: SQL, Python, R, PySpark, Scala, Perl, Flask
  • Visualization: QlikView, Power BI, Tableau, Matplotlib, Seaborn
  • Databases: Oracle, SQL Server, PostgreSQL, MySQL, Hive
  • Tools: SageMaker, Glue, Redshift, Spark, Docker, Kubernetes, NumPy, SciPy, scikit-learn, Databricks
  • Algorithms: Keras, XGBoost, Deep Learning, TensorFlow, Regression, Logistic Regression, Decision Trees, Random Forest, ARIMA, Gradient Boosting, Support Vector Machine, PyTorch, NLTK, A/B Testing, Bayesian Stats, RNN, LSTM, GRU, CNN, spaCy, OpenAI, Llama, Hugging Face, LangChain, Pinecone, ChromaDB
