Sai Chaithanya Pallaprolu

Senior Data Scientist
Centerton, AR

Summary

Results-driven Data Scientist and Data Engineer with extensive experience in statistical modeling, predictive analytics, and data engineering. Proficient in designing and implementing multivariate linear and logistic regression, classification trees, and machine learning algorithms using Python and cloud platforms such as AWS. Highly skilled in writing and optimizing SQL queries for complex data extraction and transformation. Adept at delivering end-to-end analytical solutions, covering data acquisition, extraction, transformation, and manipulation to ensure data quality and reliability. Experienced in exploratory data analysis (EDA), developing and validating predictive models, and translating data-driven insights into actionable business recommendations. Passionate about applying data science to supply chain optimization, demand forecasting, and business decision-making. Expert in Generative AI, Retrieval Augmented Generation (RAG), and Large Language Models (LLMs), with experience building state-of-the-art RAG agents that advance Conversational AI capabilities through efficient data retrieval. Adept in prompt engineering, prompt governance, and secure and responsible AI.

Overview

13 years of professional experience

Work History

Sr Data Scientist

Wal-Mart
06.2024 - Current
  • Utilized Python, PySpark, and SQL to analyze large datasets and automate reporting for decision-making.
  • Developed a state-of-the-art Retrieval Augmented Generation (RAG) tool using generative AI (GPT) that ingests source documents so the Conversational AI chatbot can answer with responses sourced from those documents.
  • Optimized document retrieval by loading and indexing unstructured HTML files with LangChain and Azure Search, improving the effectiveness of the RAG system.
  • Used Ragas and LangSmith to evaluate the performance of the RAG system, ensuring the accuracy and efficiency of generated responses.
  • Engaged with third-party vendors to evaluate their services and make informed decisions on establishing contracts.
  • Experimented with chunking strategies such as semantic, document, and agentic chunking to optimize information retrieval.
  • Applied large language models (LLMs), including ChatGPT, GPT-3.5, Claude, and Mistral, to enterprise AI solutions.
  • Worked hands-on with transformer-based architectures such as BERT, RoBERTa, and T5 for NLP and document intelligence.
  • Used big data technologies such as Hadoop and Spark to build large-scale AI and ML pipelines.
  • Designed APIs to deliver AI and ML outputs to multiple business units, with an emphasis on scalability and ease of integration.
  • Mentored junior data scientists, providing guidance on machine learning best practices, code reviews, and AI solution design.
  • Led the scaling of production AI systems to handle growing data volumes and user demand while maintaining performance.
  • Environment: Python, OpenAI GPT, NLP, Transformers, prompt engineering, prompt governance, RAG architecture, Conversational AI, Generative AI, LLMs, LangChain, Azure OpenAI APIM, Azure Cognitive Search (Azure Search), Azure Blob Storage.

Sr Data Scientist

Charger Logistics
03.2024 - 06.2024
  • Established scalable, efficient, automated processes for large-scale data analyses, model development, and validation.
  • Developed lane forecasting models for customers and divisions to support budget planning.
  • Built driver availability models to predict which trucks are ready for new trips.

Sr Data Scientist

JB Hunt Transportation
08.2020 - 02.2024
  • Delivered supply chain planning products end to end, building advanced forecasting models with predictive analytics, machine learning, and AI in Python and R.
  • Designed, deployed, and validated machine learning models for demand planning, supply planning, and inventory optimization, improving supply chain efficiency.
  • Designed and deployed conversational AI solutions using Amazon Lex to automate load booking and driver dispatch communications, reducing manual dispatcher workload by 35% and improving driver response times by 28%.
  • Integrated Amazon Lex with telematics and TMS (Transportation Management Systems) to create an intelligent chatbot for drivers, enabling real-time updates on load assignments, route changes, and compliance checks, enhancing operational efficiency across a fleet of 500+ trucks.
  • Developed a Lex-powered virtual assistant for truck drivers to manage trip planning, HOS (Hours of Service) reminders, and equipment maintenance alerts, improving regulatory compliance alerts by 20% and minimizing costly downtime incidents.
  • Built a multilingual Lex chatbot to assist cross-border truck drivers with customs documentation guidance and real-time language support, increasing cross-border delivery success rates and reducing clearance delays by 18%.
  • Applied time series forecasting, clustering, regression, neural networks, and optimization techniques for market mix modeling, price elasticity, and store assortment analysis.
  • Developed segmentation and optimization models to enhance new product planning and market optimization strategies.
  • Built AI-driven anomaly detection models to identify inconsistencies in supply chain transactions, improving operational accuracy.

Data Science Engineer / Senior Systems Developer, Advanced Analytics

National Railroad Passenger Corporation (Amtrak)
01.2018 - 07.2020
  • Built an XGBoost classification model on track acoustic detector data to identify strong predictors for raising alarms and alerts ahead of catastrophic events such as derailments and bearing failures.
  • Worked with marketing teams to produce advanced reports on customer travel behavior, including customer order frequency, new customer acquisition, daily sales KPIs, cohort analysis, return purchases by customer cohort, and annual purchase frequency by customer cohort.
  • Performed anomaly detection on wheel impact sensor data to forecast potential train stops.
  • Calculated customer lifetime value, estimating each customer's expected number of trips and probability of being "alive" (still active).
  • Applied association rule mining to safety report data to clarify associations between attributes, helping safety teams identify high-risk areas and implement intensive safety measures and training there.
  • Developed a novel three-step approach that applies RFM (recency, frequency, monetary) analysis across three successive mining tasks: clustering, to adopt different marketing strategies for different customer segments; classification, to predict the future behavior of existing customers; and association rule mining, to find associations between customer segments and implement combined marketing strategies.
  • Analyzed rising temperature trends recorded by hot box detectors and notified mechanical teams whenever a train stop or derailment due to overheated bearings was suspected in the near future.
  • Built a DeepAR forecasting model in Amazon SageMaker to forecast food and beverage spoilage on trains, as well as the optimal number of cars to schedule for a given train on a given day based on past events.

Data Science Engineer / Senior Quantitative Analyst

One Main Financial Services
08.2017 - 12.2017
  • Responsible for analyzing data, creating and validating assumptions that fed into volume growth and profitability strategies.
  • Worked with data query tools to build, test, evaluate, and maintain robust data analysis and reporting for management to make timely, informed decisions.
  • Predicted the probability of customer response to each mailing using an XGBoost model and calculated the revenue uplift per targeted customer.
  • Predicted the probability of default with neural network models, estimating the likelihood that a borrower will stop paying loan installments.

Data Science Engineer / Senior Data Analyst

Maryland Health Benefits Exchange (Thought Layer.Co)
03.2017 - 07.2017
  • Developed, documented, and implemented data management, process intelligence, business intelligence, and data science solutions.
  • Applied data mining techniques to MHBE data to find common discrepancy patterns based on personal demographic information.
  • Developed data feeds, datasets, and data structures, along with the associated code and analysis.
  • Collaborated with business and development teams and documented the implemented data structures, models, and approaches.
  • Followed data management best practices and ensured data security during analysis activities.
  • Performed data mining and analysis to support ad hoc leadership requests for insights.

Research Assistant

DYNAMIC Big Data Mining Research Lab (CHMPR), UMBC
01.2016 - 01.2017
  • Designed and developed a data mining algorithm that addresses common supervised learning problems, such as class imbalance, lack of labeled data, and class label overlap, to improve classification metrics.
  • Implemented on-the-fly analytics with Spark Streaming and deployed an expert system combined with a supervised machine learning technique. Also performed deep packet inspection on suspicious packets in network data using semantics and reasoning.
  • Developed an unsupervised k-means clustering algorithm that handles both categorical and continuous variables, then inspects and assesses each cluster and splits poorly formed clusters to reduce the sum of squared errors (SSE). Applying the algorithm to the publicly available MIMIC dataset yielded a low SSE.
  • Reduced model error by 31% and training time by 45% by running the algorithms in parallel with Apache Spark (PySpark Streaming) over 150 GB of health data.

Data Science Analyst / Algorithm Developer

Syntel Ltd, Research and Development
06.2012 - 04.2015
  • Developed an artificial neural network in R, as part of the company's R&D work, for stock prediction using backpropagation, with 3 hidden layers and 72 nodes.
  • Wrangled JSON data containing customer information and credit histories, using recursive functions to flatten nested dictionaries in JSON Lines records into a clean Python data frame, and answered several client questions, earning client appreciation.
  • Produced visualizations of credit transaction data to client requirements, including dynamic heat maps highlighting areas prone to fraudulent transactions.

Education

Master's - Information Systems (IS), specialization in data science, with thesis

UMBC

Bachelor's - ECE

Jawaharlal Nehru Technological University

Skills

  • Predictive Analytics & Machine Learning – Expertise in time series forecasting, regression models (linear, logistic), and neural networks to drive data-driven decision-making. Hands-on experience implementing supervised and unsupervised learning algorithms, improving predictive accuracy and business efficiency.
  • Supply Chain Forecasting – Designed and deployed demand forecasting models, inventory optimization strategies, and market mix modeling techniques to enhance logistics efficiency and revenue growth. Proficient in trend analysis and predictive modeling for supply chain operations.
  • Statistical Analysis & Optimization – Strong foundation in descriptive and inferential statistics, hypothesis testing, and optimization algorithms for data-driven problem-solving. Applied Monte Carlo simulations, A/B testing, and linear programming to optimize operational performance.
  • Cloud Technologies (AWS, Azure, Databricks) – Skilled in cloud-based data engineering, model deployment, and pipeline automation using AWS SageMaker, Lambda, Redshift, Azure Machine Learning, and Databricks. Experienced in scaling ML solutions and optimizing computational performance in cloud environments.
  • Data Engineering & Big Data Processing – Expertise in PySpark, Hadoop, SQL, and ETL pipeline development to process and analyze large-scale datasets. Designed and optimized data lakes, distributed data storage, and real-time data processing workflows.
  • Data Visualization & Executive Reporting – Proficient in Power BI, Tableau, Matplotlib, and Seaborn for transforming complex datasets into actionable insights. Designed interactive dashboards and reports to support C-level executives in strategic decision-making.
  • AI-driven Anomaly Detection & Risk Analysis – Developed anomaly detection models leveraging machine learning and AI to identify fraudulent activities, operational inefficiencies, and financial risks
  • Generative AI & LLM – Expertise in designing and implementing Generative AI models, Retrieval Augmented Generation (RAG) architectures, and large language models to drive conversational AI and innovative data solutions

Publications

  • Sai C. Pallaprolu et al., “Label Propagation in Big Data to Detect Remote Access Trojans,” IEEE Big Data Conference, Washington, DC, INSPEC Accession Number: 16653020, added to IEEE Xplore 6 February 2017.
  • Sai C. Pallaprolu et al., “Iterative Unified Clustering in Big Data,” IEEE Big Data Conference, Washington, DC, INSPEC Accession Number: 16653148, added to IEEE Xplore 6 February 2017.
  • Sai C. Pallaprolu et al., “Zero-day attack identification in streaming data using semantics and spark,” 2017 IEEE International Congress on Big Data (BigData Congress), Honolulu, HI, 2017, pp. 121-128, doi: 10.1109/BigDataCongress.2017.25.
  • Sai C. Pallaprolu et al., “Anonymization of Network Traces Data through Condensation-based Differential Privacy.”
