Jyothika Vollireddy

Arlington, Texas

Summary

  • Technical expertise in data models, data mining, and segmentation techniques, with experience integrating machine learning solutions across Hadoop, Spark, Python, SQL, Ab Initio, and AWS.
  • Processed data from Kafka topics and surfaced the real-time streams in dashboards.
  • Experienced in leveraging AI and machine learning capabilities in smart factories to provide hyper-automation and real-time analytics.
  • Working knowledge of deployment tools such as Docker, Kubernetes, and Jenkins.
  • Developed and implemented NLP models (topic modeling, semantic search, question answering, chatbots, etc.) for real-time inference on text data.
  • Delivered multiple ML solutions in the cloud for the Sales and Marketing team that directly contributed to the top-line growth of the business.

Overview

8 years of professional experience

Work History

Data Scientist

State Farm
01.2020 - Current
  • Designed and implemented intelligent document retrieval systems using OpenAI (GPT-4), LangChain, and vector databases (Pinecone/FAISS), improving answer accuracy and reducing latency by ~40%.
  • Developed secure, scalable Retrieval-Augmented Generation (RAG) pipelines integrated with Unity Catalog and Delta Lake on Databricks, ensuring governed LLM access to sensitive enterprise data.
  • Embedding & Indexing at Scale: Automated embedding generation, chunking, and indexing using PySpark for large-scale ingestion across distributed systems.
  • Platform Migration for Fortune 500 Clients: Led enterprise migrations from Snowflake to Databricks Lakehouse architecture, reducing data platform costs by 30–50% and improving execution times.
  • Client Accelerator Program: Delivered LLM-based use cases through the 'GenAI POC in a Month' accelerator, rapidly turning prototypes into production-grade solutions.
  • Implemented predictive analytics and machine learning algorithms to forecast key metrics and presented the results in purpose-built dashboards.
  • Applied modern machine learning libraries such as LightGBM and PyCaret to identify meaningful patterns and build predictive models.
  • Proficient in applying statistical modeling and machine learning techniques (linear regression, logistic regression, ridge regression, k-nearest neighbors, decision trees, bagging, boosting, random forest, support vector machines, Bayesian methods, gradient boosting, XGBoost, neural networks, and clustering) for predictive analytics, segmentation methodologies, regression-based models, factor analysis, PCA, and ensembles.
  • Developed Predictive Analytics using Pyspark and Spark SQL on Databricks to extract, transform and uncover insights from the raw data.
  • Involved in Data ingestion to Azure Data Lake, Azure Databricks by building pipelines in Azure Data Factory.
  • Led end-to-end design and implementation of enterprise customers' generative AI applications, using Databricks, OpenAI (GPT-4), and LangChain to develop intelligent document Q&A systems.
  • Developed scalable RAG (Retrieval-Augmented Generation) pipelines based on vector databases (Pinecone/FAISS), reducing document retrieval latency by ~40% and boosting answer accuracy.
  • Developed customized embedding workflows and chunking strategies to optimize semantic search within large enterprise document repositories (PDFs, DOCX, HTML).
  • Built Databricks notebooks and MLflow-orchestrated workflows for experimentation, tracking, and model deployment.
  • Integrated role-based access controls, prompt engineering methods, and metadata filters to deliver secure, context-specific responses.
  • Collaborated cross-functionally with data scientists, platform engineers, and client stakeholders to ensure GenAI capabilities aligned with business goals.
  • Built and deployed Retrieval-Augmented Generation (RAG) pipelines on Databricks, driven by Delta Lake and Unity Catalog to support governed and scalable LLM-based assistants.

Data Scientist

Wells Fargo
07.2017 - 12.2019
  • Enabled governed data access and lineage tracking by configuring Unity Catalog for fine-grained permissions and auditing across sensitive enterprise data sets.
  • Used Delta Lake's ACID-compliant storage to maintain versioned, stable document sources for LLM context retrieval pipelines.
  • Integrated OpenAI and LangChain to build intelligent assistants with dynamic, query-aware document retrieval and context injection capabilities.
  • Automated embedding generation, chunking, and vector indexing operations using PySpark for horizontally scalable data processing and ingestion.
  • Ensured compliance with enterprise security standards by applying encryption, data masking, and access control policies across all RAG pipeline layers.
  • Led large-scale data platform migration of multiple Fortune 500 organizations from Snowflake to Databricks Lakehouse architecture, improving performance and lowering platform costs by 30–50%.
  • Data Platform Re-architecture: Migrated large-scale data workloads from Snowflake to Databricks using Delta Lake and Apache Spark, achieving 40% faster pipeline execution.
  • MLOps Automation: Built CI/CD-integrated ML pipelines using MLflow, Databricks Workflows, and GitHub Actions to support automated model retraining and deployment.
  • Mentorship & Delivery Excellence: Mentored junior engineers and contributed to company-wide best practices in Databricks, Spark, and MLOps; played a key role in the organization earning 4× 'Elite Databricks Partner' status.
  • Re-architected ETL pipelines using Apache Spark and Delta Lake to achieve high-throughput, ACID-compliant data processing.
  • Created scalable, tunable data workflows using Databricks Workflows, Airflow, and DBT, reducing pipeline execution time by an average of 40%.
  • Migrated Snowflake schemas, stored procedures, and user access controls to Unity Catalog to maintain data governance and lineage.
  • Optimized and benchmarked query performance post-migration, achieving significant acceleration in data reading and analytics workloads.
  • Collaborated with cross-functional teams (data engineering, analytics, security) to enable smooth cutover and post-migration validation.
  • Guided junior engineers and cross-functional teams on Databricks best practices, Spark optimization, and MLOps workflows, accelerating onboarding and project ramp-up by ~50%.

Data Analyst

Tech Mahindra
01.2017 - 07.2017
  • Conducted code reviews and hands-on training sessions to improve team capabilities in data engineering, machine learning, and cloud architecture.
  • Acted as the technical liaison between the client and internal teams to ensure alignment with Databricks solution architectures and implementation standards.
  • Directly contributed to the company's 4× 'Elite Databricks Partner' status by promoting delivery excellence across multiple high-impact projects.
  • Created reusable templates, CI/CD workflows, and knowledge artifacts to institutionalize quality and improve time-to-delivery for new projects.

Education

Master of Science - Cybersecurity

University of North Texas
Denton, TX

Skills

  • OpenAI (GPT-4)
  • LangChain
  • Retrieval-Augmented Generation (RAG)
  • Prompt Engineering
  • Embedding Models
  • Supervised Learning
  • Unsupervised Learning
  • Feature Engineering
  • Model Evaluation
  • Hyperparameter Tuning
  • Time Series Forecasting
  • Scikit-learn
  • XGBoost
  • LightGBM
  • MLflow
  • Databricks Workflows
  • CI/CD (GitHub Actions)
  • Model Registry
  • Docker
  • Streamlit
  • RESTful APIs
  • Apache Spark
  • Delta Lake
  • PySpark
  • Unity Catalog
  • ETL/ELT Development
  • DBT
  • Data Quality & Validation
  • Pinecone
  • FAISS
  • Chroma
  • ANN Indexing
  • Semantic Search
  • Pandas
  • NumPy
  • SciPy
  • Matplotlib
  • Seaborn
  • Plotly
  • Experiment Design
  • A/B Testing
  • Databricks (AWS/Azure)
  • Snowflake
  • Azure Data Lake
  • Amazon S3
  • SQL
  • Python
  • Git
  • Jupyter Notebooks
  • VS Code
  • Postman
  • Bash
  • Cross-functional Team Leadership
  • Client Engagement
  • Technical Mentorship
  • Agile/Scrum
  • Documentation & Reporting
