Jyothika Vollireddy

Arlington, Texas

Summary

  • Technical expertise in data models, data mining, and segmentation techniques, with experience integrating machine learning solutions across Hadoop, Spark, Python, SQL, Ab Initio, and AWS.
  • Processed data from Kafka topics and surfaced the real-time streams in dashboards.
  • Experienced in leveraging AI and machine learning capabilities in smart factories to provide hyper-automation and real-time analytics.
  • Working knowledge of deployment tools such as Docker, Kubernetes, and Jenkins.
  • Developed and implemented NLP models (topic modeling, semantic search, question answering, chatbots, etc.) for real-time inference on text data.
  • Delivered multiple ML solutions in the cloud for the Sales and Marketing team that directly contributed to the top-line growth of the business.

Overview

8 years of professional experience

Work History

Data Scientist

State Farm
01.2020 - Current
  • Designed and implemented intelligent document retrieval systems using OpenAI (GPT-4), LangChain, and vector databases (Pinecone/FAISS), improving answer accuracy and reducing latency by ~40%.
  • Developed secure, scalable Retrieval-Augmented Generation (RAG) pipelines integrated with Unity Catalog and Delta Lake on Databricks, ensuring governed LLM access to sensitive enterprise data.
  • Embedding & Indexing at Scale: Automated embedding generation, chunking, and indexing using PySpark for large-scale ingestion across distributed systems.
  • Platform Migration for Fortune 500 Clients: Led enterprise migrations from Snowflake to Databricks Lakehouse architecture, reducing data platform costs by 30–50% and improving execution times.
  • Client Accelerator Program: Delivered LLM-based use cases through the 'GenAI POC in a Month' accelerator, rapidly turning prototypes into production-grade solutions.
  • Implemented predictive analytics and machine learning algorithms to forecast key metrics and presented the results in purpose-built dashboards.
  • Applied modern machine learning libraries such as LightGBM and PyCaret to identify meaningful patterns and build predictive models.
  • Proficient in applying statistical modeling and machine learning techniques (linear regression, logistic regression, ridge regression, k-nearest neighbors, decision trees, bagging, boosting, random forest, support vector machines, Bayesian methods, gradient boosting, XGBoost, neural networks, and clustering) for predictive analytics, segmentation methodologies, regression-based models, factor analysis, PCA, and ensembles.
  • Developed Predictive Analytics using Pyspark and Spark SQL on Databricks to extract, transform and uncover insights from the raw data.
  • Involved in Data ingestion to Azure Data Lake, Azure Databricks by building pipelines in Azure Data Factory.
  • Led end-to-end design and implementation of enterprise customers' generative AI applications, using Databricks, OpenAI (GPT-4), and LangChain to develop intelligent document Q&A systems.
  • Developed scalable RAG (Retrieval-Augmented Generation) pipelines based on vector databases (Pinecone/FAISS), reducing document retrieval latency by ~40% and boosting answer accuracy.
  • Developed customized embedding workflows and chunking strategies to optimize semantic search within large enterprise document repositories (PDFs, DOCX, HTML).
  • Built Databricks notebooks and MLflow-orchestrated workflows for experimentation, tracking, and model deployment.
  • Integrated role-based access controls, prompt engineering methods, and metadata filters to deliver secure, context-specific responses.
  • Collaborated cross-functionally with data scientists, platform engineers, and client stakeholders to ensure GenAI capabilities aligned with business goals.
  • Built and deployed Retrieval-Augmented Generation (RAG) pipelines on Databricks, driven by Delta Lake and Unity Catalog to support governed and scalable LLM-based assistants.

Data Scientist

Wells Fargo
07.2017 - 12.2019
  • Enabled governed data access and lineage tracking by configuring Unity Catalog for fine-grained permissions and auditing across sensitive enterprise data sets.
  • Used Delta Lake's ACID-compliant storage to maintain versioned, stable document sources for LLM context retrieval pipelines.
  • Integrated OpenAI and LangChain to build intelligent assistants with dynamic, query-aware document retrieval and context injection capabilities.
  • Automated embedding generation, chunking, and vector indexing operations using PySpark for horizontally scalable data processing and ingestion.
  • Ensured compliance with enterprise security standards by applying encryption, data masking, and access control policies across all RAG pipeline layers.
  • Led large-scale data platform migration of multiple Fortune 500 organizations from Snowflake to Databricks Lakehouse architecture, improving performance and lowering platform costs by 30–50%.
  • Data Platform Re-architecture: Migrated large-scale data workloads from Snowflake to Databricks using Delta Lake and Apache Spark, achieving 40% faster pipeline execution.
  • MLOps Automation: Built CI/CD-integrated ML pipelines using MLflow, Databricks Workflows, and GitHub Actions to support automated model retraining and deployment.
  • Mentorship & Delivery Excellence: Mentored junior engineers and contributed to company-wide best practices in Databricks, Spark, and MLOps; played a key role in the organization earning 4× 'Elite Databricks Partner' status.
  • Re-architected ETL pipelines using Apache Spark and Delta Lake to achieve high-throughput, ACID-compliant data processing.
  • Created scalable, tunable data workflows using Databricks Workflows, Airflow, and DBT, reducing pipeline execution time by an average of 40%.
  • Migrated Snowflake schemas, stored procedures, and user access controls to Unity Catalog to maintain data governance and lineage.
  • Optimized and benchmarked query performance post-migration, achieving significant acceleration in data reading and analytics workloads.
  • Collaborated with cross-functional teams (data engineering, analytics, security) to enable smooth cutover and post-migration validation.
  • Guided junior engineers and cross-functional teams on Databricks best practices, Spark optimization, and MLOps workflows, accelerating onboarding and project ramp-up by ~50%.

Data Analyst

Tech Mahindra
01.2017 - 07.2017
  • Conducted code reviews and hands-on training sessions to improve team capabilities in data engineering, machine learning, and cloud architecture.
  • Acted as the technical liaison between the client and internal teams to ensure alignment with Databricks solution architectures and implementation standards.
  • Directly contributed to the company's 4× 'Elite Databricks Partner' status by promoting delivery excellence across multiple high-impact projects.
  • Created reusable templates, CI/CD workflows, and knowledge artifacts to institutionalize quality and improve time-to-delivery for new projects.

Education

Master of Science - Cybersecurity

University of North Texas
Denton, TX

Skills

  • OpenAI (GPT-4)
  • LangChain
  • Retrieval-Augmented Generation (RAG)
  • Prompt Engineering
  • Embedding Models
  • Supervised Learning
  • Unsupervised Learning
  • Feature Engineering
  • Model Evaluation
  • Hyperparameter Tuning
  • Time Series Forecasting
  • Scikit-learn
  • XGBoost
  • LightGBM
  • MLflow
  • Databricks Workflows
  • CI/CD (GitHub Actions)
  • Model Registry
  • Docker
  • Streamlit
  • RESTful APIs
  • Apache Spark
  • Delta Lake
  • PySpark
  • Unity Catalog
  • ETL/ELT Development
  • DBT
  • Data Quality & Validation
  • Pinecone
  • FAISS
  • Chroma
  • ANN Indexing
  • Semantic Search
  • Pandas
  • NumPy
  • SciPy
  • Matplotlib
  • Seaborn
  • Plotly
  • Experiment Design
  • A/B Testing
  • Databricks (AWS/Azure)
  • Snowflake
  • Azure Data Lake
  • Amazon S3
  • SQL
  • Python
  • Git
  • Jupyter Notebooks
  • VS Code
  • Postman
  • Bash
  • Cross-functional Team Leadership
  • Client Engagement
  • Technical Mentorship
  • Agile/Scrum
  • Documentation & Reporting
