Sai Tejasri Yerramsetti

Summary

Data Scientist at SDSU Research Foundation with a track record in genomic ETL pipeline development and interactive dashboard creation. Skilled in Python and Tableau, with a focus on data-driven decision-making and cross-team collaboration. Reduced genomic pipeline processing time by 40%.

Overview

4 years of professional experience
1 Certification

Work History

RESEARCH DATA SCIENTIST

SDSU Research Foundation
07.2024 - Current
  • Built genomic ETL pipelines using PLINK2 and Google Cloud Platform (BigQuery, Storage), reducing processing time by 40% and accelerating cohort-wide analyses.
  • Developed interactive Tableau dashboards for SNP-association visualization; boosted clinical interpretability by 30% across interdisciplinary teams.
  • Partnered with statisticians and ML engineers to streamline phenotype-genotype matching workflows, enabling hypothesis testing at scale.

DATA SCIENTIST INTERN

ZIP Launchpad
01.2024 - 07.2024
  • Designed time series forecasting models (ARIMA, Prophet) in Python to predict event turnout; informed resource allocation with 87% accuracy.
  • Delivered Tableau/Power BI dashboards with role-based filters; improved strategic reporting velocity by 30% across product and marketing teams.
  • Conducted A/B testing and survey analytics, increasing classification accuracy by 20%.

DATA SCIENTIST INTERN

Digital Innovation Lab
06.2023 - 08.2023
  • Created a GenAI-enabled chatbot using TogetherAI API and LangChain; scaled to 10K+ sessions with a 40% increase in user retention.
  • Integrated vector search and semantic search using ChromaDB and LangChain to improve retrieval precision in RAG workflows.
  • Fine-tuned Mistral 7B using PyTorch and RLHF techniques for sentiment inference; decreased latency by 20% in high-traffic pipelines.
  • Built a PySpark-based feedback analytics platform on AWS EC2 GPU; enabled real-time insight ingestion and reduced response lag by 25%.

AI SOFTWARE ENGINEER

Temenos
08.2021 - 08.2023
  • Built anomaly detection systems using XGBoost and LightGBM to monitor transaction fraud; improved accuracy by 22% and reduced manual review by 35%.
  • Created time series forecasting pipelines (LSTM, ARIMA, Prophet) in PySpark and SQL; reached 90% forecasting accuracy across 5 regions.
  • Automated the MLOps lifecycle using MLflow, Docker, and Jenkins; cut model deployment time by 38%.
  • Streamed 100M+ sensor events daily via Spark + Kafka + Azure Synapse; decreased ETL latency by 6 hours.

Education

Master of Science - Big Data Analytics

San Diego State University
San Diego, USA
05.2025

Bachelor of Technology - Electronics and Communication Engineering

Sagi Ramakrishnam Raju Engineering College
India
07.2021

Skills

Programming Languages: Python, SQL, R, Java, C, Rust

Databases and Frameworks: MySQL, PostgreSQL, MongoDB, Flask, Snowflake, NoSQL, ChromaDB, Pinecone, SQLite, Kubernetes, FastAPI, LangGraph, LlamaIndex, CrewAI, SAS, Kafka, Airflow, MLflow

ML/DL: KNN, RNN, CNN, Transformers, BERT, GPT-4, LLaMA

Libraries: TensorFlow, PyTorch, Keras, Matplotlib, Seaborn, PySpark, spaCy, NumPy, Pandas, SciPy, OpenAI, NLTK, MLlib, OpenCV, BeautifulSoup, scikit-learn, Streamlit, Git, Bitbucket, Transformers, NLP

Technologies and Tools: Selenium, AWS (EC2, S3, RDS, Lambda, SageMaker), Azure (Data Factory, Data Lake, Databricks, Blob Storage), GCP (GCS, Dataflow, BigQuery, Vertex AI), GitHub, Apache Spark, Excel, Agile, Tableau, Sigma, Looker, Power BI, Microsoft Fabric

Certification

  • Oracle Cloud Infrastructure 2024 GenAI Professional
  • Databricks Academy Accreditation - GenAI Fundamentals
  • Google Data Analytics Professional Certificate (2025)

Projects And Publications

  • IEEE Paper - Human Emotion Classification using RNNs and KNN, IEEE Xplore, 04/01/22
  • Credit Card Fraud Detection, Journal of Emerging Technologies, 2021
  • Multi PDFs Chatbot AI Agent: Developed a Streamlit-based AI chatbot app integrating LangChain, FAISS, and Google Gemini Pro for real-time semantic QA across multiple PDFs using RAG and adaptive chunking.
  • AI-Powered Portfolio Generator: Developed an end-to-end application using Streamlit, the Anthropic Claude API, and PDF/HTML processing to extract resume content and dynamically generate a professional portfolio website for users. [link]
  • Feedback Processing API: Built a Flask-based API for audio feedback processing, including transcription, summarization, sentiment analysis, categorization, and keyword extraction. Integrated Google Cloud Speech-to-Text and multiple NLP models for end-to-end text analysis. [link]
