Sindhuja Kancharla

Summary

Passionate AI/ML Engineer with 10 years of IT experience, including 7+ years in AI/ML data engineering and ETL development specializing in data integration, and 2 years in Python development. Skilled in designing and deploying scalable data pipelines and infrastructure, with a solid foundation in data modeling, ETL processes, and big data technologies.

  • Proficient in building scalable AI pipelines leveraging Retrieval-Augmented Generation (RAG), Transformer-based LLMs (e.g., GPT-4, LLaMA, Claude), and advanced NLP techniques for real-time document intelligence, anomaly detection, and event extraction.
  • Familiar with implementing LangChain and LangGraph for agentic workflows and integrating multi-hop, hybrid, and semantic search in enterprise GenAI solutions.
  • Integrated Databricks with cloud services such as AWS and Azure for seamless data processing and storage.
  • Experienced in data architecture: designing data ingestion, data pipelines, and domain models.
  • Experienced in Spark (Spark Streaming, Spark SQL, Spark DataFrames, performance tuning); hands-on with PySpark, Python, and SQL.
  • Developed and maintained data pipelines using PySpark and Databricks, ensuring efficient processing of large datasets.
  • Designed ETL workflows on Databricks to ingest, clean, and transform data for analytics and reporting.
  • Optimized PySpark code to improve performance and scalability in distributed computing environments, leading to significant improvements in execution speed and resource utilization.
  • Collaborated with data scientists to deploy real-time ML models on Databricks, using PySpark for streaming data processing and prediction.
  • Good understanding of CI/CD tools (Jenkins, Git, Bitbucket) for deploying code to various environments; hands-on with version control and collaborative development practices using GitHub/Bitbucket.
  • Experienced in Agile Scrum methodology: sprint planning, creating backlogs/stories, sprint reviews, and delivery.
  • Good knowledge of machine learning, deep learning, LLMs, RAG, and natural language processing (NLP) techniques for text analysis, sentiment analysis, and chatbot development.
  • Analyze data to identify patterns, trends, and insights that drive decision-making and improve AI models.
  • Collaborated closely with data engineering, analytics, and operations teams to deliver scalable, efficient, and high-performance data pipelines.

Overview

9 years of professional experience
1 Certification

Work History

Sr. Data | AI/ML Engineer

National-Life Group
California
06.2023 - Current
  • Built modular pipelines for deploying Generative AI solutions using LangChain and Azure OpenAI.
  • Architected a RAG pipeline for internal knowledge retrieval across policyholder documents using hybrid search, semantic chunking, and fine-tuned embeddings.
  • Implemented cloud-native architectures leveraging Azure Functions, Azure ML Studio, and event hubs, ensuring highly available and fault-tolerant data pipelines.
  • Integrated Human-in-the-Loop (HITL) review mechanism using Azure Functions and Python APIs to validate LLM outputs in production workflows.
  • Fine-tuned LLMs using LoRA and PEFT for insurance policy summarization and anomaly detection in real-time event logs, improving document QA accuracy by 20%.
  • Developed an evaluation framework for RAG pipelines using BERTScore and BLEU for claim summarization tasks.
  • Leveraged Azure ML Studio and Databricks for orchestrating the fine-tuning and deployment of Transformer-based models.
  • Architected and designed end-to-end data solutions using Azure Databricks, Azure Data Factory, Azure SQL DB, and Azure Cosmos DB.
  • Designed, developed, and maintained metadata-driven data ingestion frameworks to improve automation, scalability, and data pipeline management.
  • Worked on building data pipelines, implementing production-ready, data-driven solutions, and supporting their deployment.
  • Developed data ingestion pipelines using Spark Streaming, Spark SQL, and Spark DataFrames with PySpark in Azure Databricks, Delta Live Tables (DLT), and Python, handling extraction, transformation, and aggregation from multiple source systems for both real-time ingestion and batch processing.
  • Architected and deployed scalable AI/ML pipelines on Azure Databricks, Azure Data Factory, Azure SQL DB, and Azure Cosmos DB, enabling seamless integration of structured and unstructured data sources.
  • Analyzed files from different sources in ADLS and loaded data into source-mirror and staging tables through Spark Streaming and batch processing.
  • Designed and implemented data pipelines using Azure Data Factory to automate data movement and transformation processes.
  • Integrated Human-in-the-Loop (HITL) validation workflows to enforce responsible AI and minimize security risks in LLM deployments.
  • Integrated privacy-by-design principles into AI/ML workflows, including HITL checkpoints, restricted access controls, and data retention policies.
  • Developed a unit-test framework in Python and Spark using the Pytest package in Azure Databricks, and promoted it across environments through Jenkins CI/CD pipelines.
  • Implemented distributed feature engineering workflows in PySpark on Databricks to handle high-dimensional and large-scale datasets efficiently.
  • Designed and deployed a Retrieval-Augmented Generation (RAG) pipeline using LangChain, Azure OpenAI, and FAISS, enabling accurate question-answering over insurance claim documents and reports (see the RAG sketch after this list).
  • Implemented agentic RAG workflows with LangGraph, orchestrating multi-step retrieval and reasoning logic across structured (SQL) and unstructured (PDF, HTML, JSON) data sources for scalable decision intelligence.
  • Fine-tuned LLMs using PEFT methods (LoRA, QLoRA) for custom insurance-specific tasks, such as claim summarization and fraud pattern detection, and benchmarked output quality using BLEU, ROUGE, and BERTScore (see the LoRA sketch after this list).
  • Ensured compliance with GDPR, HIPAA, and internal audit policies by implementing data anonymization, masking, and tokenization techniques across data pipelines.
  • Collaborated with compliance teams to establish audit trails and lineage tracking in Azure Databricks/ADF for regulatory reporting.
  • Built and deployed predictive models using PySpark and Databricks, integrating TensorFlow and Scikit-Learn for advanced machine learning tasks.
  • Leveraged Databricks for hyperparameter tuning and model optimization at scale, reducing training time and improving model performance.
  • Designed ETL pipelines on Databricks to preprocess data for AI/ML applications, ensuring high-quality input for training and inference.
  • Contributed innovative ideas and solutions to enhance team performance and outcomes.
  • Worked successfully with a diverse group of coworkers to accomplish goals and address issues related to our products and services.
  • Promoted high customer satisfaction by resolving problems with knowledgeable and friendly service.
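A minimal sketch of the FAISS-backed RAG pattern described above, assuming the langchain, langchain-openai, and faiss-cpu packages; the deployment names, file path, and query are hypothetical placeholders, not the production configuration:

    # Index an insurance claim document in FAISS and answer questions with GPT-4.
    # Azure credentials are read from the AZURE_OPENAI_* environment variables.
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.vectorstores import FAISS
    from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain.chains import RetrievalQA

    # Load and chunk a claim document (path is illustrative).
    docs = PyPDFLoader("claims/claim_1234.pdf").load()
    chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

    # Embed the chunks and build the FAISS index.
    embeddings = AzureOpenAIEmbeddings(azure_deployment="text-embedding-ada-002")
    index = FAISS.from_documents(chunks, embeddings)

    # Retrieval-augmented QA: fetch the top-k chunks, then ask GPT-4.
    llm = AzureChatOpenAI(azure_deployment="gpt-4", temperature=0)
    qa = RetrievalQA.from_chain_type(llm=llm, retriever=index.as_retriever(search_kwargs={"k": 4}))
    print(qa.invoke({"query": "Summarize the coverage exclusions in this claim."})["result"])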
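And a minimal LoRA fine-tuning sketch with Hugging Face PEFT, assuming a LLaMA-style base checkpoint; the model name, target modules, and hyperparameters are illustrative, not the production settings:

    # Attach a low-rank (LoRA) adapter so only a small fraction of weights train.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "meta-llama/Llama-2-7b-hf"  # hypothetical base model
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    config = LoraConfig(
        r=8,                                  # adapter rank
        lora_alpha=16,                        # scaling factor
        target_modules=["q_proj", "v_proj"],  # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% of base weights

    # Train on (policy text, summary) pairs with transformers.Trainer, then
    # persist only the adapter: model.save_pretrained("adapters/claim-summarizer")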

Sr. Data | AI/ML Engineer

Value Momentum
Hyderabad
04.2016 - 04.2023
  • Designed and implemented robust data integration pipelines using Azure Databricks and PySpark, enabling ingestion of structured and semi-structured data from multiple on-prem and cloud sources.
  • Developed modular ETL frameworks using Delta Lake and PySpark, incorporating error handling, logging, and retry mechanisms to ensure data pipeline resilience and reusability.
  • Built and maintained real-time streaming pipelines using Spark Structured Streaming and Delta Live Tables (DLT) for processing events from Azure Event Hubs and other sources (see the streaming sketch after this list).
  • Worked with Azure Data Factory to orchestrate complex workflows and dependencies across batch and streaming jobs, automating data movement and transformation.
  • Developed scripts to extract, transform, and load (ETL) insurance data from internal systems to centralized data lakes.
  • Automated monthly policyholder communication (renewals, notices, etc.) using Python scripts integrated with email services.
  • Created dashboards and ad-hoc reports using Python, SQL, and Pandas to support actuarial and compliance teams.
  • Built and maintained RESTful APIs to integrate with third-party insurance platforms (e.g., payment gateways).
  • Designed data models in Azure SQL DB and Hive based on dimensional modeling principles, supporting analytical workloads with star schemas and optimized indexes.
  • Integrated Azure Cosmos DB into the pipeline to support low-latency operational analytics on semi-structured JSON datasets.
  • Engineered solutions to process and cleanse raw data stored in Azure Data Lake Storage (ADLS) and stage it into curated layers using PySpark transformations.
  • Developed scalable backend applications using Python with Flask/Django, supporting policy management and claims workflows.
  • Improved performance of PySpark jobs by implementing partitioning, broadcast joins, and fine-tuning cluster configurations on Azure Databricks (see the join-tuning sketch after this list).
  • Automated deployment of data pipelines using GitHub Actions, integrated with Databricks Repos for CI/CD and version control across environments.
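A sketch of the partitioning and broadcast-join tuning mentioned above; the Delta paths, table contents, and column names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("claims-etl").getOrCreate()

    claims = spark.read.format("delta").load("/mnt/curated/claims")      # large fact table
    policies = spark.read.format("delta").load("/mnt/curated/policies")  # small dimension

    # Broadcasting the small dimension avoids shuffling the large fact table.
    joined = claims.join(F.broadcast(policies), on="policy_id", how="left")

    # Partition the output on the query key so downstream reads prune files.
    (joined.repartition("claim_year")
           .write.format("delta")
           .partitionBy("claim_year")
           .mode("overwrite")
           .save("/mnt/gold/claims_enriched"))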
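And a minimal Structured Streaming sketch reading policy events through Event Hubs' Kafka-compatible endpoint into a Delta table; the namespace, topic, schema, and paths are placeholders (SASL auth options omitted for brevity):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StringType, DoubleType

    spark = SparkSession.builder.appName("policy-events-stream").getOrCreate()

    schema = (StructType()
              .add("policy_id", StringType())
              .add("event_type", StringType())
              .add("amount", DoubleType()))

    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
           .option("subscribe", "policy-events")
           .load())

    # Parse the JSON payload into typed columns.
    events = (raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
                 .select("e.*"))

    # Append to a bronze Delta table with checkpointed, exactly-once writes.
    (events.writeStream.format("delta")
           .option("checkpointLocation", "/mnt/chk/policy_events")
           .outputMode("append")
           .start("/mnt/bronze/policy_events"))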

Education

Bachelor of Technology - Computer Science

Jawaharlal Nehru Technological University
India
01.2016

Skills

  • GPT-4
  • GPT-3.5
  • LLaMA
  • Claude
  • OpenAI/Azure OpenAI
  • HuggingFace Transformers
  • LangChain
  • LangGraph
  • Vector Search
  • FAISS
  • Elastic
  • Pinecone
  • Hybrid Search
  • BM25
  • MLflow
  • SageMaker
  • Azure ML
  • GitHub Actions
  • Docker
  • FastAPI
  • Streamlit
  • LoRA
  • QLoRA
  • PEFT
  • BERTScore
  • BLEU
  • ROUGE
  • Unstructured.io
  • LlamaParse
  • Pdfplumber
  • Tesseract
  • Camelot
  • Prometheus
  • Grafana
  • Evidently AI
  • AWS CloudWatch

Certification

  • Microsoft Certified: Azure AI Fundamentals, 12/01/18
  • AWS Certified Machine Learning Engineer – Associate, 12/01/23
  • Python Developer Certification, 12/01/21
  • Python for Data Science and Machine Learning Bootcamp (Deep Mind Systems)
  • Microsoft Certified: Fabric Data Engineer, 12/01/25

Projects

  • LLM-Powered Document QA: Built a GenAI-based QA system using LangChain, GPT-4, and Azure Blob Storage to extract insights from insurance documents and claim files.
  • Time Series Anomaly Detection: Leveraged Transformer-based models such as PatchTST and TiDE to predict claim anomalies and fraudulent events.
  • LLM Evaluation Toolkit: Created a custom script using BLEU, ROUGE, and BERTScore to benchmark summarization accuracy across insurance-domain LLM outputs (see the sketch below).
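A minimal sketch of the evaluation toolkit's scoring step, assuming the sacrebleu, rouge-score, and bert-score packages; the candidate and reference texts are illustrative:

    from sacrebleu import corpus_bleu
    from rouge_score import rouge_scorer
    from bert_score import score as bert_score

    candidate = "The claim covers water damage to the insured property."
    reference = "Water damage to the policyholder's property is covered by the claim."

    # BLEU over a one-item corpus (sacrebleu expects one reference list per candidate).
    bleu = corpus_bleu([candidate], [[reference]]).score
    # ROUGE-L F-measure between reference and candidate.
    rouge = rouge_scorer.RougeScorer(["rougeL"]).score(reference, candidate)["rougeL"].fmeasure
    # BERTScore F1 (semantic similarity via contextual embeddings).
    _, _, f1 = bert_score([candidate], [reference], lang="en")

    print(f"BLEU={bleu:.1f}  ROUGE-L={rouge:.3f}  BERTScore-F1={f1.mean():.3f}")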

Websites Portfolios Profiles

  • LinkedIn, http://www.linkedin.com/in/sindhuja-kancharla-41633b329/
  • GitHub, https://github.com/SindhujaKancharla74

Technical Skills

GPT-4, GPT-3.5, LLaMA, Claude, OpenAI/Azure OpenAI, HuggingFace Transformers, LangChain, LangGraph, Vector Search (FAISS, Elastic, Pinecone), Hybrid Search, BM25, MLflow, SageMaker, Azure ML, GitHub Actions, Docker, FastAPI, Streamlit

Timeline

Sr. Data | AI/ML Engineer

National-Life Group
06.2023 - Current

Sr. Data | AI/ML Engineer

Value Momentum
04.2016 - 04.2023

Bachelor of Technology - Computer Science

Jawaharlal Nehru Technological University
India