Passionate AI/ML Engineer with 10 years of IT experience, including 7+ years in AI/ML data engineering and ETL development specializing in data integration, and 2 years in Python development.
- Skilled in designing and deploying scalable data pipelines and infrastructure, with a solid foundation in data modeling, ETL processes, and big data technologies.
- Proficient in building scalable AI pipelines leveraging Retrieval-Augmented Generation (RAG), Transformer-based LLMs (e.g., GPT-4, LLaMA, Claude), and advanced NLP techniques for real-time document intelligence, anomaly detection, and event extraction.
- Familiar with LangChain and LangGraph for agentic workflows, and with integrating multi-hop, hybrid, and semantic search into enterprise GenAI solutions.
- Integrated Databricks with cloud services such as AWS and Azure for seamless data processing and storage.
- Experienced in data architecture: designing data ingestion, data pipelines, and domain models.
- Experienced in Spark: Spark Streaming, Spark SQL, Spark DataFrames, and Spark performance tuning.
- Hands-on experience with PySpark, Python, and SQL.
- Developed and maintained data pipelines using PySpark and Databricks, ensuring efficient processing of large datasets.
- Designed ETL workflows on Databricks to ingest, clean, and transform data for analytics and reporting.
- Optimized PySpark code for performance and scalability in distributed computing environments, leading to significant improvements in execution speed and resource utilization.
- Collaborated with data scientists to deploy real-time ML models on Databricks, using PySpark for streaming data processing and prediction.
- Good understanding of CI/CD tools (Jenkins, Git, Bitbucket) for deploying code across environments.
- Experienced in Agile Scrum methodology: sprint planning, creating backlogs/stories, sprint reviews, and delivery.
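The hybrid search mentioned above blends a lexical signal (keyword overlap, as in BM25) with a semantic signal (embedding cosine similarity). As a hypothetical illustration of the idea, not code from any of the projects listed here, a minimal hybrid scorer might look like:

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def keyword_overlap(query, doc):
    """Fraction of query terms found in the document (lexical signal)."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    hits = sum(min(c, d[t]) for t, c in q.items())
    return hits / max(sum(q.values()), 1)

def hybrid_score(query, doc, q_vec, d_vec, alpha=0.5):
    """Blend lexical and semantic scores; alpha weights the lexical side."""
    return alpha * keyword_overlap(query, doc) + (1 - alpha) * cosine(q_vec, d_vec)
```

In production this blending is typically handled by the search backend (e.g., Elastic or Pinecone hybrid queries) with learned embeddings rather than toy vectors.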
- Hands-on experience with version control systems such as Git and collaborative development practices using GitHub/Bitbucket.
- Good knowledge of machine learning, deep learning, LLMs, RAG, and natural language processing (NLP) techniques for text analysis, sentiment analysis, and chatbot development.
- Analyze data to identify patterns, trends, and insights that drive decision-making and improve AI models.
- Collaborated closely with data engineering, analytics, and operations teams to deliver scalable, efficient, high-performance data pipelines.
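Sentiment analysis, mentioned above, assigns a polarity score to text. As a toy, purely hypothetical illustration (real systems use trained models such as fine-tuned Transformers rather than word lists), a lexicon-based scorer can be sketched as:

```python
# Tiny word lists standing in for a real sentiment lexicon (hypothetical).
POSITIVE = {"good", "great", "excellent", "approved", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "denied", "unhappy"}

def sentiment(text):
    """Return a score in [-1, 1]: positive minus negative term rate."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)
```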
- LLM-Powered Document QA: Built a GenAI-based question-answering system using LangChain, GPT-4, and Azure Blob Storage to extract insights from insurance documents and claim files.
- Time Series Anomaly Detection: Leveraged transformer-based models such as PatchTST and TiDE to predict claim anomalies and fraudulent events.
- LLM Evaluation Toolkit: Created a custom script using BLEU, ROUGE, and BERTScore to benchmark summarization accuracy across insurance-domain LLM outputs.
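The evaluation toolkit above scores candidate summaries against references with n-gram overlap metrics. As a simplified illustration of the BLEU family (not the actual toolkit, which used standard BLEU/ROUGE/BERTScore implementations), a clipped n-gram precision score can be sketched in plain Python:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu_precision(candidate, reference, max_n=2):
    """Geometric mean of clipped 1..max_n n-gram precisions
    (a simplified BLEU without the brevity penalty)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if overlap == 0:
            return 0.0
        log_sum += math.log(overlap / total)
    return math.exp(log_sum / max_n)
```

Library implementations (e.g., NLTK's `sentence_bleu`) add smoothing and a brevity penalty on top of this core idea.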
GPT-4, GPT-3.5, LLaMA, Claude, OpenAI/Azure OpenAI, Hugging Face Transformers, LangChain, LangGraph, Vector Search (FAISS, Elastic, Pinecone), Hybrid Search, BM25, MLflow, SageMaker, Azure ML, GitHub Actions, Docker, FastAPI, Streamlit