Machine Learning Engineer and Data Scientist with 3+ years of experience building production-quality machine learning and natural language processing solutions across finance, insurance, and healthcare. Proven expertise in developing scalable ML models, knowledge-graph-driven systems, and microservices-based architectures integrated into real-world business flows. Skilled in statistical analysis, probability theory, hypothesis testing, and A/B testing to validate model performance and strengthen data-driven decision making. Hands-on experience deploying secure, high-performance ML solutions using AWS, SageMaker, Docker, Kubernetes, Jenkins, CI/CD, and containerized microservices. Strong background in fraud detection, semantic understanding, structured and unstructured data modeling, and ML research. Adept at collaborating with product managers, partner teams, and vendors to translate business-critical problems into scalable ML frameworks and reusable components.
Overview
6
years of professional experience
Work History
Project Title: Chatbot Implementation with GPT-4 Generative AI Model and Database Integration
Gen AI / Berkshire Hathaway - TX
12.2024 - Current
Designed and developed an Agentic AI chatbot, the Insurance Assistance Virtual Advisor, tailored for policyholders and agents seeking insurance-related support. Leveraging the OpenAI GPT-4 model from Azure OpenAI Service, integrated with semantic understanding and a knowledge graph, the chatbot extracts user intent and entities from natural language queries, facilitating seamless interaction between insurance representatives and customers.
Key Contributions and Achievements:
Generative AI Implementation: Spearheaded the deployment of the generative AI features of OpenAI GPT-4, which included extracting policy information, claim-related entities, and client needs from user inquiries. Made policy search, premium computations, and claims status retrieval more efficient through the automated, dynamic creation of SQL queries and JSON request bodies. Architected a multi-agent workflow with intent detection, entity extraction, SQL generation grounding, and business-rule validation, reducing annual effort for support representatives.
Conversational Flow Design: Configured GPT-4 to generate three follow-up questions per response that prompt further discussion and clarify coverage, premium, or claim information, providing precise, customer-focused help. Implemented LLM safety, monitoring, and session management to ensure reliable deployment at scale.
Intelligent Response Generation: Used prompt engineering and function calling to design chatbot responses with GPT-4. This allowed the system to offer succinct definitions of policy terminology, customized claim updates, and instructions for next steps. Built production-quality code, scalable APIs, and a microservices architecture using FastAPI with asynchronous processing.
Database Integration: Integrated PostgreSQL and Cornerstone structured databases to provide real-time information on insurance policies, premium breakdowns, claims, and customer profiles, allowing for quicker and more transparent service. Designed automated pipelines for SQL grounding, JSON schema validation, entity extraction, and follow-up question generation.
Session Management: Implemented a robust session management system, using Couchbase DB for conversation history storage and Redis DB for GPT-4 sessions, to provide context retention and a smooth handoff between clients and human agents when necessary.
Cloud Deployment: Implemented secure and scalable deployment using AWS cloud services, Docker containerization, and session orchestration with Redis and Couchbase.
Technology Stack Optimization: Implemented the chatbot service using FastAPI and asynchronous programming techniques for enhanced scalability and responsiveness, contributing to an efficient, high-performance solution.
This project showcased my leadership in implementing advanced AI/ML technologies, optimizing database integration, and enhancing the overall user experience through intelligent response generation, entity extraction, summarization, and follow-up question generation. The successful deployment of this solution demonstrates my expertise in building AI-driven systems that improve customer satisfaction and service delivery.
Data Scientist
Cybage Software
01.2022 - 07.2023
Applied state-of-the-art predictive machine learning techniques to detect and classify fraud for a fintech client. Developed predictive models and enhanced operational efficiency through large-scale data pipelines and cloud-native deployment, delivering high-performing, production-ready systems that reduced risk, improved accuracy, and streamlined decision-making across financial institutions.
Key Contributions and Achievements:
Client Engagement: Collaborated closely with clients to gain a comprehensive understanding of their diverse databases and the information stored within them, fostering effective project communication and alignment with customer needs.
Data Transformation and Analysis: Led the collection, transformation, and cleansing of structured data from multiple sources using Python. Provided valuable insights and qualitative data analysis, leveraging visualization libraries such as Matplotlib and Seaborn to enhance data understanding. Built executive dashboards in Power BI to visualize fraud risk trends, model drift, and operational KPIs, enabling business teams to reduce review time by 35%.
Data Preprocessing: Prepared the text data for machine learning model training with rigorous preprocessing techniques, including outlier detection, data imbalance handling, feature selection, and feature engineering, to ensure high data quality for model development. Applied hypothesis testing, A/B testing, and model validation for robust predictive performance.
NLP Pipeline Development: Designed an NLP pipeline using spaCy and Transformers to extract key entities from legal contracts, reducing manual review effort by 40 hours per month while achieving 92% extraction accuracy.
Machine Learning Models: Developed two powerful machine learning models, XGBoost and Multinomial Naive Bayes (MNB), using text analytics to address a multi-class classification problem. Combined machine learning predictions with regular expressions to enhance overall solution accuracy.
Big Data Processing: Migrated batch ETL processes to PySpark on Hadoop to process 10TB+ of financial data weekly, reducing computation time by 60% and enabling near real-time credit risk evaluation.
Cloud Deployment: Deployed machine learning models using Docker and CI/CD pipelines on Azure Kubernetes Service (AKS), reducing model downtime by 85% and enabling bi-weekly updates for real-time decisioning.
Data Engineering: Conducted extensive data cleaning and feature engineering across 50+ client datasets, improving model performance metrics by 25% and enhancing pipeline reliability by implementing error handling and logging in dbt and Airflow.
Usage Statistic Reporting: Generated and reported monthly usage statistics from Splunk logs, providing valuable insights into the system's performance and usage patterns.
Cross-Functional Collaboration: Collaborated with product managers, data engineers, and compliance teams to ensure GDPR-safe workflows, aligning AI outputs with business rules.
This project exemplified my expertise in ML/AI technologies, data preprocessing, model development, and deployment on Kubernetes. Additionally, it showcased my ability to work closely with clients to address data privacy concerns and ensure GDPR compliance through proactive PII detection and data purging.
Data Scientist
Mphasis Inc
09.2019 - 12.2021
Predicted hospital patient readmissions and provided an intervention mechanism using an analytical platform for real-time data. This framework helps reduce the readmission penalties that the government imposes on the healthcare provider.
Key Contributions and Achievements:
Data Analysis, Exploration, and Preparation: Conducted extensive Exploratory Data Analysis (EDA), performing rigorous outlier treatment and imputation of missing values to ensure data quality and distribution integrity for predictive modeling.
Predictive Model Development & Optimization: Implemented and fine-tuned various machine learning models, including Logistic Regression and ensemble methods (such as Random Forest and Gradient Boosting), to accurately predict patient readmission risk.
Model Validation and Selection: Validated the performance of different models using statistical metrics such as ROC-AUC, Precision/Recall, and F1-score, ultimately selecting Random Forest as the final production model due to superior predictive power and stability.
Actionable Insights and Explainability: Enhanced model transparency by creating visual explanations using Decision Tree plots and feature importance analysis to clearly communicate key readmission drivers and risk factors to clinical leadership teams. Translated analytical findings into business recommendations, influencing policy rule changes that reduced operational cost by 18%.
Intervention Strategy Design: Leveraged model insights to design a statistically grounded A/B testing framework for hospital interventions, measuring the impact of new post-discharge protocols on 30-day readmission rates.
Anomaly Detection and Data Integrity: Integrated an unsupervised anomaly detection system (using Isolation Forest) into the data pipeline to flag unusual patient records or data entry errors that could bias the model, ensuring the robustness of the training data.
Continuous Monitoring and MLOps: Established a framework for real-time model monitoring to detect and alert on model drift and data shift, ensuring sustained predictive accuracy in a dynamic healthcare environment.
Business Impact and Alignment: Translated model predictions and government penalty policies into actionable interventions, directly supporting the hospital's goal of reducing government penalty costs associated with high readmission rates. Actively participated in sprint planning and backlog prioritization to ensure ML initiatives aligned with business goals.
VP of Sales and Business Development at Berkshire Hathaway HomeServices PenFed Realty TX
Material Specialist/Operations Support at CPI communications & power Industries