
Vineesh Kumar D

Newark, DE

Summary

Results-driven AI/ML and Data Engineer with over 7 years of experience in architecting and scaling Generative AI applications and cloud data platforms. Expertise encompasses all phases of the machine learning lifecycle, including data preprocessing, feature engineering, and MLOps, utilizing advanced tools such as TensorFlow and PyTorch. Demonstrated success in delivering innovative GenAI solutions with LangChain and OpenAI APIs while ensuring robust data governance and compliance. Committed to fostering collaboration across teams to drive impactful results and enhance organizational capabilities.

Overview

7 years of professional experience

Work History

AI/ML/Data Engineer

M&T Bank
Buffalo, NY
01.2025 - Current
  • Architected end-to-end RAG pipelines using OpenAI embeddings, FAISS vector indexes, and LlamaIndex for semantic document retrieval with embedding rerankers, enabling enterprise knowledge management with sub-second query latency.
  • Engineered production-grade GenAI solutions using LangChain orchestration, OpenAI GPT APIs, Hugging Face Transformers, and Anthropic Claude for multi-step reasoning workflows and dynamic workflow automation in internal decision support systems.
  • Designed and implemented advanced prompt engineering workflows for GenAI models, leveraging OpenAI Function Calling and Tool Use APIs to enable intelligent multi-step reasoning and business logic automation.
  • Implemented LLMOps safety frameworks (Guardrails.ai, TruLens) to prevent hallucinations, enforce structured outputs, and ensure responsible AI governance in customer-facing applications, achieving 99.2% response quality compliance.
  • Optimized PySpark transformations in Azure Databricks with Delta Lake ACID transactions and comprehensive CDC implementation, reducing data change capture latency by 40% while maintaining 99.8% consistency guarantees.
  • Engineered dimensional analytical datasets in Azure Synapse Analytics using star schema optimization, supporting complex BI queries and predictive churn modeling with sub-second response times.
  • Configured OpenShift GPU Operator to orchestrate distributed deep learning workloads across TensorFlow and PyTorch, scaling model training across 8+ GPU-backed nodes to achieve high resource utilization and fault tolerance.
  • Designed multi-source data ingestion pipelines using Azure Data Factory and Databricks, orchestrating telemetry collection, log processing, and real-time monitoring data consolidation from enterprise systems.
  • Tuned Snowflake data warehouse performance through partition pruning, dynamic clustering key strategies, and query profiling optimization, accelerating analytical dashboard load times by 55%.
  • Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, revealing customer usage insights and operational patterns.
  • Ingested data into Azure platform services (Data Lake, Storage, SQL Database, Data Warehouse) with comprehensive validation, then processed and analyzed it at scale in Databricks.
  • Developed Apache Airflow DAGs for orchestrating complex scheduling and monitoring of interdependent ETL processes, implementing dynamic workflows, retry logic, and alerting to achieve 99.8% SLA compliance.

Azure Data Engineer

Wintrust Financial
Rosemont, IL
12.2022 - 12.2024
  • Led enterprise ETL transformation processing 20M+ pharmacy transactions daily via Azure Data Factory, Databricks, and Delta Lake, optimizing SLA adherence by 35% and eliminating operational bottlenecks in critical healthcare workflows.
  • Migrated legacy SQL Server reporting infrastructure to Snowflake and Azure Synapse Analytics, achieving 65% query runtime reduction and generating $250K+ in annual infrastructure cost savings through dimensional modeling and partition optimization.
  • Developed Apache Spark jobs in Scala with custom input format handlers for non-standard file formats, leveraging Spark SQL and Spark Streaming for rapid data processing and real-time transformations.
  • Engineered dimensional data warehouses using Kimball methodology with star and snowflake schemas in Azure Synapse, supporting complex BI queries, churn prediction analytics, and executive dashboards with optimized query performance.
  • Implemented slowly-changing dimension (SCD) transformations and data mart architectures to maintain historical data integrity in the data warehouse, supporting comprehensive business intelligence analysis.
  • Configured Azure cloud services for production endpoint deployment, implementing lift-and-shift migrations and cloud-native strategies across Azure SQL Database, Data Lake, Data Factory, Synapse, Service Bus, Key Vault, Analysis Services, and Blob Storage.
  • Developed Hive queries and custom UDFs for comprehensive data analysis and transformation within HDFS, creating data marts that supported dimensional modeling and historical data management.
  • Engineered multiple MapReduce jobs in Java for comprehensive data cleaning and preprocessing, establishing distributed data processing pipelines for large-scale analytics workloads.
  • Managed source code in Git and tracked issues and project work in Jira, maintaining code quality through rigorous peer reviews and continuous integration practices.
  • Constructed REST APIs in Scala implementing robust data transformation logic, enabling downstream system integration with fault-tolerant request handling, comprehensive error management, and performance monitoring.

Data Scientist

Techasoft Pvt. Ltd.
India
05.2019 - 07.2022
  • Developed Apache Spark jobs delivering 3x performance improvements over standard MapReduce, leveraging Scala and Spark SQL for rapid data processing and optimized query execution.
  • Designed healthcare data warehousing solutions with comprehensive data cleansing, surrogate key generation, SCD Type 2 implementations, and CDC mechanisms within Snowflake supporting clinical and financial analytics.
  • Orchestrated end-to-end ETL pipelines with Apache Airflow, automating 50+ production DAGs with dynamic scheduling, comprehensive monitoring, automatic retry logic, and alerting for reliable distributed workload execution.
  • Developed dynamic tables in Snowflake replacing traditional stored procedures, simplifying transformation logic, improving data load performance, and enhancing pipeline efficiency.
  • Developed and configured Apache Kafka infrastructure for real-time server log aggregation, managing broker configurations, consumer groups, and partition strategies to enable processing of 100K+ events per second.
  • Created AWS S3 data lakes with enterprise-grade encryption and fine-grained access control policies, supporting petabyte-scale analytics while ensuring secure cross-organizational data sharing and compliance.
  • Implemented Kafka pub-sub messaging systems for application and system log ingestion, creating topics, managing producers and consumers, enabling downstream analytics systems to process real-time event streams.
  • Configured Apache Airflow for S3 bucket and Snowflake data warehouse orchestration, created and executed production DAGs with monitoring, alerting, and comprehensive error handling.
  • Used AWS services for data architecture, analytics, and enterprise data warehousing, keeping data platforms scalable, flexible, available, and performant.
  • Practiced Agile delivery under the Scrum model, configuring sprints and leading planning sessions and retrospectives to deliver projects on schedule through cross-functional team collaboration.

Education

Master of Science - Information Systems Technology

Wilmington University
New Castle, DE
01-2024

Bachelor of Science - Electronics and Computer Science

Anurag Group of Institutions
Hyderabad, India
01-2020

Skills

  • AI & Generative AI
  • Generative AI
  • LLM Frameworks & Libraries
  • LangChain, LangGraph, LlamaIndex, CrewAI, Hugging Face Transformers, Autogen, SpaCy, NLTK, Rasa, Dialogflow, Streamlit, OpenAI APIs, Azure OpenAI API, Anthropic Claude, Google Gemini, Mistral, LLaMA 3, BERT, GPT-4, GPT-3.5
  • LLMOps & Safety
  • Hallucination detection
  • Machine Learning & Deep Learning
  • TensorFlow, Keras, PyTorch, Scikit-learn, XGBoost, Neural Networks, Transformers, CNNs, RNNs, LSTMs, Deep Learning, NLP, Computer Vision, Feature Engineering, Model Evaluation
  • Data Integration
  • Data Pipeline Development
  • Data Platforms & Cloud Services
  • Snowflake, Azure Databricks, Delta Lake, Apache Hive, Presto, AWS Glue
  • Snowflake, PostgreSQL, MySQL, Oracle, SQL Server, MongoDB, DynamoDB, Cassandra, HBase, Redshift, BigQuery, Star Schema, Snowflake Schema, Dimensional Modeling, Kimball Modeling, SCD Type 2, Change Data Capture (CDC), Data Marts
  • Cloud services
  • Microsoft Azure
  • DevOps & Containerization
  • Docker, Kubernetes, OpenShift GPU Operator, MLflow, DVC, GitHub Actions, Jenkins, CI/CD Pipelines, Infrastructure as Code, Terraform, CloudFormation, Git, Jira, Monitoring & Observability
  • Stream Processing
  • Apache Kafka, Kafka Brokers, Consumer Groups, Partition Management, Real-time Analytics, Event Processing, Pub-Sub Messaging, Asynchronous Workflows
  • APIs & Frameworks
  • FastAPI, Flask, Django, REST APIs, GraphQL, OpenAI Function Calling, Postman, Node.js, Scala
  • Programming Languages
  • Python, Scala, SQL, R, Bash, JavaScript, TypeScript, Java
  • Data Visualization
  • Data Analysis
  • Pandas, NumPy, Statistical Analysis, Exploratory Data Analysis, Predictive Modeling, Churn Prediction, Demand Forecasting, Data Migration, Data Profiling, Data Cleansing
  • HiveQL, Pig, U-SQL, Azure Data Lake Analytics, Alteryx, Data Management
  • Agile development
