AI engineer with expertise in machine learning (ML), large language models (LLMs), and high-performance AI systems. Skilled in designing, fine-tuning, and optimizing neural networks for real-world applications, including multimodal AI, NLP, computer vision, and predictive analytics. Experienced in scaling AI models, optimizing LLM inference, and building high-throughput AI pipelines. Proficient with Docker, Kubernetes, and distributed computing for efficient deployment of AI models. Passionate about transforming complex AI research into production-ready solutions. Also an innovative trainee software engineer known for high productivity and efficient task completion, with specialized skills in Java programming, software debugging, and agile development methodologies; excels at teamwork, problem-solving, and adaptability, enabling seamless collaboration and innovative solutions to complex challenges.
- Developed a custom multimodal Visual Question Answering (VQA) model by fusing BLIP and CLIP representations, achieving strong ROUGE, cosine-similarity, and BLEU scores on an educational dataset of 8,000+ annotated question-answer pairs (illustrative fusion sketch after this list).
- Developed a real-time American Sign Language (ASL) translation system using Transformer-based models, MediaPipe for hand-landmark extraction, and GPT-2 for sentence correction, enabling accurate, grammatically coherent gesture-to-text translation (landmark-encoder sketch after this list).
- Designed and implemented a U-Net-based model for lung segmentation in medical imaging, achieving a Dice coefficient of 0.9394 and an IoU of 0.8904, with accuracy further improved by a novel post-processing layer (metric sketch after this list).
- Designed and implemented a deep learning pipeline with GRU networks, achieving robust performance in smart-home and healthcare applications. Optimized real-time inference for Wi-Fi-based motion detection, demonstrating scalability and real-world feasibility (GRU classifier sketch after this list).
- Conducted a comparative study of modern image captioning architectures, including CNN-LSTM, BERT-Transformer, and TransCapNet. Evaluated models using BLEU and ROUGE-L F-scores, highlighting performance trade-offs in vision-language tasks (evaluation sketch after this list).
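The VQA entry above mentions merging BLIP and CLIP. Below is a minimal late-fusion sketch of one way such a head could look, assuming precomputed BLIP and CLIP embeddings and an answer-classification framing; the dimensions, MLP head, and class count are illustrative placeholders, not the project's actual architecture.

```python
# Late-fusion sketch for the BLIP + CLIP VQA project above.
# Embedding sizes and the classifier head are assumptions for illustration.
import torch
import torch.nn as nn

class LateFusionVQAHead(nn.Module):
    """Fuses precomputed BLIP and CLIP embeddings and predicts an answer class."""

    def __init__(self, blip_dim=768, clip_dim=512, hidden=512, num_answers=1000):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(blip_dim + clip_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, num_answers),
        )

    def forward(self, blip_emb, clip_emb):
        # Concatenate the two encoder outputs and classify the answer.
        return self.fuse(torch.cat([blip_emb, clip_emb], dim=-1))

# Usage with dummy tensors standing in for real encoder outputs.
head = LateFusionVQAHead()
logits = head(torch.randn(4, 768), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 1000])
```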
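For the ASL project, a gesture classifier over MediaPipe hand-landmark sequences could be sketched as below, assuming 21 landmarks with x/y/z coordinates per frame; sequence length, gloss vocabulary, and all hyperparameters are placeholders, and the GPT-2 correction stage is omitted.

```python
# Sketch of a Transformer encoder over hand-landmark sequences (ASL project above).
import torch
import torch.nn as nn

class LandmarkTransformer(nn.Module):
    def __init__(self, n_landmarks=21, coords=3, d_model=128, n_classes=50):
        super().__init__()
        self.proj = nn.Linear(n_landmarks * coords, d_model)  # one frame -> one token
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, frames):                # frames: (batch, time, 21 * 3)
        encoded = self.encoder(self.proj(frames))
        return self.head(encoded.mean(dim=1))  # pool over time, predict a gloss

model = LandmarkTransformer()
print(model(torch.randn(2, 30, 63)).shape)    # torch.Size([2, 50])
```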
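The Dice coefficient and IoU reported for the lung-segmentation project are standard overlap metrics on binary masks; a minimal version is shown below, with the smoothing constant as an assumption.

```python
# Dice and IoU over binary segmentation masks (U-Net lung-segmentation project above).
import torch

def dice_coefficient(pred, target, eps=1e-7):
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def iou(pred, target, eps=1e-7):
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    return (inter + eps) / (union + eps)

pred = torch.randint(0, 2, (1, 256, 256)).float()
target = torch.randint(0, 2, (1, 256, 256)).float()
print(dice_coefficient(pred, target).item(), iou(pred, target).item())
```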
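A GRU sequence classifier in the spirit of the Wi-Fi motion-detection pipeline might look as follows; the per-frame feature size and the binary motion/no-motion framing are illustrative assumptions.

```python
# GRU-based sequence classifier sketch (Wi-Fi motion-detection project above).
import torch
import torch.nn as nn

class MotionGRU(nn.Module):
    def __init__(self, n_features=64, hidden=128, n_classes=2):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):            # x: (batch, time, n_features)
        _, h = self.gru(x)           # h: (num_layers, batch, hidden)
        return self.head(h[-1])      # classify from the final hidden state

model = MotionGRU()
print(model(torch.randn(8, 100, 64)).shape)  # torch.Size([8, 2])
```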
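The captioning study's BLEU and ROUGE-L F-scores can be computed with NLTK and the rouge-score package as sketched below; the example caption pair is made up, and real references would come from the captioning dataset.

```python
# Sentence-level BLEU via NLTK and ROUGE-L F-score via rouge-score
# (image-captioning comparison above).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "a dog runs across the green field"
candidate = "a dog is running across a field"

bleu = sentence_bleu(
    [reference.split()], candidate.split(),
    smoothing_function=SmoothingFunction().method1,
)
rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(reference, candidate)["rougeL"].fmeasure
print(f"BLEU: {bleu:.3f}  ROUGE-L F: {rouge_l:.3f}")
```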