Hi, I’m

Arul JC

New York, United States

Summary

GenAI Engineer and Data Scientist with a proven track record in designing, developing, and deploying end-to-end AI/ML solutions at scale. Deep expertise in Large Language Models, including LLaMA 2, GPT-4, and Gemma. Skilled in model training, fine-tuning, and evaluation, with proficiency in deep learning frameworks such as TensorFlow and PyTorch. Demonstrated success in building scalable machine learning pipelines, deploying production-grade AI systems, and leveraging cloud platforms such as AWS for real-time AI applications. Seeking a challenging and rewarding role at a leading technology company where I can contribute to innovative AI products and advancements.

Overview

4 years of professional experience

Work History

Upbound INC

Machine Learning Engineer
07.2025 - Current

Job overview

  • Designed and automated a document classification and extraction pipeline for highly unstructured, multi-region documents using Tesseract OCR and rule-based methods to identify valid clauses.
  • Worked directly with customer and business teams to gather requirements, deliver AI solutions, and provide ongoing technical support.
  • Designed and implemented end-to-end ETL pipelines for highly unstructured, multi-region enterprise documents, enabling downstream Agentic AI and analytics workflows.
  • Built scalable data ingestion, transformation, and enrichment pipelines integrating OCR, structured extraction, and semantic embeddings.
  • Developed LLM-powered RAG and validation workflows, allowing AI agents to retrieve, reason over, and act on governed enterprise data.
  • Implemented API-first data services using FastAPI and AWS Lambda, enabling AI agents and applications to access structured and semantic data in real time.
  • Designed containerized model and data serving architectures using Docker and Kubernetes to support both batch and low-latency inference workloads.
  • Established data quality checks, validation rules, and observability pipelines using MLflow, Kibana, and Grafana to monitor data drift, anomalies, and system health.
  • Collaborated with architects, software engineers, and data scientists to align data infrastructure with Agentic AI system requirements and business objectives.
  • Led DevOps practices for containerized AI platforms using Kubernetes, Docker, Terraform, and CI/CD pipelines, with production-grade monitoring and observability.

Copart

Data Scientist Consultant
08.2023 - 06.2025

Job overview

  • Architected and developed Purple Fabric, a scalable, AI-native SaaS platform designed for high-value horizontal use cases (e.g., intelligent search, automated workflows).
  • Deployed Kubernetes-based platforms adaptable to both cloud and on-prem enterprise environments.
  • Designed and deployed scalable Deep Learning solutions for Natural Language Processing (NLP) and Generative AI applications.
  • Built LLM-based AI guardrails for prompt injection detection, Personally Identifiable Information (PII) masking, and toxic content filtering using Meta LLaMA 2 70B, GPT-4o, SetFit models, and AWS Comprehend. Applied synthetic data generation via Gemma 2B for anonymization.
  • Engineered citation backfilling to validate and attribute references used in generative responses, increasing transparency and factuality.
  • Created a document classification and extraction model based on LayoutLM for automating categorization of emails and financial reports, improving efficiency by 75%.
  • Designed LLM benchmarking tools using LLM-as-a-judge evaluation, ROUGE-L, fuzzy JSON matching, exact-match scoring, and accuracy metrics.
  • Automated end-to-end error tracking and reporting workflows to monitor system anomalies across environments, reducing total errors from 6% to under 1% within a quarter. Recognized by senior leadership for operational excellence.
  • Designed and implemented an AI-native support assistant leveraging LLMs to accelerate product issue resolution and enhance client support efficiency.
  • Designed scalable Kubernetes-based AI platforms for LLM inference, RAG pipelines, and agentic workflows.
  • Partnered with security teams to enforce enterprise AI governance, privacy controls, and compliance guardrails.

LTI Mindtree

Data Scientist
01.2022 - 11.2022

Job overview

  • Automated Infrastructure as Code (IaC) solutions using Terraform for AWS resource provisioning (EC2, S3, IAM, Lambda, RDS).
  • Built cloud-native ETL pipelines using Python, SQL, AWS, and Azure for automated data ingestion, transformation, and quality validation.
  • Designed data models and feature pipelines to support machine learning, analytics, and reporting workloads.
  • Implemented Infrastructure as Code (Terraform) for provisioning secure, scalable cloud data platforms (EC2, S3, IAM, Lambda, RDS).
  • Performed exploratory data analysis (EDA) to identify trends, correlations, and data quality issues for business stakeholders.
  • Developed and deployed ML pipelines for classification, regression, and forecasting, including monitoring and versioning using MLflow and Git.
  • Created data visualizations and dashboards using Tableau / Power BI to communicate insights to technical and non-technical stakeholders.
  • Collaborated cross-functionally with engineers, architects, and analysts to deliver enterprise-grade data and AI solutions.

Education

Lindsey Wilson University

Master's in Technology Management (Data Science)
01.2024

University Overview

GPA: 3.30/5.00

Vellore Institute of Technology University
Amaravati

B.B.A. in Business Administration
01.2022

University Overview

  • Relevant Coursework: Data Systems, Data Algorithms, Linear Algebra, Object-Oriented Programming, Database Management, Discrete Mathematics, Operating Systems, Computer Networks, Machine Learning, Data Mining, Cloud Computing
  • GPA: 8.00 / 10

Skills

  • Programming Languages & Frameworks: Python (NumPy, Pandas, Scikit-learn, Keras, matplotlib, Flask, FastAPI), PyTorch, TensorFlow
  • Databases: SQL, MongoDB
  • Machine Learning: Supervised Learning (Decision Trees, Random Forests), Unsupervised Learning, Data Modeling & Evaluation, Preprocessing & Postprocessing, Model Optimization & Performance Tuning
  • Deep Learning: Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU)
  • Gen AI: Prompt Engineering, LLM Fine-Tuning, Retrieval-Augmented Generation (RAG), ReAct
  • LLM Experience: Meta LLaMA 2, Google Gemma, OpenAI GPT-4, Anthropic Claude
  • Cloud & DevOps: Amazon Web Services (EC2, S3, IAM, Lambda, RDS, SageMaker), Google Vertex AI, Docker, Kubernetes (OpenShift-compatible), Helm, Terraform, Jenkins, Git, Postman, CI/CD Pipelines, Infrastructure as Code (IaC), On-Prem & Hybrid Kubernetes Environments
  • Tools: Visual Studio Code, Jupyter Notebook, CVAT, PuTTY, Tableau, Langfuse, MLflow, Kibana, Grafana, Jira, Azure DevOps (ADO) and Confluence for project management
  • Machine learning
  • Natural language processing
  • Feature engineering
  • Model development
  • Clustering algorithms
  • Random forests
  • Decision trees
  • Transfer learning
  • Data analytics
  • Data mining
  • Statistical modeling
  • Dimensionality reduction
  • Reinforcement learning
  • Support vector machines

LEADERSHIP & AWARDS

Tech Club Head, Lindsey Wilson; TEDxVITAP Org; Event Manager, Null Chapter Club – VITAP; Member, Bulls and Bears Finance Club; Volunteer, Intellect Fest 2023

Languages

English: Full Professional
Spanish: Professional Working
Hindi: Full Professional

Timeline

Machine Learning Engineer

Upbound INC
07.2025 - Current

Data Scientist Consultant

Copart
08.2023 - 06.2025

Data Scientist

LTI Mindtree
01.2022 - 11.2022

Vellore Institute of Technology University

B.B.A. in Business Administration

Lindsey Wilson University

Master's in Technology Management (Data Science)