Surendra Sagi

Cleveland, USA

Summary

  • Results-driven AI/ML Developer and Python Developer with over 10 years of experience in designing, developing, and deploying enterprise-grade AI/ML, data engineering, and cloud-based applications across manufacturing, finance, healthcare, and retail domains.
  • Applied prompt engineering, dynamic chaining, and agent-based architectures using LangChain to enhance response accuracy and explainability.
  • Proficient in integrating LLMs (OpenAI, Hugging Face) into production using FastAPI and Flask, with hands-on experience in MLOps, CI/CD, and model monitoring.
  • Experienced AI/ML Developer skilled in building, training, and deploying machine learning and deep learning models using Python, PyTorch, TensorFlow, and AWS.
  • Experienced in designing and deploying NLP pipelines for sentiment analysis, document summarization, named entity recognition (NER), and translation using Hugging Face Transformers, spaCy, and GPT-based architectures.
  • Hands-on experience in computer vision, building and deploying object detection, OCR, and image segmentation solutions using OpenCV, TensorFlow, and Keras for real-time visual inference.
  • Highly skilled in Python development for data manipulation, transformation, and analytics using Pandas, NumPy, Scikit-learn, PySpark, and Matplotlib, with proven success in building scalable ETL pipelines on Azure Databricks and AWS Glue to process multi-terabyte datasets via batch and streaming ingestion.
  • Experienced in developing and deploying GenAI and machine learning solutions, including LLMs, Retrieval-Augmented Generation (RAG), LangChain orchestration, and Hugging Face Transformer fine-tuning, to deliver predictive intelligence and automate business workflows.
  • Strong expertise in Delta Lake architecture (bronze–silver–gold) for ACID transactions, schema evolution, and query optimization, as well as in metadata-driven, incremental data pipelines integrating structured and unstructured data from diverse sources into Azure Data Lake Gen2, Synapse Analytics, Redshift, and HDFS.
  • Demonstrated ability to optimize Spark performance through partitioning, caching, and adaptive query execution, achieving up to 60% improvement in compute efficiency and more consistent SLA adherence.
  • Proficient in implementing CI/CD pipelines using Azure DevOps, Jenkins, Terraform, Docker, and Kubernetes, and in establishing data validation frameworks with PySpark and Great Expectations to ensure data accuracy and integrity (a minimal validation sketch follows this summary).
  • Extensive experience leveraging Azure (ADF, Synapse, Event Hubs, Purview, Delta Live Tables) and AWS (S3, EMR, Redshift, Kinesis, Lambda, CloudFormation, CloudWatch) for secure, compliant, and cost-optimized data ecosystems.
  • Skilled in using AI-assisted development tools, such as GitHub Copilot, for intelligent code generation, testing, and documentation, enhancing development efficiency and quality assurance.
  • Experienced in performing advanced analytics and visualization with Power BI, Tableau, and Excel (DAX, SQL) to deliver actionable business insights and KPI dashboards for stakeholders.
  • Recognized for strong collaboration with cross-functional ML, DevOps, and product teams in Agile/Scrum environments, leading cloud migration, data governance, and compliance initiatives (GDPR, PII masking) to build reliable, scalable, and insight-driven enterprise data solutions.
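A minimal sketch of the kind of data validation framework mentioned above, assuming the classic (pre-1.0) Great Expectations pandas API; the file name and column names are hypothetical:

import great_expectations as ge
import pandas as pd

# Wrap a pandas DataFrame so expectations can be declared on it
# ("orders.csv" and its columns are hypothetical stand-ins).
df = ge.from_pandas(pd.read_csv("orders.csv"))

# Expectations that gate the pipeline before data is published.
df.expect_column_values_to_not_be_null("order_id")
df.expect_column_values_to_be_between("amount", min_value=0)

results = df.validate()
print("validation passed:", results.success)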

Overview

12 years of professional experience
1 Certification

Work History

AI/ML Engineer

Hubbell Incorporated
Cleveland, USA
08.2023 - Current
  • Designed and deployed Retrieval-Augmented Generation (RAG) pipelines using OpenAI GPT-4, LangChain, and FAISS, delivering real-time insights from structured and unstructured datasets (a minimal retrieval sketch follows this section).
  • Designed, developed, and deployed end-to-end machine learning and deep learning models using Python, TensorFlow, PyTorch, and XGBoost for classification and clustering problems.
  • Fine-tuned LLMs using Hugging Face Transformers and PyTorch, improving model precision and reducing manual response time by 35%.
  • Designed and deployed NLP pipelines for text classification, sentiment analysis, and summarization using Hugging Face Transformers, improving text understanding accuracy by 30%.
  • Automated ETL workflows using PySpark, Pandas, and NumPy, transforming multi-source data for ML model consumption and analytics.
  • Designed and trained Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) using Keras and TensorFlow, achieving >90% accuracy on image and sequence classification tasks.
  • Built LLM-powered data retrieval pipelines using LangChain, enabling contextual insights from enterprise data stored in S3 and Redshift via Bedrock API orchestration.
  • Built and deployed object detection and image segmentation models using OpenCV, Keras, and TensorFlow, enabling automated defect detection in manufacturing images.
  • Built scalable and modular ML pipelines leveraging SageMaker Processing, Training, and Model Registry for automated retraining and deployment.
  • Built metadata-driven ETL frameworks in Python using configuration tables in Databricks SQL, reducing manual maintenance and enabling reuse across 100+ data sources.
  • Collaborated with business stakeholders through Dataiku dashboards and insights, providing explainable model outputs and actionable recommendations.
  • Developed modular MLOps pipelines on AWS for model training, validation, deployment, and monitoring using CI/CD workflows, Docker, and SageMaker endpoints.
  • Built streamlined MLOps pipelines connecting LangChain-based RAG systems with AWS Bedrock, Glue, and Step Functions for production-ready AI integration.
  • Built interactive analytics models and dashboards in Tableau, Power BI, and Databricks SQL, enabling self-service BI and operational insights.
  • Collaborated with ML, DevOps, and compliance teams in an Agile environment (JIRA), delivering scalable AI-first solutions within strict enterprise and security standards.
  • Optimized model performance through hyperparameter tuning, feature engineering, and GPU acceleration, reducing training time and improving accuracy.

Environment: Python, PySpark, Pandas, NumPy, Delta Lake, Azure Databricks, NLP, Event Hubs, Delta Live Tables, Unity Catalog, XGBoost, Power BI, SQL, Node.js, Express.js, React, Redux, GraphQL, Hugging Face Transformers, PyTorch, LangChain, GPT-4, FAISS, Docker, Kubernetes, Terraform, Jenkins, Azure DevOps, Great Expectations, Boto3, AWS CloudFormation, Celery, Redis, AWS SageMaker, Keras.
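A minimal sketch of the retrieval step in a RAG pipeline like the one above, assuming the langchain-community, langchain-openai, and langchain-text-splitters packages; the source text, chunking parameters, and prompt are illustrative, not the production code:

from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

raw_texts = ["..."]  # in production, text extracted from S3/Redshift sources

# Chunk the source text so each piece fits comfortably in the context window.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.create_documents(raw_texts)

# Embed and index the chunks, then retrieve the top matches per query.
store = FAISS.from_documents(chunks, OpenAIEmbeddings())
retriever = store.as_retriever(search_kwargs={"k": 4})

llm = ChatOpenAI(model="gpt-4")
question = "Summarize the key findings."
context = "\n\n".join(d.page_content for d in retriever.invoke(question))
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQ: {question}")
print(answer.content)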

Senior Python Full Stack Developer (AI/ML)

Thomson Reuters
Eagan, USA
08.2021 - 07.2023
  • Designed the front end and back end of the application using Python on the Django web framework and AngularJS; developed consumer-facing features and applications using Python and Django with test-driven development and pair programming.
  • Designed, trained, and deployed end-to-end machine learning models in Python using AWS SageMaker, optimizing model accuracy and inference latency for production workloads.
  • Developed dynamic web pages using HTML5, CSS3, Bootstrap, SASS, and JavaScript.
  • Used Python for data manipulation, wrangling, and analysis with libraries such as Pandas, NumPy, Scikit-learn, and Matplotlib.
  • Assessed AWS services such as Amazon EMR, Redshift, and S3 end to end, covering both architecture and execution.
  • Developed web applications, RESTful web services, and APIs using Flask, Django, Pyramid, and PHP.
  • Led a cross-functional team in migrating legacy systems to the cloud, focusing on Azure Databricks for its scalability and Python-friendly environment.
  • Worked with Python integrated development environments such as PyCharm and IDLE.
  • Automated existing scripts for performance calculations using NumPy and SQLAlchemy.
  • Developed scripts using Perl, Python, Unix shell, and SQL.
  • Performed unit and integration testing of the code using PyTest.
  • Wrote a PostgreSQL-to-MongoDB migration script in Python using Gevent, the psycopg2 library, Postgres cursors, and MongoDB bulk inserts.
  • Implemented AI-driven ETL scripts in Python to automate data extraction, transformation, and loading, increasing efficiency and accuracy.
  • Actively pursued and applied the latest advancements in Azure Databricks and Python libraries, fostering a culture of continuous improvement within the team.
  • Used AWS to deploy new server instances through automation with Kubernetes and Jenkins.
  • Used Git and Jira for code tracking and the review process.
  • Used Python and Django for graphics creation, XML processing, data exchange, and business logic implementation with Spiff workflow development.
  • Served on the team implementing a REST API in Python using the Flask micro-framework with SQLAlchemy on the backend to manage data center resources (a minimal sketch follows this section).
  • Developed ETL scripts in Python to extract data from one database table and insert or update the resulting data in another database table.

Environment: Python 3.11, Django 4.x, REST API, MySQL, Linux, CI/CD, GitHub, PyCharm, AWS (EC2, EBS, S3, VPC), Jenkins, Selenium IDE, jQuery, HTML, CSS, JavaScript, Ajax, Web Services, Pandas, JSON, Angular.js, Bootstrap, Jinja, Flask, MongoDB, AWS SageMaker.
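A minimal sketch of a Flask REST endpoint backed by SQLAlchemy, in the spirit of the data-center resource API above; the Resource model, routes, and SQLite URI are hypothetical, and Flask-SQLAlchemy stands in for the exact wiring used:

from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///resources.db"
db = SQLAlchemy(app)

# Hypothetical model for a managed data-center resource.
class Resource(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(80), nullable=False)
    status = db.Column(db.String(20), default="available")

@app.get("/resources")
def list_resources():
    # Serialize every row; pagination omitted for brevity.
    rows = Resource.query.all()
    return jsonify([{"id": r.id, "name": r.name, "status": r.status} for r in rows])

@app.post("/resources")
def create_resource():
    payload = request.get_json()
    row = Resource(name=payload["name"])
    db.session.add(row)
    db.session.commit()
    return jsonify({"id": row.id}), 201

if __name__ == "__main__":
    with app.app_context():
        db.create_all()  # create tables on first run
    app.run()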

Data Engineer

Cybage Software Pvt. Ltd.
Hyderabad, India
01.2016 - 07.2019
  • Designed and maintained ETL pipelines using Hadoop ecosystem tools (Hive, Pig, Sqoop, MapReduce) to process and transform multi-terabyte clinical and patient data.
  • Developed Spark (Scala and PySpark) jobs for distributed data processing, significantly improving data throughput and reducing batch latency by 40% (a minimal PySpark sketch follows this section).
  • Created and tuned Hive tables and UDFs for ad-hoc analytics, supporting partitioned and bucketed datasets optimized for downstream BI workloads.
  • Integrated NoSQL databases (MongoDB, HBase) with Hadoop and Spark for unstructured data ingestion and near-real-time analytics.
  • Optimized Pig workflows and Hive queries, improving query execution times through partition pruning, compression, and parallel processing.
  • Automated data ingestion from relational systems (Oracle 11g, SQL Server, Netezza) into HDFS using Sqoop and shell scripts, ensuring incremental and full-load consistency.
  • Designed PL/SQL procedures and stored packages to generate operational and customer status reports for healthcare clients.
  • Contributed to data warehouse design and SSAS cube development, enhancing multidimensional reporting and OLAP performance through partition and aggregation strategies.
  • Supported production Ab Initio jobs, resolving ETL failures, optimizing data dependencies, and ensuring SLA adherence.
  • Collaborated on cloud migration pilots to AWS, using S3, EMR, and Redshift to modernize data storage and analytics infrastructure for scalability and cost efficiency.

Environment: Hadoop Ecosystem (Hive, Pig, Sqoop, MapReduce), Spark (Scala, PySpark), Oracle 11g, PL/SQL, SQL Server, Netezza, MongoDB, HBase, Ab Initio, AWS (S3, EMR, Redshift), Power BI, Tableau, SSAS, Shell Scripting, Linux, Git, JIRA
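A minimal PySpark batch-transform sketch in the spirit of the distributed jobs above; the HDFS paths, column names, and dedup rule are hypothetical:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("clinical-batch").getOrCreate()

# Read raw records landed in HDFS (e.g., by a Sqoop import).
raw = spark.read.parquet("hdfs:///data/raw/clinical/")

# Keep only the latest record per patient using a window over updated_at.
w = Window.partitionBy("patient_id").orderBy(F.desc("updated_at"))
latest = (raw.withColumn("rn", F.row_number().over(w))
             .filter(F.col("rn") == 1)
             .drop("rn"))

# Aggregate patient counts per facility and write the curated output.
summary = latest.groupBy("facility_id").agg(F.count("*").alias("patients"))
summary.write.mode("overwrite").parquet("hdfs:///data/curated/facility_summary/")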

Data Analyst

Indium Software Pvt. Ltd.
Chennai, India
06.2013 - 12.2015
  • Designed and developed 50+ Tableau and Power BI dashboards to track KPIs across sales, marketing, and operations, improving executive decision-making and performance visibility.
  • Performed data mining and analysis using SQL, Excel, and Python, driving actionable insights for customer segmentation and targeted marketing campaigns.
  • Automated data extraction and cleansing processes using Python scripts and shell automation, reducing manual reporting time by over 60%.
  • Created and optimized dimensional data models (Star and Snowflake schemas) to support scalable reporting and analytics.
  • Built Excel VBA macros and advanced formulas to automate recurring reporting tasks and data reconciliations across departments.
  • Developed interactive Tableau dashboards with parameters, trend analysis, and forecasting models to identify seasonal sales patterns and improve revenue predictions.
  • Integrated data from multiple SQL sources, CRM systems, and Google Analytics for consolidated customer behavior and campaign performance analysis.
  • Defined and implemented custom KPIs and DAX measures in Power BI, enabling teams to monitor conversion rates, churn, and product performance in real time.
  • Collaborated with business users to document reporting requirements, validate data accuracy, and enhance visualization usability through intuitive designs and tooltips.
  • Conducted A/B testing and statistical analysis in R and Python, providing insights into the effectiveness of marketing and promotional strategies (a minimal test sketch follows this section).

Environment: SQL Server, Tableau, Power BI, Excel (VBA, Macros), Python 3.x, R, Shell Scripting, Google Analytics, CRM Systems, Star & Snowflake Schema Modeling, SSRS, Power Query, DAX
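A minimal sketch of the kind of A/B significance test described above; the visitor and conversion counts are invented purely for illustration:

from scipy import stats

# Variant A vs. variant B: conversions out of total visitors (hypothetical).
conv_a, n_a = 120, 2400
conv_b, n_b = 156, 2380

# 2x2 contingency table: [converted, not converted] per variant.
table = [[conv_a, n_a - conv_a],
         [conv_b, n_b - conv_b]]
chi2, p_value, dof, _ = stats.chi2_contingency(table)

lift = conv_b / n_b - conv_a / n_a
print(f"absolute lift: {lift:.3%}, p-value: {p_value:.4f}")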

Education

Master of Science - Business Analytics and Information Systems

University of South Florida
Tampa, FL
01.2021

Bachelor of Technology - Electronics and Communication Engineering

Jawaharlal Nehru Technological University
Hyderabad, India
01.2013

Skills

  • Python
  • SQL
  • JavaScript
  • Scala
  • Shell Scripting
  • Pandas
  • NumPy
  • PySpark
  • Scikit-learn
  • Matplotlib
  • Great Expectations
  • LangChain
  • Hugging Face Transformers
  • TensorFlow
  • PyTorch
  • Large Language Models (LLMs)
  • Retrieval-Augmented Generation (RAG)
  • Prompt Engineering
  • Model Fine-Tuning
  • Generative AI Pipelines
  • SageMaker
  • Bedrock
  • Azure Data Factory (ADF)
  • Databricks
  • AWS Glue
  • Apache Spark
  • Hive
  • Sqoop
  • Kafka
  • Structured Streaming
  • Delta Live Tables
  • Azure Data Lake Gen2
  • Synapse Analytics
  • Event Hubs
  • Purview
  • Delta Lake
  • AWS S3
  • EMR
  • Redshift
  • Kinesis
  • Lambda
  • CloudFormation
  • CloudWatch
  • SQL Server
  • PostgreSQL
  • MySQL
  • MongoDB
  • HDFS
  • Oracle 11g
  • NoSQL Databases
  • Star & Snowflake Schemas
  • Dimensional Modeling
  • Parquet
  • Avro
  • ORC
  • RCFile
  • JSON
  • CSV
  • Azure DevOps
  • Jenkins
  • GitHub
  • Terraform
  • Docker
  • Kubernetes
  • Databricks CLI
  • Flask
  • Django
  • Node.js
  • Express.js
  • REST APIs
  • React
  • Redux
  • Angular
  • HTML5
  • CSS3
  • Power BI
  • Tableau
  • Excel (Pivot Tables, DAX, Power Query)
  • SSRS
  • Git
  • JIRA
  • Confluence
  • Agile / Scrum
  • GitHub Copilot
  • Celery
  • Redis
  • Bash / Shell Automation
  • PowerShell
  • Azure Purview
  • Unity Catalog
  • IAM
  • PII Masking
  • GDPR Compliance
  • Role-Based Access Control (RBAC)

Certification

  • PCAP: Certified Associate in Python Programming
  • Microsoft Certified: Python for Data Science
  • Azure AI Fundamentals
  • AWS Certifications

Timeline

AI/ML Engineer

Hubbell Incorporated
08.2023 - Current

Senior Python Full Stack Developer (AI/ML)

Thomson Reuters
08.2021 - 07.2023

Data Engineer

Cybage Software Pvt. Ltd.
01.2016 - 07.2019

Data Analyst

Indium Software Pvt. Ltd.
06.2013 - 12.2015

Master of Science - Business Analytics and Information Systems

University of South Florida

Bachelor of Technology - Electronics and Communication Engineering

Jawaharlal Nehru Technological University