Surendra Sagi

Cleveland, USA

Summary

  • Results-driven AI/ML Developer and Python Developer with over 10 years of experience in designing, developing, and deploying enterprise-grade AI/ML, data engineering, and cloud-based applications across manufacturing, finance, healthcare, and retail domains.
  • Applied prompt engineering, dynamic chaining, and agent-based architectures using LangChain to enhance response accuracy and explainability.
  • Proficient in integrating LLMs (OpenAI, Hugging Face) into production using FastAPI and Flask, with hands-on experience in MLOps, CI/CD, and model monitoring.
  • Experienced AI/ML Developer skilled in building, training, and deploying machine learning and deep learning models using Python, PyTorch, TensorFlow, and AWS.
  • Experienced in designing and deploying NLP pipelines for sentiment analysis, document summarization, named entity recognition (NER), and translation using Hugging Face Transformers, spaCy, and GPT-based architectures.
  • Hands-on experience in computer vision, building and deploying object detection, OCR, and image segmentation solutions using OpenCV, TensorFlow, and Keras for real-time visual inference.
  • Highly skilled in Python development for data manipulation, transformation, and analytics using Pandas, NumPy, Scikit-learn, PySpark, and Matplotlib, with proven success in building scalable ETL pipelines on Azure Databricks and AWS Glue to process multi-terabyte datasets via batch and streaming ingestion.
  • Experienced in developing and deploying GenAI and machine learning solutions, including LLMs, Retrieval-Augmented Generation (RAG), LangChain orchestration, and Hugging Face Transformer fine-tuning, to deliver predictive intelligence and automate business workflows.
  • Strong expertise in Delta Lake architecture (bronze–silver–gold) for ACID transactions, schema evolution, and query optimization, as well as in metadata-driven, incremental data pipelines integrating structured and unstructured data from diverse sources into Azure Data Lake Gen2, Synapse Analytics, Redshift, and HDFS.
  • Demonstrated ability to optimize Spark performance through partitioning, caching, and adaptive query execution, achieving up to 60% improvement in compute efficiency and more consistent SLA adherence.
  • Proficient in implementing CI/CD pipelines using Azure DevOps, Jenkins, Terraform, Docker, and Kubernetes, and in establishing data validation frameworks with PySpark and Great Expectations to ensure data accuracy and integrity (a minimal validation sketch follows this summary).
  • Extensive experience leveraging Azure (ADF, Synapse, Event Hubs, Purview, Delta Live Tables) and AWS (S3, EMR, Redshift, Kinesis, Lambda, CloudFormation, CloudWatch) for secure, compliant, and cost-optimized data ecosystems.
  • Skilled in using AI-assisted development tools, such as GitHub Copilot, for intelligent code generation, testing, and documentation, enhancing development efficiency and quality assurance.
  • Experienced in performing advanced analytics and visualization with Power BI, Tableau, and Excel (DAX, SQL) to deliver actionable business insights and KPI dashboards for stakeholders.
  • Recognized for strong collaboration with cross-functional ML, DevOps, and product teams in Agile/Scrum environments, leading cloud migration, data governance, and compliance initiatives (GDPR, PII masking) to build reliable, scalable, and insight-driven enterprise data solutions.
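A minimal sketch of the kind of data validation framework mentioned above, assuming the classic (pre-1.0) Great Expectations pandas API; the file name and column names are hypothetical:

import great_expectations as ge
import pandas as pd

# Wrap a pandas DataFrame so expectations can be declared on it
# ("orders.csv" and its columns are hypothetical stand-ins).
df = ge.from_pandas(pd.read_csv("orders.csv"))

# Expectations that gate the pipeline before data is published.
df.expect_column_values_to_not_be_null("order_id")
df.expect_column_values_to_be_between("amount", min_value=0)

results = df.validate()
print("validation passed:", results.success)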

Overview

12 years of professional experience
1 Certification

Work History

AI/ML Engineer

Hubbell Incorporated
Cleveland, USA
08.2023 - Current
  • Designed and deployed Retrieval-Augmented Generation (RAG) pipelines using OpenAI GPT-4, LangChain, and FAISS, delivering real-time insights from structured and unstructured datasets (a minimal retrieval sketch follows this section).
  • Designed, developed, and deployed end-to-end machine learning and deep learning models using Python, TensorFlow, PyTorch, and XGBoost for classification and clustering problems.
  • Fine-tuned LLMs using Hugging Face Transformers and PyTorch, improving model precision and reducing manual response time by 35%.
  • Designed and deployed NLP pipelines for text classification, sentiment analysis, and summarization using Hugging Face Transformers, improving text understanding accuracy by 30%.
  • Automated ETL workflows using PySpark, Pandas, and NumPy, transforming multi-source data for ML model consumption and analytics.
  • Designed and trained Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) using Keras and TensorFlow, achieving >90% accuracy on image and sequence classification tasks.
  • Built LLM-powered data retrieval pipelines using LangChain, enabling contextual insights from enterprise data stored in S3 and Redshift via Bedrock API orchestration.
  • Built and deployed object detection and image segmentation models using OpenCV, Keras, and TensorFlow, enabling automated defect detection in manufacturing images.
  • Built scalable and modular ML pipelines leveraging SageMaker Processing, Training, and Model Registry for automated retraining and deployment.
  • Built metadata-driven ETL frameworks in Python using configuration tables in Databricks SQL, reducing manual maintenance and enabling reuse across 100+ data sources.
  • Collaborated with business stakeholders through Dataiku dashboards and insights, providing explainable model outputs and actionable recommendations.
  • Developed modular MLOps pipelines on AWS for model training, validation, deployment, and monitoring using CI/CD workflows, Docker, and SageMaker endpoints.
  • Built streamlined MLOps pipelines connecting LangChain-based RAG systems with AWS Bedrock, Glue, and Step Functions for production-ready AI integration.
  • Built interactive analytics models and dashboards in Tableau, Power BI, and Databricks SQL, enabling self-service BI and operational insights.
  • Collaborated with ML, DevOps, and compliance teams in an Agile environment (JIRA), delivering scalable AI-first solutions within strict enterprise and security standards.
  • Optimized model performance through hyperparameter tuning, feature engineering, and GPU acceleration, reducing training time and improving accuracy.

Environment: Python, PySpark, Pandas, NumPy, Delta Lake, Azure Databricks, NLP, Event Hubs, Delta Live Tables, Unity Catalog, XGBoost, Power BI, SQL, Node.js, Express.js, React, Redux, GraphQL, Hugging Face Transformers, PyTorch, LangChain, GPT-4, FAISS, Docker, Kubernetes, Terraform, Jenkins, Azure DevOps, Great Expectations, Boto3, AWS CloudFormation, Celery, Redis, AWS SageMaker, Keras.
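A minimal sketch of the retrieval step in a RAG pipeline like the one above, assuming the langchain-community, langchain-openai, and langchain-text-splitters packages; the source text, chunking parameters, and prompt are illustrative, not the production code:

from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

raw_texts = ["..."]  # in production, text extracted from S3/Redshift sources

# Chunk the source text so each piece fits comfortably in the context window.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.create_documents(raw_texts)

# Embed and index the chunks, then retrieve the top matches per query.
store = FAISS.from_documents(chunks, OpenAIEmbeddings())
retriever = store.as_retriever(search_kwargs={"k": 4})

llm = ChatOpenAI(model="gpt-4")
question = "Summarize the key findings."
context = "\n\n".join(d.page_content for d in retriever.invoke(question))
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQ: {question}")
print(answer.content)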

Senior Python Full Stack Developer (AI/ML)

Thomson Reuters
Eagan, USA
08.2021 - 07.2023
  • Designed the front end and back end of the application using Python on the Django web framework and AngularJS; developed consumer-facing features and applications using Python and Django with test-driven development and pair programming.
  • Designed, trained, and deployed end-to-end machine learning models in Python using AWS SageMaker, optimizing model accuracy and inference latency for production workloads.
  • Developed dynamic web pages using HTML5, CSS3, Bootstrap, SASS, and JavaScript.
  • Used Python for data manipulation, wrangling, and analysis with libraries such as Pandas, NumPy, Scikit-learn, and Matplotlib.
  • Assessed AWS services such as Amazon EMR, Redshift, and S3 end to end, covering both architecture and execution.
  • Developed web applications, RESTful web services, and APIs using Flask, Django, Pyramid, and PHP.
  • Led a cross-functional team in migrating legacy systems to the cloud, focusing on Azure Databricks for its scalability and Python-friendly environment.
  • Worked with Python integrated development environments such as PyCharm and IDLE.
  • Automated existing scripts for performance calculations using NumPy and SQLAlchemy.
  • Developed scripts using Perl, Python, Unix shell, and SQL.
  • Performed unit and integration testing of the code using PyTest.
  • Wrote a PostgreSQL-to-MongoDB migration script in Python using Gevent, the psycopg2 library, Postgres cursors, and MongoDB bulk inserts.
  • Implemented AI-driven ETL scripts in Python to automate data extraction, transformation, and loading, increasing efficiency and accuracy.
  • Actively pursued and applied the latest advancements in Azure Databricks and Python libraries, fostering a culture of continuous improvement within the team.
  • Used AWS to deploy new server instances through automation with Kubernetes and Jenkins.
  • Used Git and Jira for code tracking and the review process.
  • Used Python and Django for graphics creation, XML processing, data exchange, and business logic implementation with Spiff workflow development.
  • Served on the team implementing a REST API in Python using the Flask micro-framework with SQLAlchemy on the backend to manage data center resources (a minimal sketch follows this section).
  • Developed ETL scripts in Python to extract data from one database table and insert or update the resulting data in another database table.

Environment: Python 3.11, Django 4.x, REST API, MySQL, Linux, CI/CD, GitHub, PyCharm, AWS (EC2, EBS, S3, VPC), Jenkins, Selenium IDE, jQuery, HTML, CSS, JavaScript, Ajax, Web Services, Pandas, JSON, Angular.js, Bootstrap, Jinja, Flask, MongoDB, AWS SageMaker.
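A minimal sketch of a Flask REST endpoint backed by SQLAlchemy, in the spirit of the data-center resource API above; the Resource model, routes, and SQLite URI are hypothetical, and Flask-SQLAlchemy stands in for the exact wiring used:

from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///resources.db"
db = SQLAlchemy(app)

# Hypothetical model for a managed data-center resource.
class Resource(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(80), nullable=False)
    status = db.Column(db.String(20), default="available")

@app.get("/resources")
def list_resources():
    # Serialize every row; pagination omitted for brevity.
    rows = Resource.query.all()
    return jsonify([{"id": r.id, "name": r.name, "status": r.status} for r in rows])

@app.post("/resources")
def create_resource():
    payload = request.get_json()
    row = Resource(name=payload["name"])
    db.session.add(row)
    db.session.commit()
    return jsonify({"id": row.id}), 201

if __name__ == "__main__":
    with app.app_context():
        db.create_all()  # create tables on first run
    app.run()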

Data Engineer

Cybage Software Pvt. Ltd.
Hyderabad, India
01.2016 - 07.2019
  • Designed and maintained ETL pipelines using Hadoop ecosystem tools (Hive, Pig, Sqoop, MapReduce) to process and transform multi-terabyte clinical and patient data.
  • Developed Spark (Scala and PySpark) jobs for distributed data processing, significantly improving data throughput and reducing batch latency by 40% (a minimal PySpark sketch follows this section).
  • Created and tuned Hive tables and UDFs for ad-hoc analytics, supporting partitioned and bucketed datasets optimized for downstream BI workloads.
  • Integrated NoSQL databases (MongoDB, HBase) with Hadoop and Spark for unstructured data ingestion and near-real-time analytics.
  • Optimized Pig workflows and Hive queries, improving query execution times through partition pruning, compression, and parallel processing.
  • Automated data ingestion from relational systems (Oracle 11g, SQL Server, Netezza) into HDFS using Sqoop and shell scripts, ensuring incremental and full-load consistency.
  • Designed PL/SQL procedures and stored packages to generate operational and customer status reports for healthcare clients.
  • Contributed to data warehouse design and SSAS cube development, enhancing multidimensional reporting and OLAP performance through partition and aggregation strategies.
  • Supported production Ab Initio jobs, resolving ETL failures, optimizing data dependencies, and ensuring SLA adherence.
  • Collaborated on cloud migration pilots to AWS, using S3, EMR, and Redshift to modernize data storage and analytics infrastructure for scalability and cost efficiency.

Environment: Hadoop Ecosystem (Hive, Pig, Sqoop, MapReduce), Spark (Scala, PySpark), Oracle 11g, PL/SQL, SQL Server, Netezza, MongoDB, HBase, Ab Initio, AWS (S3, EMR, Redshift), Power BI, Tableau, SSAS, Shell Scripting, Linux, Git, JIRA
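A minimal PySpark batch-transform sketch in the spirit of the distributed jobs above; the HDFS paths, column names, and dedup rule are hypothetical:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("clinical-batch").getOrCreate()

# Read raw records landed in HDFS (e.g., by a Sqoop import).
raw = spark.read.parquet("hdfs:///data/raw/clinical/")

# Keep only the latest record per patient using a window over updated_at.
w = Window.partitionBy("patient_id").orderBy(F.desc("updated_at"))
latest = (raw.withColumn("rn", F.row_number().over(w))
             .filter(F.col("rn") == 1)
             .drop("rn"))

# Aggregate patient counts per facility and write the curated output.
summary = latest.groupBy("facility_id").agg(F.count("*").alias("patients"))
summary.write.mode("overwrite").parquet("hdfs:///data/curated/facility_summary/")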

Data Analyst

Indium Software Pvt. Ltd.
Chennai, India
06.2013 - 12.2015
  • Designed and developed 50+ Tableau and Power BI dashboards to track KPIs across sales, marketing, and operations, improving executive decision-making and performance visibility.
  • Performed data mining and analysis using SQL, Excel, and Python, driving actionable insights for customer segmentation and targeted marketing campaigns.
  • Automated data extraction and cleansing processes using Python scripts and shell automation, reducing manual reporting time by over 60%.
  • Created and optimized dimensional data models (Star and Snowflake schemas) to support scalable reporting and analytics.
  • Built Excel VBA macros and advanced formulas to automate recurring reporting tasks and data reconciliations across departments.
  • Developed interactive Tableau dashboards with parameters, trend analysis, and forecasting models to identify seasonal sales patterns and improve revenue predictions.
  • Integrated data from multiple SQL sources, CRM systems, and Google Analytics for consolidated customer behavior and campaign performance analysis.
  • Defined and implemented custom KPIs and DAX measures in Power BI, enabling teams to monitor conversion rates, churn, and product performance in real time.
  • Collaborated with business users to document reporting requirements, validate data accuracy, and enhance visualization usability through intuitive designs and tooltips.
  • Conducted A/B testing and statistical analysis in R and Python, providing insights into the effectiveness of marketing and promotional strategies (a minimal test sketch follows this section).

Environment: SQL Server, Tableau, Power BI, Excel (VBA, Macros), Python 3.x, R, Shell Scripting, Google Analytics, CRM Systems, Star & Snowflake Schema Modeling, SSRS, Power Query, DAX
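A minimal sketch of the kind of A/B significance test described above; the visitor and conversion counts are invented purely for illustration:

from scipy import stats

# Variant A vs. variant B: conversions out of total visitors (hypothetical).
conv_a, n_a = 120, 2400
conv_b, n_b = 156, 2380

# 2x2 contingency table: [converted, not converted] per variant.
table = [[conv_a, n_a - conv_a],
         [conv_b, n_b - conv_b]]
chi2, p_value, dof, _ = stats.chi2_contingency(table)

lift = conv_b / n_b - conv_a / n_a
print(f"absolute lift: {lift:.3%}, p-value: {p_value:.4f}")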

Education

Master of Science - Business Analytics and Information Systems

University of South Florida
Tampa, FL
01.2021

Bachelor of Technology - Electronics and Communication Engineering

Jawaharlal Nehru Technological University
Hyderabad, India
01.2013

Skills

  • Python
  • SQL
  • JavaScript
  • Scala
  • Shell Scripting
  • Pandas
  • NumPy
  • PySpark
  • Scikit-learn
  • Matplotlib
  • Great Expectations
  • LangChain
  • Hugging Face Transformers
  • TensorFlow
  • PyTorch
  • Large Language Models (LLMs)
  • Retrieval-Augmented Generation (RAG)
  • Prompt Engineering
  • Model Fine-Tuning
  • Generative AI Pipelines
  • SageMaker
  • Bedrock
  • Azure Data Factory (ADF)
  • Databricks
  • AWS Glue
  • Apache Spark
  • Hive
  • Sqoop
  • Kafka
  • Structured Streaming
  • Delta Live Tables
  • Azure Data Lake Gen2
  • Synapse Analytics
  • Event Hubs
  • Purview
  • Delta Lake
  • AWS S3
  • EMR
  • Redshift
  • Kinesis
  • Lambda
  • CloudFormation
  • CloudWatch
  • SQL Server
  • PostgreSQL
  • MySQL
  • MongoDB
  • HDFS
  • Oracle 11g
  • NoSQL Databases
  • Star & Snowflake Schemas
  • Dimensional Modeling
  • Parquet
  • Avro
  • ORC
  • RCFile
  • JSON
  • CSV
  • Azure DevOps
  • Jenkins
  • GitHub
  • Terraform
  • Docker
  • Kubernetes
  • Databricks CLI
  • Flask
  • Django
  • Node.js
  • Express.js
  • REST APIs
  • React
  • Redux
  • Angular
  • HTML5
  • CSS3
  • Power BI
  • Tableau
  • Excel (Pivot Tables, DAX, Power Query)
  • SSRS
  • Git
  • JIRA
  • Confluence
  • Agile / Scrum
  • GitHub Copilot
  • Celery
  • Redis
  • Bash / Shell Automation
  • PowerShell
  • Azure Purview
  • Unity Catalog
  • IAM
  • PII Masking
  • GDPR Compliance
  • Role-Based Access Control (RBAC)

Certification

  • PCAP: Certified Associate in Python Programming
  • Microsoft Certified: Python for Data Science
  • Azure AI Fundamentals
  • AWS Certifications

Timeline

AI/ML Engineer

Hubbell Incorporated
08.2023 - Current

Senior Python Full Stack Developer (AI/ML)

Thomson Reuters
08.2021 - 07.2023

Data Engineer

Cybage Software Pvt. Ltd.
01.2016 - 07.2019

Data Analyst

Indium Software Pvt. Ltd.
06.2013 - 12.2015

Master of Science - Business Analytics and Information Systems

University of South Florida

Bachelor of Technology - Electronics and Communication Engineering

Jawaharlal Nehru Technological University