Reduced equipment downtime by 15% using Isolation Forest models to flag anomalies preceding pump failures in SCADA vibration and temperature data (1M+ daily readings).
Lowered false alarms by 40% with a Random Forest classifier (Python/scikit-learn), achieving a precision-recall AUC of 0.92 for critical alert prioritization.
Streamlined data ingestion by 25% and eliminated 15,000+ monthly duplicates by optimizing Azure Data Explorer and T-SQL queries across 1M+ sensor records.
Designed Tableau dashboards for real-time equipment health monitoring, improving operational decision-making.
Client: Baltimore Gas and Electric
Built real-time Power BI dashboards to monitor 500+ substation assets, reducing incident response time by 25%.
Automated PI/SQL/SharePoint data processing with Pandas and PySpark pipelines, saving 15 hours/week of manual work.
Extended transformer lifespan by 18% through ARIMA and LSTM forecasting (Python/statsmodels) of oil temperature and load cycles.
Boosted reporting speed by 40% with Power BI star schema and DAX query optimization for 500K+ records.
Client: XPO, Inc.
Enhanced document accuracy by 20% using a spaCy NLP pipeline (Python) to extract 50+ fields from unstructured Bills of Lading.
Decreased shipment errors by 25% with Logistic Regression (Python/scikit-learn) for confidence scoring of weight and destination fields.
Client: Internal Project
Diagnosed sensor drift in 100K+ time-series records with Autoencoder models (Python/TensorFlow) implemented in roughly 100 lines of code, improving anomaly detection accuracy by 20% and streamlining maintenance operations.
Improved forecasting accuracy by 20% with ARIMA and STL decomposition (Python/statsmodels) for seasonal time-series data.
Accelerated ETL workflows by 30% by automating data pipelines in Python (Pandas/pyodbc).
Data Science Intern
MEGHA AI
01.2023 - 05.2023
Engineered an Autoencoder model in Python/TensorFlow that detected sensor drift patterns, raising anomaly detection accuracy by 20% and reducing false positives by 15% within the manufacturing process.
Improved time-series forecasting accuracy by 20% with ARIMA and STL decomposition (statsmodels).
Shortened ETL processing time by 30% by orchestrating pipelines with Pandas and pyodbc workflows.
Senior Software Engineer
HEXAGON ASSET LIFECYCLE INTELLIGENCE
11.2018 - 01.2022
Improved maintenance efficiency by 25% using Gradient Boosting models (Python/scikit-learn) to predict bearing wear from vibration data.
Scaled analytics to 1B+ rows with SQL Server columnstore indexing, reducing query times from 12 minutes to 45 seconds.
Education
Master of Science - Data Science
UNIVERSITY AT BUFFALO
05.2023
Bachelor of Technology - Mechanical Engineering
GOKARAJU RANGARAJU INSTITUTE OF ENGINEERING AND TECHNOLOGY
06.2018
Skills
Python
Pandas
Scikit-learn
TensorFlow
SQL
Query Tuning
Indexing
Anomaly Detection
Time Series Forecasting
NLP
Ensemble Methods
ETL/ELT
Azure Data Explorer
Big Data Processing
Power BI
DAX
Data Modeling
Tableau
Azure
Data Factory
DevOps
Docker
AWS
GCP
Languages
Python (Pandas, scikit-learn, TensorFlow)
SQL (Query Tuning, Indexing)
Timeline
Data Science Consultant
CAPGEMINI
10.2023 - Current
Data Science Intern
MEGHA AI
01.2023 - 05.2023
Senior Software Engineer
HEXAGON ASSET LIFECYCLE INTELLIGENCE
11.2018 - 01.2022
Bachelor of Technology - Mechanical Engineering
GOKARAJU RANGARAJU INSTITUTE OF ENGINEERING AND TECHNOLOGY