Mani Kanta Reddy Duggempudi

Greensboro, NC

Summary

Senior Data Scientist with 9+ years of experience specializing in product analytics, web analytics, customer experience analytics, and data-driven decision-making. Proven expertise in leveraging SQL, big data technologies (Redshift, Spark, Hive, BigQuery), and BI tools (Tableau, Quick, Power BI, Dash) to analyze complex datasets and derive actionable insights. Experienced in developing optimized intelligent systems, monitoring and controlling project spend, and partnering with customers to deliver realistic solutions within prescribed budgets and timeframes.

Overview

9 years of professional experience

Work History

Senior Data Scientist

Zurich Insurance
Addison, TX
10.2023 - Current
  • Conducted extensive data profiling to understand patterns in student performance on USMLE examinations, utilizing advanced analytics techniques.
  • Applied machine learning models using cross-validation, log loss functions, ROC curves, and AUC to enhance feature selection and model accuracy.
  • Implemented regularization methods, like L2 and L1, to address overfitting in predictive models.
  • Utilized XGBoost, a machine learning software package, to perform statistical modeling and predict probabilities using Python.
  • Integrated data from various sources to create a comprehensive master dataset for modeling, including fields derived from client data, student essays, letters of recommendation (LORs), and performance metrics.
  • Employed grid search and k-fold cross-validation techniques to optimize hyperparameters and enhance model performance.
  • Developed predictive models using boosting algorithms to analyze the behavior of students taking the USMLE exams for residency applications.
  • Utilized Python libraries such as NumPy, scikit-learn, pandas, NLTK, and Matplotlib to build and visualize models demonstrating student performance by demographics.
  • Applied a range of AI and machine learning algorithms for decision trees, text analytics, NLP, supervised and unsupervised learning, and regression models.
  • Performed feature engineering using Principal Component Analysis to manage high-dimensional data effectively.
  • Executed comprehensive data cleaning, feature scaling, and feature engineering using pandas and NumPy in Python.
  • Designed and trained deep learning models using TensorFlow and Keras to predict residency attainment based on normalized scores from various tests.
  • Used XGBClassifier for categorical variables and XGBRegressor for continuous variables, integrating them with FeatureUnion and FunctionTransformer in NLP pipelines.
  • Implemented a One-vs-Rest classifier to address multi-class classification problems, fitting one classifier per class against all remaining classes.
  • Created various machine learning and statistical models, including Decision Tree, Text Analytics, Sentiment Analysis, Naive Bayes, Logistic Regression, and Linear Regression, to evaluate model accuracy.
  • Designed reports leveraging collected metrics to draw conclusions about past behaviors and predict future behaviors.
  • Generated multiple models using different machine learning and deep learning frameworks, selecting and tuning the highest-performing model using Signal Hub.
  • Developed data layers in Signal Hub to predict new, unseen data, ensuring performance at least equivalent to static models built with deep learning frameworks.
  • Integrated and managed large datasets using Big Data technologies, like Apache Spark and Hadoop, for real-time analytics and data processing.
  • Enhanced data visualization and interpretation capabilities using tools like Tableau, Power BI, and Google Data Studio.
  • Collaborated with cross-functional teams to implement scalable machine learning solutions and integrate them into business processes.
  • Spearheaded the migration of data analytics platforms to cloud environments such as AWS, Azure, and Google Cloud, optimizing computational efficiency and scalability.
  • Conducted advanced text analytics and natural language processing to extract insights from unstructured data, improving decision-making processes.
  • Led training sessions and workshops on new AI/ML technologies and methodologies for internal teams, enhancing the organization's data literacy.
  • Maintained up-to-date knowledge of industry trends and advancements in AI and ML, applying this knowledge to drive innovation within the company.
  • Environment: Python 2.x, 3.x, Hive, AWS, Linux, Tableau Desktop, Microsoft Excel, NLP, deep learning frameworks such as TensorFlow, Keras, boosting algorithms, etc.
  • Developed predictive models using machine learning, natural language processing, and statistical analysis methods.

Data Scientist

Edward Jones
St. Louis, MO
07.2021 - 09.2023
  • Conducted data profiling to understand behaviors associated with traffic patterns, location, and timing, using advanced analytical techniques.
  • Applied a variety of AI/machine learning algorithms and statistical models, including decision trees, NLP, regression models, and neural networks, using Python's scikit-learn and MATLAB.
  • Utilized Apache Spark, Snowflake, and Scala for big data processing, significantly enhancing data analysis capabilities.
  • Managed and optimized Hadoop ecosystems and developed PySpark applications that integrated with data lakes and used TensorFlow for deep learning tasks.
  • Designed and implemented data processing pipelines using Spark Streaming, Kafka, and AWS Kinesis to handle real-time data streams effectively.
  • Developed C# applications to connect SQL engines with databases, enhancing data accessibility and manipulation.
  • Engineered and maintained API libraries and business logic in C#, leveraging XML and Python for backend processing.
  • Orchestrated data workflows using Apache Airflow, ensuring robust automation and monitoring of DAGs and their dependencies.
  • Performed data cleaning and feature engineering using MLlib in PySpark, focusing on optimizing machine learning models.
  • Developed regex-based data classification and extraction tools using Spark/Scala and R, operating within both Linux and Windows environments.
  • Created user interfaces with C#, JSP, and XML, featuring expandable menus for detailed data drill-downs on interactive graphs.
  • Evaluated predictive models using cross-validation, ROC curves, and AUC metrics to ensure accuracy and effectiveness in feature selection.
  • Conducted sentiment analysis and text analytics to categorize user comments from social media into positive and negative sentiments.
  • Monitored operational metrics using sensor data and Airflow to ensure processes met predefined criteria.
  • Managed extensive data mapping projects from various source systems to Teradata, utilizing tools like TPump and BTEQ.
  • Analyzed traffic patterns and temporal correlations using autocorrelation methods to predict future behaviors.
  • Implemented regularization methods, such as L2 and L1, to address model overfitting and improve generalization.
  • Employed Principal Component Analysis for reducing dimensionality in high-dimensional datasets, enhancing model performance.
  • Developed delivery time prediction models using Multinomial Logistic Regression and Random Forest, optimizing logistics routes.
  • Performed comprehensive data extraction and transformation using SQL, Hive, and ETL processes to support data analytics.
  • Created and maintained data quality scripts in SQL and Hive, ensuring high standards of data integrity and reliability.
  • Developed data visualizations using Python and tools like Tableau and Spotfire to communicate insights to stakeholders effectively.
  • Collaborated with various departments to gather data requirements and needs, facilitating cross-functional data initiatives.
  • Environment: Python 2.x, CDH5, HDFS, C#, Hadoop 2.3, Hive, Impala, AWS, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.

Data Scientist

Albertsons
Arlington, TX
10.2019 - 06.2021
  • Led the design, development, and support phases of the Software Development Life Cycle (SDLC) for AI/ML projects
  • Engineered and optimized ETL processes using SQL Server Integration Services (SSIS) to handle large-scale data from diverse sources
  • Developed robust data pipelines and architectures in Python, utilizing Apache Airflow and Databricks for enhanced data integration and workflow management
  • Implemented machine learning models using TensorFlow and PyTorch, improving predictive accuracy and performance
  • Conducted advanced statistical analysis and hypothesis testing using R and Python to validate model assumptions and results
  • Utilized AWS Sagemaker for deploying and scaling machine learning models efficiently in a production environment
  • Orchestrated data extraction and manipulation tasks from MongoDB using the MongoDB Connector for Hadoop
  • Performed data cleaning, preprocessing, and feature engineering to prepare data for complex model training
  • Applied a variety of machine learning techniques including Decision Trees, Naive Bayes, and Logistic Regression to solve business problems
  • Executed text mining and sentiment analysis on large datasets to derive customer insights and inform business strategies
  • Led a team in the development of a hybrid AI model that significantly enhanced predictive accuracy for business applications
  • Developed and maintained interactive dashboards and reports using Tableau and Microsoft Power BI for real-time data visualization and decision support
  • Collaborated with cross-functional teams to define data requirements and align machine learning projects with organizational goals
  • Performed cluster analysis using k-means and hierarchical clustering to segment data and uncover patterns in complex datasets
  • Designed and executed A/B tests to evaluate the performance of different machine learning models and strategies
  • Utilized Apache Spark and MLlib for scalable data processing and machine learning tasks on big data platforms
  • Engaged in continuous learning to stay current with advancements in AI, machine learning algorithms, and computational techniques
  • Provided mentorship and training to junior data scientists and analysts, enhancing team capabilities and knowledge
  • Documented model development processes, results, and business impacts comprehensively for stakeholder reporting
  • Advocated for the ethical use of AI and machine learning technologies, ensuring compliance with data privacy regulations and standards
  • Environment: Python, PySpark, C#, Tableau, MongoDB, Hadoop, SQL Server, SDLC, ETL, SSIS, recommendation systems, Machine Learning Algorithms, text-mining process, A/B test

Data Scientist

Juspay Technologies
Bangalore, India
02.2017 - 07.2019
  • Involved in the design, development, and support phases of the Software Development Life Cycle (SDLC). Performed data ETL by collecting, exporting, merging, and massaging data from multiple sources and platforms, including SSRS (SQL Server Reporting Services) and SSIS (SQL Server Integration Services) in SQL Server.
  • Built web-based, client/server applications using the .NET Framework, C#, and Visual Studio 2005/2008, and produced reports with C# and JSP.
  • Worked with cross-functional teams (including the data engineering team) to extract data from MongoDB and execute queries rapidly through the MongoDB Connector for Hadoop.
  • Performed data cleaning and feature selection using the MLlib package in PySpark.
  • Performed partitional clustering into 100 clusters using k-means clustering with the scikit-learn package in Python, grouping similar hotels for a given search.
  • Used Python to perform an ANOVA test to analyze the differences among hotel clusters.
  • Applied various machine learning algorithms and statistical models, such as Decision Tree, Text Analytics, Sentiment Analysis, Naive Bayes, Logistic Regression, and Linear Regression, using Python to determine the accuracy rate of each model.
  • Determined the most accurate prediction model based on the accuracy rate.
  • Used a text-mining process on customer reviews to determine customers' main areas of focus.
  • Delivered analytical support for hotel recommendations and ran an online A/B test.
  • Designed Tableau bar graphs, scatter plots, and geographical maps to create detailed-level summary reports and dashboards.
  • Developed a hybrid model to improve the accuracy rate.
  • Environment: SSIS, Visual Studio, MongoDB Connector for Hadoop, PySpark, Scikit-learn, Python, MLlib, and Tableau.

Data Scientist

High Radius Technologies
Hyderabad, India
11.2015 - 01.2017
  • Participated in all phases of research, including data collection, data cleaning, data mining, developing models, and visualizations.
  • Collaborated with data engineers and the operations team to collect data from the internal system to fit the analytical requirements.
  • Redefined many attributes and relationships, and cleansed unwanted tables/columns using SQL queries.
  • Utilized the Spark SQL API in PySpark to extract and load data and perform SQL queries, and used the C# connector to run SQL queries by creating and connecting to the SQL engine.
  • Performed data imputation using the Scikit-learn package in Python.
  • Performed data processing using Python libraries like NumPy and Pandas.
  • Performed data analysis using the ggplot2 library in R to create data visualizations for a better understanding of customer behavior.
  • Implemented statistical modeling with the XGBoost machine learning software package using R to determine the predicted probabilities of each model.
  • Delivered results to the operations team to support better decision-making.
  • Environment: SQL, PySpark, Scikit-learn, NumPy, Pandas, ggplot2, and XGBoost.

Education

Bachelor of Technology (B.Tech) - Information Technology

JNTUH
01.2015

Skills

  • Python
  • R
  • Java
  • Scala
  • SQL
  • MATLAB
  • Big Data
  • AWS SDK
  • TensorFlow
  • PyTorch
  • NumPy
  • SciPy
  • Pandas
  • Matplotlib
  • Tableau
  • Power BI
  • SAP BusinessObjects BI
  • AWS Glue
  • Azure Data Factory
  • Agile
  • Scrum
  • Lean Six Sigma
  • Kanban
  • DevOps
  • Google Data Studio
  • Databricks
  • Apache Superset
  • Oracle
  • Microsoft SQL Server
  • PostgreSQL
  • MongoDB
  • Amazon Redshift
  • Google BigQuery
  • Snowflake
  • Linux
  • UNIX
  • Windows
  • AWS
  • Azure
  • Google Cloud
  • Scikit-learn
  • H2O
  • Fastai
  • XGBoost
  • Deeplearning4j
  • NLTK
  • SpaCy
  • GPT
  • BERT
  • Jenkins
  • GitHub Actions
  • GitLab CI/CD
  • Docker
  • Kubernetes

Timeline

Senior Data Scientist

Zurich Insurance
10.2023 - Current

Data Scientist

Edward Jones
07.2021 - 09.2023

Data Scientist

Albertsons
10.2019 - 06.2021

Data Scientist

Juspay Technologies
02.2017 - 07.2019

Data Scientist

High Radius Technologies
11.2015 - 01.2017

Bachelor of Technology (B.Tech) - Information Technology

JNTUH
01.2015