Mani Kanta Reddy Duggempudi

Greensboro, NC

Summary

Senior Data Scientist with 9+ years of experience specializing in product analytics, web analytics, customer experience analytics, and data-driven decision-making. Proven expertise in leveraging SQL, big data technologies (Redshift, Spark, Hive, BigQuery), and BI tools (Tableau, Quick, Power BI, Dash) to analyze complex datasets and derive actionable insights. Experienced in developing optimized intelligent systems, monitoring and controlling project spend, and partnering with customers to deliver realistic solutions within prescribed budgets and timeframes.

Overview

9 years of professional experience

Work History

Senior Data Scientist

Zurich Insurance
Addison, TX
10.2023 - Current
  • Conducted extensive data profiling to understand patterns in student performance on USMLE examinations, utilizing advanced analytics techniques.
  • Applied machine learning models using cross-validation, log loss functions, ROC curves, and AUC to enhance feature selection and model accuracy.
  • Implemented regularization methods, like L2 and L1, to address overfitting in predictive models.
  • Utilized XGBoost, a machine learning software package, to perform statistical modeling and predict probabilities using Python.
  • Integrated data from various sources to create a comprehensive master dataset for modeling, including fields derived from client data, student essays, letters of recommendation (LORs), and performance metrics.
  • Employed grid search and k-fold cross-validation techniques to optimize hyperparameters and enhance model performance.
  • Developed predictive models using boosting algorithms to analyze the behavior of students taking the USMLE exams for residency applications.
  • Utilized Python libraries such as NumPy, scikit-learn, pandas, NLTK, and Matplotlib to build and visualize models demonstrating student performance by demographics.
  • Applied a range of AI and machine learning algorithms for decision trees, text analytics, NLP, supervised and unsupervised learning, and regression models.
  • Performed feature engineering using Principal Component Analysis to manage high-dimensional data effectively.
  • Executed comprehensive data cleaning, feature scaling, and feature engineering using pandas and NumPy in Python.
  • Designed and trained deep learning models using TensorFlow and Keras to predict residency attainment based on normalized scores from various tests.
  • Used XGBClassifier for categorical variables and XGBRegressor for continuous variables, integrating them with FeatureUnion and FunctionTransformer in NLP pipelines.
  • Implemented a One-vs-Rest classifier to address multi-class classification problems, fitting one classifier per class against all remaining classes.
  • Created various machine learning and statistical models, including Decision Tree, Text Analytics, Sentiment Analysis, Naive Bayes, Logistic Regression, and Linear Regression, to evaluate model accuracy.
  • Designed reports leveraging collected metrics to draw conclusions about past behaviors and predict future behaviors.
  • Generated multiple models using different machine learning and deep learning frameworks, selecting and tuning the highest-performing model using Signal Hub.
  • Developed data layers in Signal Hub to predict new, unseen data, ensuring performance at least equivalent to static models built with deep learning frameworks.
  • Integrated and managed large datasets using Big Data technologies, like Apache Spark and Hadoop, for real-time analytics and data processing.
  • Enhanced data visualization and interpretation capabilities using tools like Tableau, Power BI, and Google Data Studio.
  • Collaborated with cross-functional teams to implement scalable machine learning solutions and integrate them into business processes.
  • Spearheaded the migration of data analytics platforms to cloud environments such as AWS, Azure, and Google Cloud, optimizing computational efficiency and scalability.
  • Conducted advanced text analytics and natural language processing to extract insights from unstructured data, improving decision-making processes.
  • Led training sessions and workshops on new AI/ML technologies and methodologies for internal teams, enhancing the organization's data literacy.
  • Maintained up-to-date knowledge of industry trends and advancements in AI and ML, applying this knowledge to drive innovation within the company.
  • Environment: Python 2.x, 3.x, Hive, AWS, Linux, Tableau Desktop, Microsoft Excel, NLP, deep learning frameworks such as TensorFlow, Keras, boosting algorithms, etc.
  • Developed predictive models using machine learning, natural language processing, and statistical analysis methods.

Data Scientist

Edward Jones
St. Louis, MO
07.2021 - 09.2023
  • Conducted data profiling to understand behaviors associated with traffic patterns, location, and timing, using advanced analytical techniques.
  • Applied a variety of AI/machine learning algorithms and statistical models, including decision trees, NLP, regression models, and neural networks, using Python's scikit-learn and MATLAB.
  • Utilized Apache Spark, Snowflake, and Scala for big data processing, significantly enhancing data analysis capabilities.
  • Managed and optimized Hadoop ecosystems and developed PySpark applications that integrated with data lakes and used TensorFlow for deep learning tasks.
  • Designed and implemented data processing pipelines using Spark Streaming, Kafka, and AWS Kinesis to handle real-time data streams effectively.
  • Developed C# applications to connect SQL engines with databases, enhancing data accessibility and manipulation.
  • Engineered and maintained API libraries and business logic in C#, leveraging XML and Python for backend processing.
  • Orchestrated data workflows using Apache Airflow, ensuring robust automation and monitoring of DAGs and their dependencies.
  • Performed data cleaning and feature engineering using MLlib in PySpark, focusing on optimizing machine learning models.
  • Developed regex-based data classification and extraction tools using Spark/Scala and R, operating within both Linux and Windows environments.
  • Created user interfaces with C#, JSP, and XML, featuring expandable menus for detailed data drill-downs on interactive graphs.
  • Evaluated predictive models using cross-validation, ROC curves, and AUC metrics to ensure accuracy and effectiveness in feature selection.
  • Conducted sentiment analysis and text analytics to categorize user comments from social media into positive and negative sentiments.
  • Monitored operational metrics using sensor data and Airflow to ensure processes met predefined criteria.
  • Managed extensive data mapping projects from various source systems to Teradata, utilizing tools like TPump and BTEQ.
  • Analyzed traffic patterns and temporal correlations using autocorrelation methods to predict future behaviors.
  • Implemented regularization methods, such as L2 and L1, to address model overfitting and improve generalization.
  • Employed Principal Component Analysis for reducing dimensionality in high-dimensional datasets, enhancing model performance.
  • Developed delivery time prediction models using Multinomial Logistic Regression and Random Forest, optimizing logistics routes.
  • Performed comprehensive data extraction and transformation using SQL, Hive, and ETL processes to support data analytics.
  • Created and maintained data quality scripts in SQL and Hive, ensuring high standards of data integrity and reliability.
  • Developed data visualizations using Python and tools like Tableau and Spotfire to communicate insights to stakeholders effectively.
  • Collaborated with various departments to gather data requirements and needs, facilitating cross-functional data initiatives.
  • Environment: Python 2.x, CDH5, HDFS, C#, Hadoop 2.3, Hive, Impala, AWS, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.

Data Scientist

Albertsons
Arlington, TX
10.2019 - 06.2021
  • Led the design, development, and support phases of the Software Development Life Cycle (SDLC) for AI/ML projects
  • Engineered and optimized ETL processes using SQL Server Integration Services (SSIS) to handle large-scale data from diverse sources
  • Developed robust data pipelines and architectures in Python, utilizing Apache Airflow and Databricks for enhanced data integration and workflow management
  • Implemented machine learning models using TensorFlow and PyTorch, improving predictive accuracy and performance
  • Conducted advanced statistical analysis and hypothesis testing using R and Python to validate model assumptions and results
  • Utilized AWS Sagemaker for deploying and scaling machine learning models efficiently in a production environment
  • Orchestrated data extraction and manipulation tasks from MongoDB using the MongoDB Connector for Hadoop
  • Performed data cleaning, preprocessing, and feature engineering to prepare data for complex model training
  • Applied a variety of machine learning techniques including Decision Trees, Naive Bayes, and Logistic Regression to solve business problems
  • Executed text mining and sentiment analysis on large datasets to derive customer insights and inform business strategies
  • Led a team in the development of a hybrid AI model that significantly enhanced predictive accuracy for business applications
  • Developed and maintained interactive dashboards and reports using Tableau and Microsoft Power BI for real-time data visualization and decision support
  • Collaborated with cross-functional teams to define data requirements and align machine learning projects with organizational goals
  • Performed cluster analysis using k-means and hierarchical clustering to segment data and uncover patterns in complex datasets
  • Designed and executed A/B tests to evaluate the performance of different machine learning models and strategies
  • Utilized Apache Spark and MLlib for scalable data processing and machine learning tasks on big data platforms
  • Engaged in continuous learning to stay current with advancements in AI, machine learning algorithms, and computational techniques
  • Provided mentorship and training to junior data scientists and analysts, enhancing team capabilities and knowledge
  • Documented model development processes, results, and business impacts comprehensively for stakeholder reporting
  • Advocated for the ethical use of AI and machine learning technologies, ensuring compliance with data privacy regulations and standards
  • Environment: Python, PySpark, C#, Tableau, MongoDB, Hadoop, SQL Server, SDLC, ETL, SSIS, recommendation systems, Machine Learning Algorithms, text-mining process, A/B test

Data Scientist

Juspay Technologies
Bangalore, India
02.2017 - 07.2019
  • Involved in the design, development, and support phases of the Software Development Life Cycle (SDLC). Performed data ETL by collecting, exporting, merging, and massaging data from multiple sources and platforms, including SSRS (SQL Server Reporting Services) and SSIS (SQL Server Integration Services) in SQL Server.
  • Built web-based, client/server applications using the .NET Framework, C#, and Visual Studio 2005/2008, and produced reports with C# and JSP.
  • Worked with cross-functional teams (including the data engineering team) to extract data from MongoDB and execute queries rapidly through the MongoDB Connector for Hadoop.
  • Performed data cleaning and feature selection using the MLlib package in PySpark.
  • Performed partitional clustering into 100 clusters using k-means clustering with the scikit-learn package in Python, grouping similar hotels for a given search.
  • Used Python to perform an ANOVA test to analyze the differences among hotel clusters.
  • Applied various machine learning algorithms and statistical models, such as Decision Tree, Text Analytics, Sentiment Analysis, Naive Bayes, Logistic Regression, and Linear Regression, using Python to determine the accuracy rate of each model.
  • Determined the most accurate prediction model based on the accuracy rate.
  • Used a text-mining process on customer reviews to determine customers' main areas of focus.
  • Delivered analytical support for hotel recommendations and ran an online A/B test.
  • Designed Tableau bar graphs, scatter plots, and geographical maps to create detailed-level summary reports and dashboards.
  • Developed a hybrid model to improve the accuracy rate.
  • Environment: SSIS, Visual Studio, MongoDB Connector for Hadoop, PySpark, Scikit-learn, Python, MLlib, and Tableau.

Data Scientist

High Radius Technologies
Hyderabad, India
11.2015 - 01.2017
  • Participated in all phases of research, including data collection, data cleaning, data mining, developing models, and visualizations.
  • Collaborated with data engineers and the operations team to collect data from the internal system to fit the analytical requirements.
  • Redefined many attributes and relationships, and cleansed unwanted tables/columns using SQL queries.
  • Utilized the Spark SQL API in PySpark to extract and load data and perform SQL queries, and used the C# connector to run SQL queries by creating and connecting to the SQL engine.
  • Performed data imputation using the Scikit-learn package in Python.
  • Performed data processing using Python libraries like NumPy and Pandas.
  • Performed data analysis using the ggplot2 library in R to create data visualizations for a better understanding of customer behavior.
  • Implemented statistical modeling with the XGBoost machine learning software package using R to determine the predicted probabilities of each model.
  • Delivered results to the operations team to support better decision-making.
  • Environment: SQL, PySpark, Scikit-learn, NumPy, Pandas, ggplot2, and XGBoost.

Education

Bachelor of Technology (B.Tech) - Information Technology

JNTUH
01.2015

Skills

  • Python
  • R
  • Java
  • Scala
  • SQL
  • MATLAB
  • Big Data
  • AWS SDK
  • TensorFlow
  • PyTorch
  • NumPy
  • SciPy
  • Pandas
  • Matplotlib
  • Tableau
  • Power BI
  • SAP BusinessObjects BI
  • AWS Glue
  • Azure Data Factory
  • Agile
  • Scrum
  • Lean Six Sigma
  • Kanban
  • DevOps
  • Google Data Studio
  • Databricks
  • Apache Superset
  • Oracle
  • Microsoft SQL Server
  • PostgreSQL
  • MongoDB
  • Amazon Redshift
  • Google BigQuery
  • Snowflake
  • Linux
  • UNIX
  • Windows
  • AWS
  • Azure
  • Google Cloud
  • Scikit-learn
  • H2O
  • Fastai
  • XGBoost
  • Deeplearning4j
  • NLTK
  • SpaCy
  • GPT
  • BERT
  • Jenkins
  • GitHub Actions
  • GitLab CI/CD
  • Docker
  • Kubernetes

Timeline

Senior Data Scientist

Zurich Insurance
10.2023 - Current

Data Scientist

Edward Jones
07.2021 - 09.2023

Data Scientist

Albertsons
10.2019 - 06.2021

Data Scientist

Juspay Technologies
02.2017 - 07.2019

Data Scientist

High Radius Technologies
11.2015 - 01.2017

Bachelor of Technology (B.Tech) - Information Technology

JNTUH
01.2015