Revathi B Pathuri

Columbus

Summary

Meticulous Data Scientist with 10 years of experience, accomplished in compiling, transforming, and analyzing complex information through algorithms. Expert in machine learning and large-dataset management. Demonstrated success in identifying relationships and building solutions to business problems.

Overview

13 years of professional experience
1 Certification

Work History

Scientific Content Engineer (NLP Lead)

CAS
2023.06 - Current
  • Conducted a comparative performance evaluation of two distinct models, one based on Natural Language Processing (NLP) techniques and the other on traditional Machine Learning approaches, for efficient information extraction. Employed rigorous experimentation and analysis to quantify the accuracy, precision, recall, and F1-score of each model (see the sketch after this list); the results informed data-driven decisions and optimizations for the information extraction process.
  • Led a comprehensive project focused on extracting intricate Drug-Target Relationship data from diverse biomedical literature sources. Leveraged advanced Natural Language Processing (NLP) techniques to parse, classify, and extract key information, facilitating the identification of potential drug candidates and their corresponding target proteins. Developed and fine-tuned custom NLP pipelines, including entity recognition and relation extraction, achieving an X% increase in precision and recall compared to baseline methods. The successful implementation of this project contributed valuable insights to drug discovery efforts and exemplified proficiency in NLP-driven information extraction
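
The comparative evaluation above can be illustrated with a small, hedged sketch: assuming gold labels and per-model predictions are available as plain Python lists, scikit-learn's metrics quantify accuracy, precision, recall, and F1. The labels, model names, and numbers below are illustrative placeholders, not project data.

```python
# Minimal sketch: comparing two information-extraction models on the same
# labeled evaluation set using standard classification metrics.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]          # gold annotations (1 = relation present)
preds = {
    "nlp_pipeline": [1, 0, 1, 1, 0, 1, 1, 0, 1, 0],   # hypothetical NLP-based model
    "classic_ml":   [1, 0, 0, 1, 0, 1, 0, 0, 1, 1],   # hypothetical traditional ML model
}

for name, y_pred in preds.items():
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0
    )
    print(f"{name}: accuracy={accuracy_score(y_true, y_pred):.2f} "
          f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```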

Senior Data Scientist

Macy's
2023.01 - 2023.05


  • Developed an algorithm to predict product demand across departments from historical sales data, improving profit through better stock maintenance.
  • Prepared data using dimensionality-reduction techniques (PCA, t-SNE) to reduce features and cleaned the data using Python libraries.
  • Applied advanced statistical techniques (Bayesian methods, sampling, and experimental design) while running machine learning algorithms on heterogeneous data.
  • Used advanced analytical tools and programming languages such as Python (NumPy, pandas, SciPy) for data analysis.
  • Constructed and evaluated models on varied datasets using machine learning and statistical modeling techniques such as clustering, classification, regression, decision trees, support vector machines, anomaly detection, sequential pattern discovery, and text mining with Python libraries (scikit-learn).
  • Performed predictive analytics with supervised (SVM, Logistic Regression, Boosting), unsupervised (K-Means, LDA, EM), and ensemble (Random Forests) methods.
  • Used regularization (LASSO and Ridge penalty terms) to address over-fitting and tuned the penalty strength with cross-validation (see the sketch after this list).
  • Created and managed data storage solutions using GCP services such as BigQuery, Cloud Storage, and Cloud SQL.
  • Built and deployed machine learning models using GCP's AI Platform and TensorFlow.
  • Deployed the Spark ecosystem, including Spark SQL and PySpark DataFrames.
  • Migrated other databases to Snowflake.
  • Worked with the Snowflake cloud data warehouse and AWS S3 buckets to integrate multiple source systems, including loading nested JSON data into Snowflake tables.
  • Wrote complex SnowSQL in the Snowflake cloud data warehouse for business analysis and reporting.
  • Trained and led junior data scientists in choosing appropriate tools for specific problems.
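
A minimal sketch of the dimensionality-reduction and regularization workflow referenced above, assuming scikit-learn and synthetic data; the feature counts, parameter grid, and variable names are illustrative, not the production pipeline.

```python
# Minimal sketch: PCA for feature reduction followed by a Ridge-regularized
# linear model, with the penalty strength chosen by cross-validation.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))                    # 500 samples, 40 raw features (synthetic)
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=500)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),                # reduce 40 features to 10 components
    ("model", Ridge()),
])

search = GridSearchCV(
    pipeline,
    param_grid={"model__alpha": [0.01, 0.1, 1.0, 10.0]},  # regularization strength
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print("best alpha:", search.best_params_["model__alpha"])
print("cv score:", search.best_score_)
```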

Data Scientist

Bosch Rexroth
2018.09 - 2021.10


  • Predicted machine failures from historical machine data using anomaly detection and feature selection (see the sketch after this list).
  • Managed implementation of new features by outlining plans and specifications for how, where, and when each component would work.
  • Implemented additional techniques such as logistic regression and K-Nearest Neighbors to improve the accuracy of the failure-prediction model.
  • Created an in-house OLAP database to process and analyze the data.
  • Explored the data to obtain real-time insights.
  • Implemented a hierarchical clustering algorithm to group similar data points for analysis.
  • The clustering helped identify groups of malfunctioning machine parts and changes in the levels of certain contents (e.g., iron, copper) that eventually led to machine failure.
  • Saved $400K for potential customers by preventing machine failures and faulty operation.
  • Created an efficient data visualization UI in Tableau to showcase abnormalities and the causes of failure in the data.
  • The machine learning model was written in Python, and the database was accessed using SQL.
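
A minimal sketch of the anomaly-detection and hierarchical-clustering approach described above, using scikit-learn's IsolationForest and AgglomerativeClustering on synthetic sensor-style data; the column names, contamination rate, and cluster count are assumptions for illustration only.

```python
# Minimal sketch: flag anomalous readings, then group similar readings with
# hierarchical clustering to inspect where the anomalies concentrate.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(42)
features = ["iron_ppm", "copper_ppm", "vibration"]
readings = pd.DataFrame({
    "iron_ppm":   rng.normal(50, 5, 300),
    "copper_ppm": rng.normal(20, 2, 300),
    "vibration":  rng.normal(1.0, 0.1, 300),
})
# Inject a few abnormal readings so the detector has something to flag
readings.loc[readings.index[:5], "iron_ppm"] += 30
readings.loc[readings.index[:5], "copper_ppm"] += 15

# Flag likely anomalies (-1) versus normal points (1)
iso = IsolationForest(contamination=0.02, random_state=0)
readings["anomaly"] = iso.fit_predict(readings[features])

# Group similar readings to see which cluster the anomalies fall into
readings["cluster"] = AgglomerativeClustering(n_clusters=3).fit_predict(readings[features])
print(readings.groupby("cluster")["anomaly"].value_counts())
```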

Junior Data Scientist

Robert Bosch GmbH
2017.10 - 2018.04


  • Built models using supervised classification techniques such as K-Nearest Neighbors (KNN), Logistic Regression, and Random Forests, with Principal Component Analysis to identify important features.
  • Performed Exploratory Data Analysis (EDA) to maximize insight into the dataset, detect outliers, and extract important variables numerically and graphically.
  • Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
  • Built machine learning models such as KNN and SVR, and clustering algorithms such as Hierarchical Clustering.
  • Used the K-Means algorithm with different numbers of clusters to find meaningful customer segments and evaluated the accuracy of the model.
  • Performed data mining with complex SQL queries to discover patterns, and used extensive SQL for data profiling and analysis to guide the data model.
  • Used Python and SQL to create statistical models including Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random Forests, Decision Trees, and Support Vector Machines for estimating risks.
  • Helped create a data lake by extracting customer big data from various sources (Excel, flat files, Oracle, SQL Server, MongoDB, HBase, Teradata, and server log data) into Hadoop HDFS.
  • Created dashboards in Tableau Desktop from data collected in MS Excel and CSV files and MS SQL Server databases.

Data Scientist Intern

SAP Germany, DE, Global Retail Services
2017.03 - 2017.07
  • Created an Alexa skill and established utterances and an intent schema to interact with the SAP HANA database.
  • Performed sentiment analysis on customer email feedback to determine the tone behind the text, using neural network techniques such as Long Short-Term Memory (LSTM) cells in Recurrent Neural Networks (RNNs) (see the sketch after this list).
  • Used LSTMs for analyzing time series data in PyTorch.
  • Utilized t-Distributed Stochastic Neighbor Embedding (t-SNE) and Principal Component Analysis (PCA) to deal with the curse of dimensionality.
  • Performed data visualization using various libraries, designed dashboards with Tableau, and generated complex reports, including charts, summaries, and graphs, to interpret the findings for the team and stakeholders.
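
A minimal sketch of an LSTM-based sentiment classifier in PyTorch, in the spirit of the RNN/LSTM work described above; it assumes feedback text has already been tokenized to integer IDs, and the vocabulary size, dimensions, and batch are placeholders.

```python
# Minimal sketch: LSTM sentiment classifier over tokenized email feedback.
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        embedded = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)      # hidden: (1, batch, hidden_dim)
        return self.classifier(hidden[-1])        # logits: (batch, num_classes)

model = SentimentLSTM()
batch = torch.randint(1, 5000, (4, 20))           # 4 emails, 20 tokens each (placeholder)
logits = model(batch)
print(logits.shape)                               # torch.Size([4, 2])
```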

Research Assistant

DFKI, German Research Center for Artificial Intelligence
2014.08 - 2016.06
  • Automated the calculation of demanded dates with the help of machine learning.
  • Built machine learning models such as KNN and SVM regression, and clustering algorithms such as Hierarchical Clustering and DBSCAN.
  • Developed NLP with deep learning algorithms for text analysis, improving on the existing dictionary-based approaches.
  • Worked on Natural Language Processing to extract data from multilingual hotel invoices.
  • Extracted data from social media platforms such as Facebook, Twitter, and Google using prebuilt social media APIs.
  • Worked hands-on with large data volumes of more than 10M customer records with 20+ features.

Assistant Systems Engineer - Hadoop Developer

Tata Consultancy Services
2012.01 - 2013.12


  • Worked with the Cloudera distribution for Hadoop, including Cloudera Manager and Cloudera Navigator.
  • Used Big Data tools such as Hadoop, MapReduce, Hive, Pig, and Kafka, along with Python.
  • Implemented a scalable and fault-tolerant data processing pipeline using Cloudera and Apache Spark, resulting in a 30% reduction in processing time.
  • Utilized Spark's machine learning libraries (MLlib) to build predictive models for customer segmentation, resulting in a 20% improvement in targeted marketing campaigns (see the sketch after this list).
  • Developed a custom data ingestion framework using Spark and Scala, reducing data loading time by 40% and enhancing data quality.
  • Presented technical workshops and training sessions on Cloudera, HBase, and Spark for internal teams, enhancing their understanding and adoption of Big Data technologies.
  • Performed data visualization using various libraries, designed dashboards with Tableau, and generated complex reports, including charts, summaries, and graphs, to interpret the findings for the team and stakeholders.
  • Developed Oozie workflows for daily incremental loads that pulled data from social media and imported it into Hive tables.
  • Applied algorithms such as Linear Regression, Logistic Regression, and K-NN to predict customer churn and customer interaction.
  • Built models using supervised classification techniques such as K-Nearest Neighbors (KNN), Logistic Regression, and Random Forests, with Principal Component Analysis to identify important features.
  • Used Jupyter Notebook to write Python scripts for training and testing datasets.
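
A minimal sketch of customer segmentation with Spark MLlib, using the current DataFrame-based API (pyspark.ml) rather than the exact version in use at the time; the tiny in-memory dataset and column names are illustrative assumptions.

```python
# Minimal sketch: KMeans customer segmentation with Spark's ML library.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("customer-segmentation").getOrCreate()

customers = spark.createDataFrame(
    [(1, 52.0, 3), (2, 480.0, 25), (3, 61.0, 4), (4, 510.0, 30), (5, 75.0, 6)],
    ["customer_id", "monthly_spend", "orders"],
)

# Assemble raw columns into the single feature vector expected by MLlib
assembler = VectorAssembler(inputCols=["monthly_spend", "orders"], outputCol="features")
features = assembler.transform(customers)

# Fit a 2-segment model and attach the segment label to each customer
model = KMeans(k=2, seed=1, featuresCol="features", predictionCol="segment").fit(features)
model.transform(features).select("customer_id", "segment").show()

spark.stop()
```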

Education

Master of Science - Artificial Intelligence

Technical University of Kaiserslautern
Germany
2018.04

Bachelor of Science - Computer Science

Kalasalingam University
India
2011.04

Skills

  • Data Visualization
  • Predictive Analysis
  • Statistical Modelling
  • Data Analytics
  • Clustering and Mining
  • Data Mining
  • Logistic Regression
  • Linear Regression
  • Support Vector Machine
  • Decision Trees
  • K-Nearest Neighbor
  • Naïve Bayes
  • K-Means Clustering
  • Hierarchical Clustering
  • Density-Based Clustering
  • Principal Component Analysis
  • Natural Language Processing
  • Artificial Neural Networks
  • Recurrent Neural Networks
  • Model development
  • Quantitative Analysis
  • Deep Learning
  • Machine Learning
  • Demand Forecasting
  • Business Intelligence
  • Data Modelling
  • Market Trend Analysis
  • Jupyter Notebook
  • Gradient Boosting
  • Time Series Models

Additional Information

  • Awarded Math Excellency 2008
  • Best outgoing student of the year 2009-2010

Certification

IBM Applied AI Professional Certificate

Coursera certified Hadoop Developer

IBM Rational Functional Tester
