Revathi B Pathuri

Columbus

Summary

Meticulous Data Scientist with 10 years of experience, accomplished in compiling, transforming, and analyzing complex information through algorithms. Expert in machine learning and large-dataset management. Demonstrated success in identifying relationships and building solutions to business problems.

Overview

13 years of professional experience
1 Certification

Work History

Scientific Content Engineer (NLP Lead)

CAS
2023.06 - Current
  • Conducted a comparative performance evaluation of two distinct models, one based on Natural Language Processing (NLP) techniques and the other on traditional Machine Learning approaches, for efficient information extraction. Employed rigorous experimentation and analysis to quantify the accuracy, precision, recall, and F1-score of each model (see the sketch after this list); the results informed data-driven decisions and optimizations for the information extraction process.
  • Led a comprehensive project focused on extracting intricate Drug-Target Relationship data from diverse biomedical literature sources. Leveraged advanced Natural Language Processing (NLP) techniques to parse, classify, and extract key information, facilitating the identification of potential drug candidates and their corresponding target proteins. Developed and fine-tuned custom NLP pipelines, including entity recognition and relation extraction, achieving an X% increase in precision and recall compared to baseline methods. The successful implementation of this project contributed valuable insights to drug discovery efforts and exemplified proficiency in NLP-driven information extraction
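
The comparative evaluation above can be illustrated with a small, hedged sketch: assuming gold labels and per-model predictions are available as plain Python lists, scikit-learn's metrics quantify accuracy, precision, recall, and F1. The labels, model names, and numbers below are illustrative placeholders, not project data.

```python
# Minimal sketch: comparing two information-extraction models on the same
# labeled evaluation set using standard classification metrics.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]          # gold annotations (1 = relation present)
preds = {
    "nlp_pipeline": [1, 0, 1, 1, 0, 1, 1, 0, 1, 0],   # hypothetical NLP-based model
    "classic_ml":   [1, 0, 0, 1, 0, 1, 0, 0, 1, 1],   # hypothetical traditional ML model
}

for name, y_pred in preds.items():
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0
    )
    print(f"{name}: accuracy={accuracy_score(y_true, y_pred):.2f} "
          f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```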

Senior Data Scientist

Macy's
2023.01 - 2023.05


  • Developed an algorithm to predict product demand across departments from historical sales data, improving profit through better stock maintenance.
  • Prepared data using dimensionality-reduction techniques (PCA, t-SNE) to reduce features and cleaned the data using Python libraries.
  • Applied advanced statistical techniques (Bayesian methods, sampling, and experimental design) while running machine learning algorithms on heterogeneous data.
  • Used advanced analytical tools and programming languages such as Python (NumPy, pandas, SciPy) for data analysis.
  • Constructed and evaluated models on varied datasets using machine learning and statistical modeling techniques such as clustering, classification, regression, decision trees, support vector machines, anomaly detection, sequential pattern discovery, and text mining with Python libraries (scikit-learn).
  • Performed predictive analytics with supervised (SVM, Logistic Regression, Boosting), unsupervised (K-Means, LDA, EM), and ensemble (Random Forests) methods.
  • Used regularization (LASSO and Ridge penalty terms) to address over-fitting and tuned the penalty strength with cross-validation (see the sketch after this list).
  • Created and managed data storage solutions using GCP services such as BigQuery, Cloud Storage, and Cloud SQL.
  • Built and deployed machine learning models using GCP's AI Platform and TensorFlow.
  • Deployed the Spark ecosystem, including Spark SQL and PySpark DataFrames.
  • Migrated other databases to Snowflake.
  • Worked with the Snowflake cloud data warehouse and AWS S3 buckets to integrate multiple source systems, including loading nested JSON data into Snowflake tables.
  • Wrote complex SnowSQL in the Snowflake cloud data warehouse for business analysis and reporting.
  • Trained and led junior data scientists in choosing appropriate tools for specific problems.
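
A minimal sketch of the dimensionality-reduction and regularization workflow referenced above, assuming scikit-learn and synthetic data; the feature counts, parameter grid, and variable names are illustrative, not the production pipeline.

```python
# Minimal sketch: PCA for feature reduction followed by a Ridge-regularized
# linear model, with the penalty strength chosen by cross-validation.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))                    # 500 samples, 40 raw features (synthetic)
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=500)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),                # reduce 40 features to 10 components
    ("model", Ridge()),
])

search = GridSearchCV(
    pipeline,
    param_grid={"model__alpha": [0.01, 0.1, 1.0, 10.0]},  # regularization strength
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print("best alpha:", search.best_params_["model__alpha"])
print("cv score:", search.best_score_)
```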

Data Scientist

Bosch Rexroth
2018.09 - 2021.10


  • Predicted machine failures from historical machine data using anomaly detection and feature selection (see the sketch after this list).
  • Managed implementation of new features by outlining plans and specifications for how, where, and when each component would work.
  • Implemented additional techniques such as logistic regression and K-Nearest Neighbors to improve the accuracy of the failure-prediction model.
  • Created an in-house OLAP database to process and analyze the data.
  • Explored the data to obtain real-time insights.
  • Implemented a hierarchical clustering algorithm to group similar data points for analysis.
  • The clustering helped identify groups of malfunctioning machine parts and changes in the levels of certain contents (e.g., iron, copper) that eventually led to machine failure.
  • Saved $400K for potential customers by preventing machine failures and faulty operation.
  • Created an efficient data visualization UI in Tableau to showcase abnormalities and the causes of failure in the data.
  • The machine learning model was written in Python, and the database was accessed using SQL.
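
A minimal sketch of the anomaly-detection and hierarchical-clustering approach described above, using scikit-learn's IsolationForest and AgglomerativeClustering on synthetic sensor-style data; the column names, contamination rate, and cluster count are assumptions for illustration only.

```python
# Minimal sketch: flag anomalous readings, then group similar readings with
# hierarchical clustering to inspect where the anomalies concentrate.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(42)
features = ["iron_ppm", "copper_ppm", "vibration"]
readings = pd.DataFrame({
    "iron_ppm":   rng.normal(50, 5, 300),
    "copper_ppm": rng.normal(20, 2, 300),
    "vibration":  rng.normal(1.0, 0.1, 300),
})
# Inject a few abnormal readings so the detector has something to flag
readings.loc[readings.index[:5], "iron_ppm"] += 30
readings.loc[readings.index[:5], "copper_ppm"] += 15

# Flag likely anomalies (-1) versus normal points (1)
iso = IsolationForest(contamination=0.02, random_state=0)
readings["anomaly"] = iso.fit_predict(readings[features])

# Group similar readings to see which cluster the anomalies fall into
readings["cluster"] = AgglomerativeClustering(n_clusters=3).fit_predict(readings[features])
print(readings.groupby("cluster")["anomaly"].value_counts())
```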

Junior Data Scientist

Robert Bosch GmbH
2017.10 - 2018.04


  • Built models using supervised classification techniques such as K-Nearest Neighbors (KNN), Logistic Regression, and Random Forests, with Principal Component Analysis to identify important features.
  • Performed Exploratory Data Analysis (EDA) to maximize insight into the dataset, detect outliers, and extract important variables numerically and graphically.
  • Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
  • Built machine learning models such as KNN and SVR, and clustering algorithms such as Hierarchical Clustering.
  • Used the K-Means algorithm with different numbers of clusters to find meaningful customer segments and evaluated the accuracy of the model.
  • Performed data mining with complex SQL queries to discover patterns, and used extensive SQL for data profiling and analysis to guide the data model.
  • Used Python and SQL to create statistical models including Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random Forests, Decision Trees, and Support Vector Machines for estimating risks.
  • Helped create a data lake by extracting customer big data from various sources (Excel, flat files, Oracle, SQL Server, MongoDB, HBase, Teradata, and server log data) into Hadoop HDFS.
  • Created dashboards in Tableau Desktop from data collected in MS Excel and CSV files and MS SQL Server databases.

Data Scientist Intern

SAP Germany, DE, Global Retail Services
2017.03 - 2017.07
  • Created an Alexa skill and established utterances and an intent schema to interact with the SAP HANA database.
  • Performed sentiment analysis on customer email feedback to determine the tone behind the text, using neural network techniques such as Long Short-Term Memory (LSTM) cells in Recurrent Neural Networks (RNNs) (see the sketch after this list).
  • Used LSTMs for analyzing time series data in PyTorch.
  • Utilized t-Distributed Stochastic Neighbor Embedding (t-SNE) and Principal Component Analysis (PCA) to deal with the curse of dimensionality.
  • Performed data visualization using various libraries, designed dashboards with Tableau, and generated complex reports, including charts, summaries, and graphs, to interpret the findings for the team and stakeholders.
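
A minimal sketch of an LSTM-based sentiment classifier in PyTorch, in the spirit of the RNN/LSTM work described above; it assumes feedback text has already been tokenized to integer IDs, and the vocabulary size, dimensions, and batch are placeholders.

```python
# Minimal sketch: LSTM sentiment classifier over tokenized email feedback.
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        embedded = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)      # hidden: (1, batch, hidden_dim)
        return self.classifier(hidden[-1])        # logits: (batch, num_classes)

model = SentimentLSTM()
batch = torch.randint(1, 5000, (4, 20))           # 4 emails, 20 tokens each (placeholder)
logits = model(batch)
print(logits.shape)                               # torch.Size([4, 2])
```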

Research Assistant

DFKI, German Research Center for Artificial Intelligence
2014.08 - 2016.06
  • Automated the calculation of demanded dates with the help of machine learning.
  • Built machine learning models such as KNN and SVM regression, and clustering algorithms such as Hierarchical Clustering and DBSCAN.
  • Developed NLP with deep learning algorithms for text analysis, improving on the existing dictionary-based approaches.
  • Worked on Natural Language Processing to extract data from multilingual hotel invoices.
  • Extracted data from social media platforms such as Facebook, Twitter, and Google using prebuilt social media APIs.
  • Worked hands-on with large data volumes of more than 10M customer records with 20+ features.

Assistant Systems Engineer - Hadoop Developer

Tata Consultancy Services
2012.01 - 2013.12


  • Worked with the Cloudera distribution for Hadoop, including Cloudera Manager and Cloudera Navigator.
  • Used Big Data tools such as Hadoop, MapReduce, Hive, Pig, and Kafka, along with Python.
  • Implemented a scalable and fault-tolerant data processing pipeline using Cloudera and Apache Spark, resulting in a 30% reduction in processing time.
  • Utilized Spark's machine learning libraries (MLlib) to build predictive models for customer segmentation, resulting in a 20% improvement in targeted marketing campaigns (see the sketch after this list).
  • Developed a custom data ingestion framework using Spark and Scala, reducing data loading time by 40% and enhancing data quality.
  • Presented technical workshops and training sessions on Cloudera, HBase, and Spark for internal teams, enhancing their understanding and adoption of Big Data technologies.
  • Performed data visualization using various libraries, designed dashboards with Tableau, and generated complex reports, including charts, summaries, and graphs, to interpret the findings for the team and stakeholders.
  • Developed Oozie workflows for daily incremental loads that pulled data from social media and imported it into Hive tables.
  • Applied algorithms such as Linear Regression, Logistic Regression, and K-NN to predict customer churn and customer interaction.
  • Built models using supervised classification techniques such as K-Nearest Neighbors (KNN), Logistic Regression, and Random Forests, with Principal Component Analysis to identify important features.
  • Used Jupyter Notebook to write Python scripts for training and testing datasets.
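
A minimal sketch of customer segmentation with Spark MLlib, using the current DataFrame-based API (pyspark.ml) rather than the exact version in use at the time; the tiny in-memory dataset and column names are illustrative assumptions.

```python
# Minimal sketch: KMeans customer segmentation with Spark's ML library.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("customer-segmentation").getOrCreate()

customers = spark.createDataFrame(
    [(1, 52.0, 3), (2, 480.0, 25), (3, 61.0, 4), (4, 510.0, 30), (5, 75.0, 6)],
    ["customer_id", "monthly_spend", "orders"],
)

# Assemble raw columns into the single feature vector expected by MLlib
assembler = VectorAssembler(inputCols=["monthly_spend", "orders"], outputCol="features")
features = assembler.transform(customers)

# Fit a 2-segment model and attach the segment label to each customer
model = KMeans(k=2, seed=1, featuresCol="features", predictionCol="segment").fit(features)
model.transform(features).select("customer_id", "segment").show()

spark.stop()
```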

Education

Master of Science - Artificial Intelligence

Technical University of Kaiserslautern
Germany
2018.04

Bachelor of Science - Computer Science

Kalasalingam University
India
2011.04

Skills

  • Data Visualization
  • Predictive Analysis
  • Statistical Modelling
  • Data Analytics
  • Clustering and Mining
  • Data Mining
  • Logistic Regression
  • Linear Regression
  • Support Vector Machine
  • Decision Trees
  • K-Nearest Neighbor
  • Naïve Bayes
  • K-Means Clustering
  • Hierarchical Clustering
  • Density-Based Clustering
  • Principal Component Analysis
  • Natural Language Processing
  • Artificial Neural Networks
  • Recurrent Neural Networks
  • Model development
  • Quantitative Analysis
  • Deep Learning
  • Machine Learning
  • Demand Forecasting
  • Business Intelligence
  • Data Modelling
  • Market Trend Analysis
  • Jupyter Notebook
  • Gradient Boosting
  • Time Series Models

Additional Information

  • Awarded Math Excellency 2008
  • Best outgoing student of the year 2009-2010

Certification

IBM Applied AI Professional Certificate

Coursera certified Hadoop Developer

IBM Rational Functional Tester
