FNU PREETHI

DATA SCIENTIST

Summary

Experienced Data Scientist with over 5 years of expertise in utilizing data-driven and technology-focused methodologies. Skilled in effectively communicating complex insights to stakeholders and building consensus around well-founded models. Proficient in developing applications and refining models for improved accuracy and efficiency.

Overview

5 years of professional experience
6 years of post-secondary education

Skills

Programming Languages: C, C, R, Java, Python, Scala

Patent And Published Research Paper

Patent : Apparatus and method for monitoring and recording disintegration times for pharmaceutical products

Link : https://patents.google.com/patent/WO2021067207A1/en

Paper : Elsevier Journal - Disintegration testing augmented by computer Vision technology

Link : https://doi.org/10.1016/j.ijpharm.2022.121668

Work History

Data Scientist

Twin Health Inc
Mountain View, CA
02.2021 - 03.2023
  • Developed and improved machine learning models using Scikit-learn, CatBoost, and TensorFlow to predict health biomarkers, including blood glucose levels, from IoT sensor data and help members reverse and prevent chronic metabolic diseases such as diabetes
  • Improved the accuracy of prediction models; for example, reduced the mean squared error of the "Number of days to diabetic reversal" prediction model from 12 days to less than 3 days using PCA, L1 regularization, correlation-based feature selection, and hyperparameter tuning
  • Developed rule-based nutrition, exercise, and sleep recommendation systems to improve the patient's determined metabolic state, consistent with the rules defined by medical experts
  • Developed and implemented a machine learning model to predict medication adherence among patients
  • Addressed the imbalanced dataset by using the SMOTE algorithm to generate synthetic samples
  • Ensured model fairness and avoided bias towards the majority class by rigorously evaluating performance using key metrics including precision, recall, F1-score, and AUC-ROC
  • Evaluated the effectiveness of a new pilot program investigating the relationship between an individual's DNA sequence and their metabolic health using cohort analysis
  • Developed custom dashboards and interactive visualizations using Tableau and data visualization libraries such as Matplotlib, Seaborn, and Plotly
  • Conducted statistical significance tests, such as ANOVA and t-tests, to determine whether there are significant differences in biomarker levels between different patient groups
  • Automated machine learning pipelines using Apache Airflow, and reduced deployment time with CI/CD pipelines built on SonarCloud and Bitbucket
  • Utilized Flask and Django frameworks to implement REST APIs for machine learning models and improved the performance of the APIs by 70% using caching and load balancing techniques
  • Utilized AWS Glue and Athena to perform ad-hoc analysis of large-scale health data sets and improved query execution time by 40% using partitioning and indexing techniques
  • Worked with DevOps engineers to deploy machine learning models on Kubernetes clusters using Docker and optimized the model deployment process for scalability and reliability
  • Developed a machine learning model to identify clients at risk of churning and implemented targeted interventions to reduce churn rate
  • Achieved a 25% reduction in churn rate using a random forest algorithm and leveraging data from customer interactions, support tickets, usage patterns, and demographic information
  • The model was validated using holdout testing and achieved a precision of 0.80 and recall of 0.75
  • Developed and implemented a collaborative approach to data management for Twin Health, including data sharing agreements and data governance policies that adhere to HIPAA (Health Insurance Portability and Accountability Act) regulations and other relevant health data privacy and security standards.

R&D Data Science Co-op

Merck & Co
New Jersey
02.2020 - 12.2020
  • Designed an efficient input pipeline using the TensorFlow data API to load images in parallel from the pill disintegration unit
  • Developed an efficient SSD MobileNet and OpenCV data pipeline using transfer learning to calculate the pill disintegration time with an accuracy of 97%, exceeding original project expectations by 12%
  • Containerized (Docker) and deployed a reproducible CNN model into production
  • Worked with a team of 6 members to publish a patent and productionize the pill disintegration unit
  • Presented data-driven insights and recommendations to the board of directors on a regular basis, using visualizations and storytelling techniques to communicate complex concepts in a clear and compelling manner
  • Patent Link : https://patents.google.com/patent/WO2021067207A1/en
  • Paper Link : https://doi.org/10.1016/j.ijpharm.2022

Business Analyst

Sterling Software
Chennai
02.2018 - 12.2018
  • Developed an Appraisal Management dashboard for a financial institution using BI tools like Tableau and Power BI
  • Predicted the short- and long-term performance of mutual fund investments using predictive modeling algorithms like random forest classifier and linear regression, with cross-validation, in R
  • Modeled a BI solution for HR database using Kimball's data warehousing approach
  • Developed MOLAP (cube-MDX) from ROLAP (star schema-SQL) with SQL Server Analysis Services to communicate with the business users.

Data Analyst Intern

Finmarts
Chennai
09.2016 - 12.2016
  • Utilized SPSS, Excel, and Qualtrics Survey Software to perform advanced analyses, including pricing analysis, customer satisfaction/loyalty, brand strength assessment, cluster analysis, and product concept evaluation
  • Applied DMAIC analysis: created transportation process flow and fishbone charts, analyzed the time delay in each stage with simulation, and improved the weekly completion rate by 15%.

Information Security Analyst Intern

Doha Bank
Doha
02.2016 - 07.2016
  • Developed statistical models for fraud detection and customer banking activities using K-means clustering in Python
  • Assisted a senior quantitative analyst in assessing risk management of financial products such as funds using machine learning techniques like outlier detection with DBSCAN
  • Benchmarked ML algorithms like Random Forest and Support Vector Machines (SVM) for classification of risks and thereby increased repeat business among investors by 25%.

Education

Master of Science - Information Management, CAS Data Science

Syracuse University
New York, NY
05.2019 - 05.2020

Bachelor of Engineering - Computer Science

Anna University
Chennai, India
05.2014 - 05.2018

Accomplishments

Multi-Modal RNN Prediction for hemodialysis patients. RNN/TensorFlow/Python

◦ Authored a thesis on designing a novel fractal based multi-modal deep learning RNN using TensorFlow to analyze HRV values of patients and predict occurrences of probable emergency events, surpassing a benchmark accuracy of 88.07%.

Airlines Customer Satisfaction Analysis Azure/Python/SQL/AutoML/Spark

◦ Analyzed big data containing 500K airline passenger records using Azure HDInsight to create highly scalable ETL pipelines.

◦ Benchmarked ML algorithms like Decision Trees and Random Forest using Azure AutoML for regression and employed gradient boosting for cross-validation.

Pennsylvania Health Insurance Analysis Association Rules/Linear Regression/Decision Tree/AWS

◦ Analyzed a data set containing 65K records of citizens using linear regression, association rules (arules), and decision tree algorithms.

◦ Monitored metrics using AWS CloudWatch by creating dashboards to visualize the results.

Co-curricular Activities

Research Assistant : Martin J. Whitman School of Management, Syracuse University. 2019 - 2021

Student Supervisor : Sadler Dining, Syracuse University. 2019 - 2020

Head Event Organizer : Athenaeum Cultural Society, Anna University. 2017 - 2018

CTO & Treasurer : Computer Society of MIT, Anna University. 2016 - 2018

Voluntary Activities

  • Chennai Animal Shelter Trust Volunteer (2014-2021): Provided care and support for animals through feeding, cleaning, and playtime activities.
  • Peer Mentor at Syracuse University (2019-2021): Guided incoming students on academic and personal issues, aiding in their transition to university life and academic success.
  • Dance Instructor with the Bhumi Foundation in Chennai (2014-present): Taught dance to underprivileged children through weekly online classes, promoting physical activity and self-expression.
  • MIOT Hospitals Volunteer in Chennai (2015-2017): Supported patients and assisted with administrative tasks at a community health clinic.
  • Second Harvest Food Bank Volunteer in San Jose during COVID crisis (2021-2022): Collected, sorted, and distributed food to individuals and families in need.

Timeline

Data Scientist

Twin Health Inc
02.2021 - 03.2023

R&D Data Science Co-op

Merck & Co
02.2020 - 12.2020

Master of Science - Information Management, CAS Data Science

Syracuse University
05.2019 - 05.2020

Business Analyst

Sterling Software
02.2018 - 12.2018

Data Analyst Intern

Finmarts
09.2016 - 12.2016

Information Security Analyst Intern

Doha Bank
02.2016 - 07.2016

Bachelor of Engineering - Computer Science

Anna University
05.2014 - 05.2018