Hi, I’m

Shreyashi Mukhopadhyay

Data Scientist
Atlanta, Georgia

Summary

A highly motivated, independent, data-driven professional with over 7 years of diverse experience in domains such as Consumer Analytics, Financial Analytics, Insurance & Risk Analytics, Social Media Analytics, and Supply Chain Analytics. Expertise spans Data Extraction, Data Modelling, Data Wrangling, Machine Learning, Deep Learning, Predictive Modelling, Statistical Hypothesis Testing, Data Visualization, and Natural Language Processing, using Python libraries such as Pandas, NumPy, Matplotlib, Seaborn, scikit-learn, TensorFlow, NetworkX, and Streamlit to extract insights from data. Specialized in developing and deploying machine learning models, including Linear Regression, Logistic Regression, Decision Trees, Random Forests, SVM, XGBoost, AdaBoost, and CatBoost; imbalanced-data modelling using undersampling, oversampling, and SMOTE; and time series modelling with ARIMA, SARIMA, and Facebook Prophet. Comprehensive knowledge of and hands-on experience with databases such as MySQL, SQL Server, and PostgreSQL, using complex SQL queries to extract, manipulate, and analyze large datasets. Skilled at creating intuitive, interactive dashboards and reports with visualization tools such as Tableau and Power BI, giving stakeholders a coherent understanding of data insights and findings. Experienced in building predictive models and carrying out NLP tasks such as sentiment analysis, topic modelling, text classification, next-word prediction, and text summarization using LSTMs, Bi-LSTMs, GRUs, and Transformer-based language models such as BERT, BART, and GPT-3. Proven ability to collaborate effectively with cross-functional teams and present complex findings clearly to both technical and non-technical stakeholders. A proactive and adaptable team player seeking challenging opportunities to apply this expertise in a dynamic and innovative environment.

Overview

22 years of professional experience
1 certification

Work History

R&D Ecosystems
Atlanta, GA

Data Analyst Intern
05.2024 - 07.2024

Job overview

  • Developed and implemented a comprehensive Power BI dashboard suite for the R&D Ecosystems & Quality departments, enabling them to quickly access key performance metrics and make data-driven decisions, resulting in an 80-100% increase in the efficiency of the reports
  • Sourced data from a variety of sources, including Novelis's internal databases (ERP, CRM, production databases), cloud-based platforms like Salesforce, Databricks, AWS/Azure, Accolade, and SharePoint, and third-party data providers
  • Used Power Query to clean, transform, and prepare data for analysis and visualization, ensuring data accuracy and efficiency, including removing duplicates, handling missing values, merging tables, and creating calculated columns
  • Developed data models using DAX and Power Query to drive interactive, dynamic dashboards that let stakeholders filter data, explore different perspectives, and gain insights into key trends and patterns
  • Took a creative approach to data visualization, developing interactive visualizations that effectively communicated complex KPIs and trends using various chart types, including bar charts, line graphs, scatter plots, maps, and custom visuals
  • Collaborated closely with stakeholders across departments to understand their reporting needs and translate those needs into clear and compelling visualizations, facilitating data-driven decision-making and alignment across teams.

Social Media Analytics, Bloomberg & Tripadvisor, J. Mack Robinson College of Business, Georgia State University
Atlanta, GA

Graduate Research Assistant
06.2023 - 11.2023

Job overview

  • Python, Bloomberg, NLTK, TensorFlow, Keras, Topic Modelling with LDA, RNN modelling with LSTM, Bi-LSTM, GRU, CNN LSTM, CNN Bi-LSTM, BERT
  • Downloaded the audio and text transcripts of financial earnings calls for listed companies from Bloomberg
  • Applied Named Entity Recognition (NER) and POS tagging to filter the text transcripts of the CEO statements
  • Applied cosine similarity to extract the sentences most similar to the statements made by the CEO and performed aggregation
  • Cleaned the data, visualized the term frequency using unigrams, bigrams, and trigrams, and performed BERT topic modelling
  • Web scraped Tripadvisor.com to extract the user text reviews and associated image data for a New York-based hotel
  • Cleaned and labelled the data based on the helpfulness of the review submitted by the end users
  • Implemented the deep learning models LSTM, Bi-LSTM, GRU & BERT to classify the labelled data as Helpful versus Non-Helpful
  • Performed model evaluation and comparison to identify the best model for classifying the target class, and performed object detection on the images for the corresponding reviews using OpenCV and CNN.
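
The cosine-similarity step listed above can be illustrated with a minimal sketch. It is not the project code: it assumes simple bag-of-words vectors, and the function names and example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(reference, candidates, top_n=1):
    """Rank candidate sentences by similarity to the reference statement."""
    ref_vec = bag_of_words(reference)
    scored = [(cosine_similarity(ref_vec, bag_of_words(c)), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Hypothetical example data, for illustration only.
reference = "revenue growth was strong this quarter"
candidates = ["Revenue growth was strong", "The weather was cold", "We hired new staff"]
best_score, best_sentence = most_similar(reference, candidates)[0]
```

In practice TF-IDF or embedding vectors would likely replace the raw word counts; the ranking logic stays the same.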

J. Mack Robinson College of Business, Georgia State University
Atlanta, GA

Graduate Research Assistant
01.2023 - 05.2023

Job overview

  • Python, Machine Learning, Deep Learning, Hypothesis Testing, Risk Management, Insurance
  • Researched and modelled various business problems for TELENAV using IoT telematics data from connected car devices for the launch of their pay-as-you-use auto insurance product NOVO for California and Wisconsin
  • Developed a predictive model that predicts the risk for every route in CA based on the risk score derived from driver behaviors exhibited along that route, using the vehicle's telematics data
  • Developed a predictive model that predicts the premium based on the probability of an accident and expected loss for every route in CA, based on accident data from 2011-21
  • Developed a predictive model to price auto insurance accurately based on a number of driver and vehicle attributes
  • Developed a predictive model that combines driver personal behavioral attributes and route risk scoring attributes to compute the riskiness of a driver for the state of Wisconsin using telematics IoT data
  • Applied undersampling, oversampling, and SMOTE sampling techniques for data modelling using Logistic Regression, Decision Tree, Random Forest, and XGBoost classification algorithms to assess the risk associated with the frequency of a driver's Hard Brake, Hard Acceleration, Sharp Turn, Driver Distraction, and Speeding events.
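
As an illustration of the resampling techniques listed above, here is a minimal sketch of random oversampling, the simplest of the three. It is not the project code; the function name and toy data are hypothetical. SMOTE differs in that it synthesizes new minority points by interpolating between neighbours rather than duplicating existing rows.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: four "no event" rows and one "hard brake" row.
rows = [[1], [2], [3], [4], [9]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Resampling is normally applied to the training split only, so that evaluation still reflects the original class balance.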

J. Mack Robinson College of Business, Georgia State University
Atlanta, GA

Graduate Research Assistant
01.2022 - 10.2022

Job overview

  • Python programming for Finance, Web scraping, Sentiment Analysis using NLP techniques, Neural Networks, Deep Learning, Natural Language Processing (NLP), BERT, Long Short-term Memory (LSTM)
  • Studied and established the correlations between stock transaction volume and daily returns in conjunction with sentiment analysis on the subreddit r/WallStreetBets at Reddit.com
  • Researched and analyzed stocks with a high comment volume on r/WallStreetBets at Reddit.com
  • Quantified the relationship between stock comment volume and stock transaction volume via correlation analysis
  • Quantified the relationship between stock comment volume and stock daily returns via correlation analysis
  • Quantified the relationship between stock comment sentiment and stock trading signal by analyzing the user comments between each trading signal
  • Analyzed various stock technical and volatility indicators, such as Simple & Exponential Moving Averages, Bollinger Bands, Keltner Channels, ATR, RSI, MACD, and the Stochastic Oscillator, to create a stock trading strategy using algorithmic trading
  • Created an algorithmic trading strategy using simple moving averages and extracted comments between the trading signals for application of NLP techniques
  • Used vectorization techniques like Count Vectorizer, GloVe, TF-IDF, and Word2Vec in conjunction with classification models such as SVM, Decision Tree, Random Forest, AdaBoost, LSTM, BiLSTM & BERT for ensemble sentiment classification of positive, negative, and neutral market sentiments.
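
The simple-moving-average strategy mentioned above can be sketched as a crossover rule. This is a minimal illustration under assumptions, not the project code: the window lengths, function names, and prices are hypothetical, and the rule shown (buy when the short SMA crosses above the long SMA, sell on the reverse) is one common form of such a strategy.

```python
def simple_moving_average(prices, window):
    """Return the SMA series; positions without a full window are None."""
    sma = []
    for i in range(len(prices)):
        if i + 1 < window:
            sma.append(None)
        else:
            sma.append(sum(prices[i + 1 - window:i + 1]) / window)
    return sma


def crossover_signals(prices, short_window=2, long_window=3):
    """Emit (index, 'buy'/'sell') whenever the short SMA crosses the long SMA."""
    short = simple_moving_average(prices, short_window)
    long_ = simple_moving_average(prices, long_window)
    signals = []
    prev_diff = None
    for i in range(len(prices)):
        if short[i] is None or long_[i] is None:
            continue
        diff = short[i] - long_[i]
        if prev_diff is not None:
            if prev_diff <= 0 < diff:
                signals.append((i, "buy"))   # short SMA moved above long SMA
            elif prev_diff >= 0 > diff:
                signals.append((i, "sell"))  # short SMA moved below long SMA
        prev_diff = diff
    return signals


# Hypothetical closing prices, for illustration only.
prices = [10, 9, 8, 9, 11, 12, 10, 8]
signals = crossover_signals(prices)
```

The signal indices mark the boundaries of the windows from which comments would then be collected for sentiment analysis.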

Epsilon
Irving, Texas

Data Analyst
10.2017 - 06.2019

Job overview

  • Remote
  • Consumer Analytics, Retail Analytics, Digital Marketing, Python, SQL, NLP, Epsilon PeopleCloud
  • Performed data extraction, transformation, and loading (ETL) operations using SQL for the credit and retail consumer data stored in in-house databases as well as in client marketing databases, and ensured data integrity and consistency while managing complex datasets from multiple sources
  • Leveraged Python's analytical libraries, such as Pandas, NumPy, and scikit-learn, to create clear and informative data visualizations and perform in-depth statistical analysis, hypothesis testing, and correlation studies, uncovering concealed patterns and actionable insights
  • Developed and applied various regression models, including Linear Regression, Decision Tree, and Random Forest models, and reduced error metrics by over 20% to provide the best predicted values
  • Successfully built a suite of classification models, including Logistic Regression, Decision Tree, and Random Forest, and increased the accuracy by over 25% for a binary classification application
  • Utilized SQL to extract and analyze data from various sources, providing actionable insights to enhance operational efficiency and support informed business decisions
  • Analyzed unstructured textual data using the Epsilon PeopleCloud products Discovery, Customer, and Prospect to uncover customer sentiment and behavior patterns in the consumer data, and generated reports
  • Collaborated with teams across the organization, including marketing, finance, and operations, to provide data-driven solutions and recommendations.

Business Analyst

Job overview

  • www.e2open.com

Juniper Networks
Hyderabad

Team member of Steelwedge
06.2007 - 07.2009

Job overview

  • MS Excel, SQL Server, Steelwedge EPPM Product Software
  • Worked on EPPM sales & operations demand planning and forecasting solutions for various clients like HP, EDS, Ditech Networks, NVIDIA, Spansion, Tellabs
  • Collaborated with stakeholders, including business users, managers, and IT teams, to understand and document business requirements and business rules, identify unique implementation details, and perform stakeholder analysis
  • Created detailed documentation, including functional specifications, BRDs, user stories, process flows, and use cases, to effectively communicate requirements and solutions to development teams and other stakeholders
  • Created the test plan and developed test cases for data validation and verification to ensure that front-end data on the Steelwedge Enterprise software matched the backend data in SQL Server
  • Participated in user acceptance testing (UAT) to validate system functionality and ensure that implemented solutions met specified requirements
  • Identified and reported defects or issues for resolution using JIRA
  • Acted as a liaison between business users and technical teams, facilitating effective communication and understanding between both parties
  • Managed stakeholder expectations and ensured alignment between business needs and technical solutions.

International Institute of Population Sciences, Ministry of Health
Mumbai

Research Assistant
01.2004 - 04.2007

Job overview

  • MS Excel, SPSS, STATA
  • A WHO & Ministry of Health and Family Welfare Project
  • 2
  • NFHS-3 Project (National Family Health Survey)
  • Assisted in the collection of data for research projects, including conducting surveys, interviews and gathering data from existing sources such as databases or public records
  • Utilized statistical software including Excel, SPSS & STATA to perform data analysis and generate statistical reports, charts, and visualizations and made presentations of findings to the stakeholders
  • Collaborated with researchers, principal investigators, or other team members to discuss research objectives, methodologies, and findings
  • Effectively communicated statistical concepts, methods, and results to non-technical audiences
  • Documented research processes and implemented required procedures to align with standardized reporting formats
  • Conducted quality checks on research data and statistical analyses to ensure accuracy, consistency, and adherence to established standards
  • Verified results and assisted in resolving any data-related issues or anomalies.

AC Nielsen India

Operations Co-ordinator
01.2003 - 01.2004

Job overview

  • Excel, RAPS - Nielsen Product Software
  • Team member for Consumer Panel Services with the Retail Measurement Services SBU, tracking consumer behavior for CPG products
  • Contributed to the syndicated monthly analysis of Consumer Panel data for all FMCG clients of the Retail Measurement Services group
  • Worked on trend analysis of the current month's data vs. the previous 5 months' data, scrutinized product sales, corrected errors where the time series showed unusual or abnormal sales figures, and generated reports
  • Ensured the accuracy, completeness, and reliability of data by conducting quality checks, data validation, and verification procedures with the team manager
  • Identified and resolved any data-related issues or anomalies
  • Collaborated with clients to understand their business objectives and data requirements, and provided post-delivery support via customized data-driven insights and recommendations to support client decision-making
  • SUPERMARKET Project: Consolidated the supermarket database every month, performed a syndicated analysis of the data, and generated the monthly reports for all subscribing Nielsen clients
  • FIELD MANAGEMENT INFORMATION SYSTEM (FMIS): Generated and analyzed the monthly FMIS report, which gives an account of the performance of the entire field staff across all audits in India.

Education

J. Mack Robinson College of Business, Georgia State University

MS in Data Science & Analytics
08.2024

University Overview

GPA: 3.68

University School of Sciences, Gujarat University

M.Sc. in Statistics
07.2002

University Overview

GPA: 3.0

Skills

  • Microsoft Office, Python, Pandas, NumPy, Matplotlib, Seaborn, NLP, NLTK, spaCy, Gensim, PySpark
  • SQL Server 2022, MySQL, PostgreSQL, Power BI, Tableau, OpenCV
  • NetworkX, R, CSS, GitHub, HTML, Bitbucket, PyCharm, VS Code, LLMs

Certification

Georgia Tech Data Science and Analytics Bootcamp (08/2019 - 02/2020)
