SHUBHAM PAL

Carteret, NJ

Summary

A Data Science professional with over 5 years of experience designing and implementing data-driven solutions across machine learning, predictive modeling, Generative AI, and Natural Language Processing. Expertise spans both traditional AI techniques and cutting-edge Generative AI, with a strong focus on cloud solutions built on Azure.

  • Deep expertise in machine learning, including deep learning (CNNs, RNNs, Transformers), NLP frameworks (BERT, Spark NLP, Hugging Face), and optimizing Large Language Models (LLMs).
  • Strong background in time series forecasting and predictive modeling to drive business insights and decisions.
  • Proficient in Python, R, PySpark ML, SQL, Scala, and visualization tools like Tableau and Seaborn. Skilled in end-to-end model development and deployment, with a focus on Azure cloud environments.
  • Successfully deployed production-grade models, built robust APIs, and led end-to-end data science projects that deliver real, measurable value to the business.
  • Led cross-functional Data Science teams, driving collaboration and overseeing the successful execution of complex projects from ideation to deployment, ensuring alignment with business goals and technical excellence.

Overview

6 years of professional experience
1 Certification

Work History

Lead Data Scientist

AT&T Services Inc.
Bedminster, NJ
04.2022 - Current
  • Led a data science team to engineer a sophisticated time series forecasting system, utilizing various statistical models to predict disbursement and free cash flow with high accuracy
  • This initiative significantly enhanced Wall Street financial projections and reduced dependency on third-party vendors by 40%
  • Designed and deployed an auto-taxability model integrating unsupervised learning (K-Means) and NLP (BERT), automating asset invoice categorization and enabling $1.2M in annual tax claim savings
  • Pioneered Parameter-Efficient Fine-Tuning (PEFT) of LLMs to embed tax- and audit-specific context, boosting model relevance by 30% for domain-specific applications, and optimized RAG frameworks for multiple Generative AI use cases
  • Integrated Generative AI into Power BI to produce diverse query-driven visualizations
  • Developed dynamic Power BI and Tableau dashboards, synthesizing complex financial insights for C-suite stakeholders and directly influencing strategic planning and resource allocation

Data Scientist

Research Foundation for Mental Hygiene, Inc.
New York, NY
08.2020 - 05.2021
  • Architected deep learning models (RNN-LSTM) for time series forecasting of rare medical events (e.g., first-episode psychosis), achieving a 20% uplift in predictive accuracy over baseline methods
  • Developed a multi-class XGBoost classifier for medical record categorization, integrated with a BERT/Word2Vec NLP pipeline, reducing manual review time by 50%
  • Conducted statistical modeling on Medicaid data using SAS, uncovering trends that informed clinical resource allocation; visualized findings via Tableau dashboards

Data Scientist

Edge Cloud Technology
Plano, TX
05.2020 - 03.2021
  • Preprocessed and harmonized multi-petabyte datasets from Azure Data Lake, Blob Storage, Hadoop, and Oracle within Databricks, reducing data ingestion latency by 25%
  • Implemented Spark NLP pipelines (BERT, Universal Sentence Encoder, TF-IDF) for advanced text feature extraction, achieving 90% precision in downstream clustering tasks
  • Orchestrated high-concurrency Spark clusters, applying K-Means, Power Iteration Clustering, and Streaming K-Means to segment unstructured data, improving model throughput by 35%
  • Delivered stakeholder-facing Tableau dashboards, distilling actionable insights from complex datasets

Statistics & Analytics Intern

NYS Energy Research and Development Authority
New York, NY
06.2019 - 01.2020
  • Built a machine learning system with PySpark ML to predict customer purchase probabilities based on geolocation and behavioral data, improving conversion rates by 15%
  • Applied K-Means clustering to segment customer cohorts, enabling targeted product personalization and increasing engagement by 22%
  • Managed ETL workflows between Oracle and AWS Redshift using boto3, optimizing SQL scripts for a 30% reduction in query execution time

Education

Master of Science - Data Science

State University of New York
Albany
01.2020

Certification

  • edX Verified Certificate for Large Language Models: Application through Production
  • Databricks: Academy Accreditation - Generative AI Fundamentals
  • Databricks: Academy Accreditation - Databricks Lakehouse Fundamentals
  • DeepLearning.AI: Finetuning Large Language Models

Projects

Quora Question Pair Similarity (Machine Learning)

  • Engineered an NLP-driven model to identify duplicate questions using Bag-of-Words, TF-IDF, and Word2Vec, achieving a 30% log loss via hyperparameter-optimized XGBoost.

Apparel Recommendation System (Text & Image Similarity)

  • Developed NLP models (TF-IDF, weighted Word2Vec) for text-based product recommendations and a VGG16 CNN for image similarity, improving recommendation relevance by 25%.

DB WorkspaceGPT

  • Developed a RAG system that retrieves code from Databricks and optimizes it based on user input, adding exception handling, summaries, and comments.

References

References available upon request.

Timeline

Lead Data Scientist

AT&T Services Inc.
04.2022 - Current

Data Scientist

Research Foundation for Mental Hygiene, Inc.
08.2020 - 05.2021

Data Scientist

Edge Cloud Technology
05.2020 - 03.2021

Statistics & Analytics Intern

NYS Energy Research and Development Authority
06.2019 - 01.2020

Master of Science - Data Science

State University of New York