Sharfuddin Alam

Green Card

Summary

Experienced Data Science professional who interprets and extracts intelligence from data and solves complex business problems with Data Visualization, Data Modeling, Statistical Analysis, and Machine Learning. Proficient in furnishing executive leadership teams with insights, analytics, reports, and recommendations that enable effective strategic planning across all business units, distribution channels, and product lines.

Overview

12 years of professional experience
1 Certification

Work History

Data Scientist IV

Tabner Inc. (Charter Communications)
09.2022 - Current

The Data Quality team maintains tiers of audit reports that flag data integrity issues. The scope of this project is to define an operational strategy that uses data science tools and methodologies to identify outliers and anomalies and build smarter audits. The Anomaly Detection team's purpose is to optimize those reports.

  • Utilized advanced querying, visualization and analytics tools to analyze and process complex data sets.
  • Identified, measured and recommended improvement strategies for KPIs across business areas.
  • Developed intricate algorithms based on deep-dive statistical analysis and predictive data modeling.
  • Performed exploratory data analysis (univariate and multivariate) using descriptive and inferential statistics to understand distributions and trends in the data.
  • Improved the marketing KPI (Netgain) by 70% over the existing audit report.
  • Applied custom penalty terms to assess model performance and find optimal parameters for both statistical and machine learning models.
  • Used ANOVA to assess how the central tendencies of PSU vary across business units.
  • Performed feature engineering to create bins and tuned feature weights.
  • Developed machine learning models (XGBoost) to detect outliers for the Sales, FieldOps, and CustOps verticals.
  • Performed cluster analysis (K-prototypes) to generate insights from mixed categorical and continuous data.
  • Trained XGBoost models with oversampling (SMOTE) and undersampling to handle data imbalance.
  • Built a Random Forest Regressor with tuned hyperparameters to estimate daily trouble call volumes at the enterprise level.
  • Used SHAP values to interpret the model and identify the top contributing features.
  • Computed Spearman correlation coefficients to understand how the correlation between each day of the week and call volume varies over time.


Tools Used: Pandas, Scikit-learn, Seaborn, Tableau, Teradata SQL, NumPy, Jira, MSTR, Docker


Data Science/Artificial Intelligence Mentor

Thinkful/Chegg
05.2022 - Current

The Thinkful program offers Data Science bootcamps to adult learners across the nation. Mentors work 1:1 with learners to guide them and gauge their understanding of the field.


  • Identify knowledge gaps in 1:1 mentorship sessions and teach learners the relevant topics
  • Discuss, devise, and develop plans to improve learners' understanding of topics in upcoming sessions
  • Resolve learners' questions about the week's content and explain the concepts in depth

Data Science Mentor

Great Learning; Post Graduate Program In AI And DS
09.2021 - Current

The mentored learning sessions are two-hour sessions held every weekend to complement the learning material released to learners at the beginning of each week. These sessions are run by industry professionals who are experienced Data Scientists themselves. Their purpose is to give a business perspective on the theoretical material the learners have covered and to prepare them for real-world business challenges.


  • Gauge learners' level of understanding and identify areas to focus on.
  • Clarify questions on the week's course concepts, prioritized by the topics the majority of the group wants to cover.
  • Case Study: Work hands-on in Python and show how to 'apply' the technique.
  • Extended Q&A: Address any additional questions.

Data Scientist

Collabera (John Deere)
03.2022 - 06.2022

As part of the System Machine Health Analytics initiative, the team monitored the health of various tiers of engines and other machinery. The team was carrying out BRU (Bulk Rule Updates) to analyze telematics and/or warranty data from John Deere engines and predict faults in engine parts for each customer/dealer.


  • Created expert alerts that triggered across thousands of machines and delivered the analysis to dealers and customers.
  • Built a Spark Streaming job that reports machine issues within seconds of reading the data from the data warehouse, instead of after a delay of several hours.

Data Scientist II

Daybreak IT Solutions (U.S. Department Of VA)
02.2021 - 10.2021

Worked alongside clinical researchers to analyze the statistical significance of various protocols assigned to veterans to improve health conditions such as hypertension, blood pressure, excess weight, and COVID-19 effects.

  • Performed nonparametric Wilcoxon signed-rank tests to obtain W-values and p-values and compare outcomes at the individual veteran level.
  • Pulled data from various disparate data sources to build an integration layer that filters out data points that do not align with business efficacy rules.
  • Wrote Spark RDD code and optimized queries with a cube approach to make the pipeline efficient and scalable.
  • Wrote custom UDFs to calculate statistical values such as CI high, CI low, p-values, and test statistics, and to color-code statistical significance columns.
  • Built a custom data lake with data ingestion engineers for further data preprocessing and to build the presentation layers for the final dashboard.
  • Synthesized veterans' data with the Python Faker library to mask confidential patient information such as SSNs and demographics while conducting cross-functional analysis.
  • Performed cohort-level analysis to test the significance of hypotheses designed from clinical researchers' guidelines.


Tools and Environment: Azure Databricks, Azure Synapse, Spark ML, Power BI, RDD, MS SQL

Data Scientist

Strategic Staffing Solutions (Duke Energy)
11.2019 - 01.2021

  • Built a classifier for work order prioritization by generating word embeddings from work order descriptions and engineering features to identify and prioritize high-priority tasks
  • Implemented sentence similarity scoring between long and short work order descriptions as part of feature engineering using NLP techniques
  • Implemented object detection models (RetinaNet) using deep neural networks to detect different kinds of faults in solar panel images
  • Implemented YOLOv5 as a custom object detection model to identify various failure types for Total Plant Inspections
  • Developed a document summarizer and classifier using NLP and topic analysis
  • Trained time series forecasting models using LSTMs, RNNs, and conventional algorithms such as ARIMA
  • Developed models for real-time health monitoring of plant equipment based on Plant Information data (sensor and thermocouple)
  • Leveraged the Amazon Lex service to build a conversational chatbot that assists customers with billing, usage, and outage inquiries


Data Analyst

City Of Ottawa
09.2011 - 04.2012

To address management's growing concern about recurring bus failures, obtained data for different bus categories from multiple tables distributed across three data warehouses: Teradata, Microsoft SQL Server, and Oracle Database. Ingested the data into a single platform to perform exploratory analysis.

  • Conceptualized MDBF as the key statistical indicator of failures in order to test various hypotheses
  • Built a Random Forest classifier on bus maintenance data to predict future bus failures with 75% accuracy
  • Implemented a bag-of-words model to perform semantic analysis on sector and news feed datasets
  • Formulated context-relevant questions and hypotheses to foster data-driven scientific research and decision making
  • Evaluated various tracking metrics in data-related projects and improved overall model accuracy from 69% to 84%
  • Performed ad hoc queries, analyses, and segmentation studies combining multiple tools, data sources, and data types to extract insights from A/B and multivariate tests

Environment: Natural Language Processing, Word2vec, Bag-of-words, Gradient Boosting, Classification, A/B Testing

Education

Master of Science - Computer Science

Lamar University
Beaumont, TX
2018

Bachelor of Science - Electrical Engineering

Carleton University
Ottawa, ON
2012

Skills

  • Time Series Analysis
  • Classification and Regression Machine Learning Algorithms (Random Forest, Gradient Boosting, XGBoost, AdaBoost, K-NN)
  • Clustering Algorithms (K-means, K-prototypes, Agglomerative and Divisive, DBSCAN)
  • Dimensionality Reduction (PCA, LDA, t-SNE)
  • Statistical Analysis
  • Deep Learning (Multilayer NNs, CNNs, RNNs, LSTMs)
  • Python (NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn, Keras, TensorFlow)
  • SQL (MySQL, PostgreSQL)
  • Git, GitHub, Bitbucket
  • Regularization and Hyperparameter Tuning
  • Cloud (AWS SageMaker, S3, EC2, Amazon Lex, Amazon Polly)
  • DevOps (Docker, Kubernetes, Flask)
  • Anomaly Detection
  • Exploratory Data Analysis

Certification

Data Science Career Track Certification
