Ali Murad

Data Scientist II

Summary

Experienced Data Scientist passionate about healthcare and technology. I build data-centric solutions, using technological tools and mathematical models to extract information from data that is used to improve patient care. As a data professional, I translate complex problems into frameworks that allow them to be solved through machine learning and artificial intelligence.

Overview

6 years of professional experience
58 years of post-secondary education
11 Certifications
4 Languages

Work History

Data Scientist II

HCA Healthcare Inc.
06.2023 - Current
  • Led a team of 5 Data Scientists developing Computer Vision use cases in Radiology and Natural Language Processing use cases with Generative AI and Large Language Models (LLMs).
  • Built infrastructure in Google Cloud to combine HCA's repository of 30 billion radiology images with radiologist impressions to automate data labeling for supervised learning.
  • Led the development of the Nurse Shift Handoff report generated using Large Language Models. Applied zero-shot and few-shot learning and model fine-tuning to optimize model outputs. Nurses use this report at shift change to transfer care of the patient to the incoming nurse.
  • Led the development of Emergency Department Transfer Notes generated using Large Language Models. These notes are utilized by physicians to provide a plan of care as they discharge patients from the emergency department.
  • Led the development of radiologist work prioritization models for Head Stroke and Pulmonary Embolism Computed Tomography (CT) scans. Due to the national shortage of radiologists, radiology studies often face long reading delays, which can be detrimental to patients in life-threatening cases. These prioritization tools help radiologists identify high-severity studies so they can be read in a timely manner.

Data Scientist

HCA Healthcare Inc.
04.2022 - 06.2023
  • Led the data science Google Cloud computing strategy as part of the $650 million investment by HCA Healthcare in Google Cloud.
  • Worked with the cloud infrastructure team and utilized Terraform Infrastructure as Code to create and test cloud infrastructure for the data science teams across HCA.
  • Utilized Google Cloud Platform with Vertex AI, Google Cloud Storage, Artifact Registry, Cloud Run, BigQuery, and Healthcare API to build and test applications for patient care.
  • Built autoregressive forecasting models and improved patient forecasting by 35% to optimize nurse staffing.
  • Utilized GitHub to version control code and built CI/CD pipelines for model deployment.

Data Scientist I

TriStar Health | HCA Healthcare
08.2021 - 04.2022
  • Built statistical time series forecast models for patient volume prediction.
  • Built and optimized deep learning computer vision models on chest X-rays for diagnosis of thoracic diseases in patients.
  • Used Python with scientific computing libraries including Sklearn, TensorFlow, and Statsmodels to build and test machine learning and statistical models.
  • Used OpenShift, Docker, and Google Cloud for model deployment and integration.
  • Used JIRA and GitHub to organize workflow and maintain project code base.

Financial Analyst

Parallon Business Solutions
02.2018 - 08.2021
  • Built and maintained statistical time series models, in Python, to predict patient volume and staffing requirements on a monthly and yearly basis.
  • Built dashboards to visualize patient volume at all serviced hospitals using SQL and Power BI.
  • Built ad hoc reports in SQL, Teradata, Python, and SAP Business Objects to support the workflows of different departments.
  • Used SQL and Python to implement a data extraction and loading procedure for the Prebill Denials department and reduced latency by 6 hours.
  • Implemented a forecasting solution using time series models, in Python, to predict the volume of calls received at the customer service center to aid in estimating staff requirements.
  • Built and analyzed monthly operating reviews for all serviced hospitals, for use by the executive team.
  • Mentored a financial analyst in training by collaborating on ad hoc requests and projects.

Data Science Projects

01.2019 - Current

- Built a breast cancer classifier with scikit-learn using Logistic Regression.

o Tools used: Pandas, Numpy, Matplotlib, and Scikit-learn.

- Pre-processed data and built a seasonal autoregressive model to forecast the amount of SO2 in air.

o Tools used: Pandas, Numpy, Scikit-learn, Statsmodels, Pyramid Arima.

- Built a Natural Language Processing application to rate hotel reviews by users in real time.

o Tools used: Python, NLTK, Pandas, Numpy, Scikit-learn, Heroku, Flask, and Matplotlib.

- Built a movie recommender system by using natural language processing. Designed a project pipeline including data storage, data pre-processing, feature engineering and selection, and modeling.

o Tools used: Pandas, Numpy, SQLite, Scikit-learn, Natural Language Toolkit NLTK, Matplotlib.

- Built a tweet classifier that determines, based on a tweet's text, location, and keywords, whether it signifies a disaster/emergency. Designed a project pipeline including data storage and retrieval, data pre-processing, feature engineering and selection, and modeling.

o Tools used: Pandas, Numpy, SQLite, Scikit-learn, Natural Language Toolkit NLTK, Matplotlib.

- Performed data pre-processing, feature engineering, and model building and evaluation on the Lending Club loan data to classify defaulted loans. Models used include Logistic Regression, Naïve Bayes, Random Forest, and K-Nearest Neighbors.

o Tools used: Pandas, Numpy, Scikit-learn, Matplotlib, and Seaborn.

Education

Ph.D. - Computer Science And Software Engineering

Auburn University
Auburn, AL
05.2001 - 05.2024

Master of Science - Data Science

Lipscomb University
Nashville, TN
05.2001 - 05.2020

BBA - Computer Information Systems And Finance

University of North Alabama
05.2001 - 05.2017

Skills

Python, SQL, SAP, DB2, MySQL, MS SQL Server, Teradata, Pandas, Numpy, Scipy, GitHub, Git, R

Scikit-learn, Statsmodels, NetworkX, Natural Language Toolkit, Pyramid Arima

HTML, CSS, JavaScript, Flask, Matplotlib, Seaborn

Microsoft Office Tools - Excel, Access, PowerPoint, Microsoft Power BI

Cloud Computing, Terraform IaC, Google Cloud, Vertex AI, Google Cloud Storage, Artifact Registry, BigQuery, Healthcare API

Websites, Portfolios, Profiles

LinkedIn: https://www.linkedin.com/in/ali-murad-b90a7a112/

GitHub: https://github.com/amuraddd

Medium: https://medium.com/@alimuradd7

Data Science Personal Project Portfolio

Data Science Project Portfolio: https://github.com/amuraddd

Modeling and Analytical Skills

  • Machine Learning - Regression, Classification, Clustering, Dimensionality Reduction, Regularization
  • Deep Learning - Neural Networks, Generative AI and Large Language Models (LLMs)
  • Computer Vision and Natural Language Processing
  • Data Mining and Statistical Analysis
  • Time Series Forecasting and Analysis

Publications

  • X. Zhang, T. F. Stafford, A. Murad, A. Risher, and J. Simmons, “How to Measure IT Effectiveness: The CIO’s Perspective,” Journal of Information Technology Management, vol. XXIX, no. 4, 2018.
  • Murad, A. (2024, January 22). Logistic regression: Classifier for breast cancer classification with scikit-learn. Medium. https://medium.com/@alimuradd7/logistic-regression-classifier-for-breast-cancer-classification-with-scikit-learn-efd8a1acff69
  • Murad, A. (2024a, January 10). Time Series Analysis and forecasting using autoregressive models (ARIMA, SA). Medium. https://medium.com/@alimuradd7/time-series-analysis-and-forecasting-using-autoregressive-models-arima-sa-1c7b389de2d3
  • Murad, A. (2023, November 26). Starting machine learning with Jupyter Lab and python virtual environments on a Mac. Medium. https://medium.com/@alimuradd7/starting-machine-learning-with-jupyter-lab-and-python-virtual-environments-on-a-mac-1e9f04a5f8a2

Certification

Reinforcement Learning - University of Alberta | Coursera

Interests

Tennis

Running

Reading fiction and non-fiction books

Listening to music, watching movies and TV shows, and playing guitar

Timeline

Foundations of Project Management - Google | Coursera

11-2023

Data Scientist II

HCA Healthcare Inc.
06.2023 - Current

Reinforcement Learning - University of Alberta | Coursera

05-2022

Data Scientist

HCA Healthcare Inc.
04.2022 - 06.2023

Matrix Algebra for Engineers - The Hong Kong University of Science and Technology | Coursera

12-2021

Data Scientist I

TriStar Health | HCA Healthcare
08.2021 - 04.2022

Sentiment Analysis in Python - DataCamp

06-2021

Introduction to Natural Language Processing in Python - DataCamp

02-2021

Natural Language Processing with Classification and Vector Spaces - DeepLearning.AI | Coursera

02-2021

Mathematics for Machine Learning - Imperial College London | Coursera

12-2020

Divide and Conquer, Sorting and Searching, and Randomized Algorithms - Stanford, Online | Coursera

10-2020

Time Series with Python Track - DataCamp

10-2020

Intermediate Python for Data Science - DataCamp

12-2019

Data Science Projects

01.2019 - Current

Financial Analyst

Parallon Business Solutions
02.2018 - 08.2021

Machine Learning Foundations: A Case Study Approach - University of Washington | Coursera

06-2017

Ph.D. - Computer Science And Software Engineering

Auburn University
05.2001 - 05.2024

Master of Science - Data Science

Lipscomb University
05.2001 - 05.2020

BBA - Computer Information Systems And Finance

University of North Alabama
05.2001 - 05.2017