M Sufder

Stone Mountain, GA

Overview

1 year of professional experience
1 Certification

Work History

Data Scientist

YSMF Inc., GA
06.2023 - Current
  • Employed Palantir Foundry for end-to-end data integration and analytics, leveraging Foundry’s Quiver for exploratory data analysis, Contour for data visualization, Ontology Manager for mapping data relationships, and Pipelines for managing automated data workflows.
  • Designed and implemented a workflow in Palantir Foundry, utilizing AIP. Extracted and transformed document contents into structured text chunks, batch-processed them with LLMs for entity recognition, and integrated the results into Foundry's Ontology using Ontology Manager for efficient data mapping. Configured a Knowledge Graph using Vertex to visualize relationships among entities and deployed an interactive app via Workshop, utilizing AIP Agent to provide reliable, context-based responses.
  • Developed and maintained interactive Tableau dashboards with advanced visualizations, such as custom charts, graphs, filters, and parameters, tailored to meet diverse user needs while simultaneously designing and implementing ETL processes to consolidate data from multiple sources. This ensured data consistency and accessibility, enhancing the overall analysis experience within Tableau.
  • Built and managed interactive Power BI dashboards with sophisticated visuals, such as custom charts, slicers, and drill-through capabilities to address various user needs. Developed and fine-tuned DAX calculations and measures for in-depth data insights. Designed and structured data models using Power Query, integrating data from multiple sources and ensuring seamless data refresh operations.
  • Utilized Databricks Notebooks with PySpark and Pandas to ingest, clean, and process large datasets efficiently. Performed essential data engineering tasks such as data partitioning, joins, and aggregations using Apache Spark to enhance performance and scalability. Leveraged Pandas for initial data exploration and transformations on smaller subsets, seamlessly transitioning to PySpark for distributed processing of large-scale data.
  • Leveraged Delta Lake on Databricks for efficient data storage, ensuring data reliability with features like ACID transactions and schema enforcement. Optimized data processing and transformations using Spark SQL and various data processing APIs, including PySpark and the pandas API on Spark, to handle large-scale data efficiently in a distributed environment.
  • Implemented scalable data analytics solutions using Microsoft Azure Synapse Analytics. Developed and orchestrated ETL workflows to ingest and transform data from various sources, ensuring high data quality and accessibility. Leveraged Apache Spark within Synapse for large-scale data processing, used serverless SQL pools for efficient, on-demand data analysis, and integrated data pipelines with Azure Data Lake Storage.
  • Developed data ingestion and transformation pipelines on Microsoft Azure using Azure Data Factory, automating pipeline runs and integrating structured and unstructured data for real-time analytics.
  • Developed a comprehensive machine learning pipeline using Databricks ML for predictive maintenance on industrial sensor data. Utilized Feature Store for creating and managing reusable features, ensuring data consistency and efficient access across models. Employed MLflow for extensive experiment tracking, model versioning, and reproducibility. Implemented and tuned multiple algorithms, including Random Forest and XGBoost, using distributed training for improved prediction accuracy. Integrated the pandas API on Spark for scalable data cleaning and transformation, as well as libraries like PySpark, Scikit-Learn, and TensorFlow for efficient data processing and model training in a distributed environment.
  • Applied Azure Machine Learning to develop predictive models, leveraging the Scikit-Learn library for custom preprocessing and data transformation tasks within Azure ML pipelines, ensuring efficient, reproducible, and automated workflows. Utilized Azure Machine Learning Studio for model experimentation with algorithms such as Decision Forest Regression, Boosted Decision Tree Regression, and Neural Network Regression, optimizing model performance using built-in hyperparameter tuning capabilities. Deployed models as managed web services on Azure Kubernetes Service (AKS) for scalable and secure inference. Implemented comprehensive model monitoring using Azure’s integrated tools to track performance metrics, detect data drift, and maintain model reliability, with automated alerts and logging for proactive maintenance.
  • Leveraged Azure AutoML functionality to automate the selection and tuning of algorithms, streamlining the model development process and achieving optimal performance. AutoML automatically ranked and evaluated models, while optimizing hyperparameters such as learning rate, maximum depth, and number of estimators to improve overall model accuracy and efficiency.
  • Built a deep learning model for car logo recognition using a CNN-based approach, leveraging TensorFlow and OpenCV for preprocessing and model optimization, achieving high-accuracy image recognition capabilities.
  • Created and optimized SQL Server tables, views, stored procedures, functions, and triggers to support core application functionalities, enhancing database performance and maintainability.
  • Performed basic database administration tasks including job monitoring, backup, recovery, and performance monitoring, troubleshooting slow-running queries and resolving deadlocks for optimal database efficiency.
  • Built reports using SQL Server Reporting Services (SSRS).
  • Engineered ETL processes with SSIS to transform and load data into a centralized data warehouse, building packages with transformations such as Derived Column, Lookup, and Conditional Split, along with Script Tasks and Execute SQL Tasks.

Education

Bachelor of Science - Computer Science

Emory University
Atlanta, GA
07-2027

Skills

  • Data Engineering & Analytics: Databricks (Data Engineering, AI, and Machine Learning), Microsoft Azure, SQL Server Reporting Services (SSRS), SQL Server Integration Services (SSIS), Palantir Foundry
  • Database Management: RDBMS Development, SQL, Data Integration and ETL, Data Ingestion
  • Data Visualization: Tableau Desktop Specialist, Power BI, Interactive Dashboard Creation
  • Big Data Processing: PySpark, Apache Spark, Data Transformation, Delta Lake
  • Machine Learning & AI: Model Development, Deep Learning (Image Recognition), Data Science Frameworks
  • Programming Languages: Python, R, SQL

Certification

  • Microsoft Certified - Azure Data Engineer Associate
  • Databricks Certified - Machine Learning Associate
  • Microsoft Certified - Azure Data Scientist Associate
  • Databricks Certified Associate Developer for Apache Spark 3.0
  • Academy Accreditation - Generative AI Fundamentals (Databricks)
  • Machine Learning Practitioner (Databricks)
  • Tableau Desktop Specialist
  • Academy Accreditation - Platform Administrator (Databricks)
  • Databricks Certified - Data Engineer Associate
