Employed Palantir Foundry for end-to-end data integration and analytics, leveraging Foundry’s Quiver for exploratory data analysis, Contour for data visualization, Ontology Manager for mapping data relationships, and Pipelines for managing automated data workflows.
Designed and implemented an AIP-powered workflow in Palantir Foundry: extracted and transformed document contents into structured text chunks, batch-processed them with LLMs for entity recognition, and integrated the results into Foundry's Ontology using Ontology Manager for efficient data mapping. Configured a Knowledge Graph in Vertex to visualize relationships among entities and deployed an interactive app via Workshop, using an AIP Agent to provide reliable, context-grounded responses.
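The chunking step in this workflow can be sketched in plain Python (a minimal illustration; the function name, chunk size, and overlap are invented for this sketch and are not Foundry or AIP APIs):

```python
def chunk_document(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split raw document text into overlapping chunks for batch LLM processing.

    Overlap keeps entities that straddle a chunk boundary visible in at
    least one chunk, so the entity-recognition pass does not miss them.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk would then be sent to the LLM entity-recognition step, and the extracted entities written back as Ontology objects.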
Developed and maintained interactive Tableau dashboards with advanced visualizations, such as custom charts, graphs, filters, and parameters, tailored to diverse user needs, while designing and implementing ETL processes to consolidate data from multiple sources. This ensured data consistency and accessibility, enhancing the overall analysis experience within Tableau.
Built and managed interactive Power BI dashboards with sophisticated visuals, such as custom charts, slicers, and drill-through capabilities, to address various user needs. Developed and fine-tuned DAX calculations and measures for in-depth data insights. Designed and structured data models using Power Query, integrating data from multiple sources and ensuring seamless data refresh operations.
Utilized Databricks Notebooks with PySpark and Pandas to ingest, clean, and process large datasets efficiently. Performed essential data engineering tasks such as data partitioning, joins, and aggregations using Apache Spark to enhance performance and scalability. Leveraged Pandas for initial data exploration and transformations on smaller subsets, seamlessly transitioning to PySpark for distributed processing of large-scale data.
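The join-then-aggregate pattern that Spark distributes across partitions can be sketched single-node in plain Python (the table and column names below are invented for illustration):

```python
from collections import defaultdict

# Toy records standing in for two large Spark DataFrames.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 120.0},
    {"order_id": 2, "customer_id": "c2", "amount": 75.5},
    {"order_id": 3, "customer_id": "c1", "amount": 30.0},
]
customers = [
    {"customer_id": "c1", "region": "US"},
    {"customer_id": "c2", "region": "EU"},
]

# Hash join on customer_id -- what Spark does within each partition
# after shuffling both sides by the join key.
region_by_customer = {c["customer_id"]: c["region"] for c in customers}
joined = [
    {**o, "region": region_by_customer[o["customer_id"]]}
    for o in orders
    if o["customer_id"] in region_by_customer
]

# Aggregation: total amount per region,
# analogous to df.groupBy("region").sum("amount") in PySpark.
totals = defaultdict(float)
for row in joined:
    totals[row["region"]] += row["amount"]
```

Spark performs the same two steps, but partitions the data by key first so each executor joins and aggregates only its own shard.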
Leveraged Delta Lake on Databricks for efficient data storage, ensuring data reliability with features like ACID transactions and schema enforcement. Optimized data processing and transformations using Spark SQL and various data processing APIs, including PySpark and the pandas API on Spark, to handle large-scale data efficiently in a distributed environment.
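Delta Lake's schema enforcement rejects writes whose columns or types do not match the declared table schema. The idea can be mimicked in plain Python (Delta itself enforces this at the storage layer; the schema below is invented for illustration):

```python
# Declared schema, analogous to a Delta table's column definitions.
schema = {"id": int, "event": str, "value": float}

def enforce_schema(row: dict) -> dict:
    """Reject rows whose columns or types don't match the declared schema,
    mirroring the write-time check Delta Lake performs."""
    if set(row) != set(schema):
        raise ValueError(f"column mismatch: {sorted(row)} vs {sorted(schema)}")
    for col, typ in schema.items():
        if not isinstance(row[col], typ):
            raise TypeError(f"{col}: expected {typ.__name__}, got {type(row[col]).__name__}")
    return row
```

A write that passes this check is then committed atomically in Delta via its transaction log, which is what provides the ACID guarantees.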
Implemented scalable data analytics solutions using Microsoft Azure Synapse Analytics. Developed and orchestrated ETL workflows to ingest and transform data from various sources, ensuring high data quality and accessibility. Leveraged Apache Spark within Synapse for large-scale data processing and utilized serverless SQL pools to perform efficient, on-demand data analysis, integrating data pipelines with Azure Data Lake Storage.
Developed data ingestion and transformation on Microsoft Azure using Azure Data Factory, automating data pipelines and integrating structured and unstructured data for real-time analytics.
Developed a comprehensive machine learning pipeline using Databricks ML for predictive maintenance on industrial sensor data. Utilized Feature Store for creating and managing reusable features, ensuring data consistency and efficient access across models. Employed MLflow for extensive experiment tracking, model versioning, and reproducibility. Implemented and tuned multiple algorithms, including Random Forest and XGBoost, using distributed training for improved prediction accuracy. Integrated the pandas API on Spark for scalable data cleaning and transformation, as well as libraries such as PySpark, Scikit-Learn, and TensorFlow for efficient data processing and model training in a distributed environment.
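The train-and-evaluate core of such a pipeline can be sketched single-node with scikit-learn (a minimal illustration on synthetic data; the distributed training, Feature Store, and MLflow tracking described above are omitted):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for engineered sensor features and failure labels.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Random Forest baseline; in Databricks this step would be wrapped in an
# MLflow run and could be scaled out with distributed training.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
```

In the actual pipeline, features would come from the Feature Store rather than `make_classification`, and `accuracy` would be logged as an MLflow metric alongside the model artifact.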
Applied Azure Machine Learning to develop predictive models, leveraging the Scikit-Learn library for custom preprocessing and data transformation tasks within Azure ML pipelines, ensuring efficient, reproducible, and automated workflows. Utilized Azure Machine Learning Studio for model experimentation with algorithms such as Decision Forest Regression, Boosted Decision Tree Regression, and Neural Network Regression, optimizing model performance using built-in hyperparameter tuning capabilities. Deployed models as managed web services on Azure Kubernetes Service (AKS) for scalable and secure inference. Implemented comprehensive model monitoring using Azure’s integrated tools to track performance metrics, detect data drift, and maintain model reliability, with automated alerts and logging for proactive maintenance.
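One of the monitoring checks mentioned above, data drift detection, can be illustrated with a minimal statistic: compare feature means between a training baseline and live data (the threshold rule here is an invented illustration, not Azure ML's built-in drift detector):

```python
import statistics

def mean_drift(baseline: list[float], live: list[float], threshold: float = 2.0) -> bool:
    """Flag drift when the live mean departs from the baseline mean by more
    than `threshold` baseline standard deviations."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return abs(statistics.mean(live) - mu) > threshold * sigma

# Baseline feature values seen at training time vs. two live batches.
baseline = [10.0, 10.5, 9.8, 10.2, 10.1, 9.9]
stable = [10.0, 10.3, 9.7]     # close to the baseline distribution
shifted = [14.0, 14.5, 13.8]   # clearly drifted
```

A production monitor would run richer tests per feature and wire a positive result into the alerting and retraining workflow.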
Leveraged Azure AutoML functionality to automate the selection and tuning of algorithms, streamlining the model development process and achieving optimal performance. AutoML automatically ranked and evaluated models, while optimizing hyperparameters such as learning rate, maximum depth, and number of estimators to improve overall model accuracy and efficiency.
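The rank-and-tune loop that AutoML automates can be illustrated with a plain grid search (the scoring function below is an invented stand-in for a real train-and-validate cycle):

```python
from itertools import product

def score(learning_rate: float, max_depth: int, n_estimators: int) -> float:
    """Toy validation score standing in for training and evaluating a model."""
    return (1.0
            - abs(learning_rate - 0.1)
            - abs(max_depth - 6) / 100
            - abs(n_estimators - 200) / 1000)

grid = {
    "learning_rate": [0.01, 0.1, 0.3],
    "max_depth": [3, 6, 9],
    "n_estimators": [100, 200, 400],
}

# Evaluate every combination and rank by score, like AutoML's leaderboard.
results = [
    (score(lr, d, n), {"learning_rate": lr, "max_depth": d, "n_estimators": n})
    for lr, d, n in product(grid["learning_rate"], grid["max_depth"], grid["n_estimators"])
]
best_score, best_params = max(results, key=lambda r: r[0])
```

AutoML goes further than exhaustive grids, using smarter search strategies and also varying the algorithm itself, but the leaderboard it produces is this same ranked list of scored configurations.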
Built a deep learning model for car logo recognition using a CNN-based approach, leveraging TensorFlow and OpenCV for preprocessing and model optimization, achieving high-accuracy image recognition capabilities.
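The core operation of such a CNN can be sketched in NumPy as a valid-mode 2D convolution (cross-correlation, as deep learning frameworks implement it; the edge-detector kernel and tiny image are illustrative, not the trained model):

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid-mode 2D cross-correlation, the core op in a CNN layer."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector applied to a tiny image whose left half is
# dark (0) and right half is bright (1); the filter responds only at
# the boundary column, which is how early CNN layers pick out edges.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[1, -1],
                   [1, -1]], dtype=float)
edges = conv2d(image, kernel)
```

In a trained logo-recognition network the kernels are learned rather than hand-written, and frameworks like TensorFlow run this operation batched across many filters and channels.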
Created and optimized SQL Server tables, views, stored procedures, functions, and triggers to support core application functionalities, enhancing database performance and maintainability.
Performed basic database administration tasks including job monitoring, backup, recovery, and performance monitoring, troubleshooting slow-running queries and resolving deadlocks for optimal database efficiency.
Built reports using SQL Server Reporting Services (SSRS).
Engineered ETL processes with SSIS for transforming and loading data into a centralized data warehouse, automating transformations with components such as Derived Column, Lookup, and Conditional Split, along with Script Tasks and Execute SQL Tasks.
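The Derived Column and Conditional Split pattern from such SSIS packages can be sketched outside SSIS as plain Python transforms (the column names and routing predicate are invented for illustration):

```python
# Incoming rows, as an SSIS data flow source would deliver them.
rows = [
    {"name": "alice", "amount": "120.50", "country": "US"},
    {"name": "bob", "amount": "75.00", "country": "DE"},
    {"name": "carol", "amount": "310.25", "country": "US"},
]

# Derived Column: compute new columns from existing ones.
for row in rows:
    row["amount_usd"] = float(row["amount"])
    row["name_upper"] = row["name"].upper()

# Conditional Split: route each row to a different output by a predicate.
domestic = [r for r in rows if r["country"] == "US"]
international = [r for r in rows if r["country"] != "US"]
```

In the SSIS data flow, `domestic` and `international` would each feed a separate destination, such as different warehouse tables.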
Education
Bachelor of Science - Computer Science
Emory University
Atlanta, GA
07-2027
Skills
Data Engineering & Analytics: Databricks (Data Engineering, AI, and Machine Learning), Microsoft Azure, SQL Server Reporting Services (SSRS), SQL Server Integration Services (SSIS), Palantir Foundry
Database Management: RDBMS Development, SQL, Data Integration and ETL, Data Ingestion
Data Visualization: Tableau Desktop Specialist, Power BI, Interactive Dashboard Creation
Big Data Processing: PySpark, Apache Spark, Data Transformation, Delta Lake
Machine Learning & AI: Model Development, Deep Learning (Image Recognition), Data Science Frameworks
Programming Languages: Python, R, SQL
Certification
Microsoft Certified - Azure Data Engineer Associate
Databricks Certified - Machine Learning Associate
Microsoft Certified - Azure Data Scientist Associate
Databricks Certified Associate Developer for Apache Spark 3.0
Academy Accreditation - Generative AI Fundamentals (Databricks)
Machine Learning Practitioner (Databricks)
Tableau Desktop Specialist
Academy Accreditation - Platform Administrator (Databricks)