• Applied machine learning techniques such as regression, classification, clustering, and deep learning to develop robust predictive models.
• Developed end-to-end machine learning pipelines in PyTorch and TensorFlow/Keras, from data preprocessing and augmentation to model training and evaluation, ensuring reproducibility and scalability.
• Leveraged Databricks to develop and orchestrate end-to-end ETL pipelines, enabling large-scale data processing and real-time analytics across multiple data sources.
• Implemented MLflow on Databricks to track experiments, manage model versions, and streamline collaboration across data science teams.
• Optimized Apache Spark jobs and data processing pipelines in Databricks to process large-scale structured and unstructured datasets in parallel, reducing computation time for complex analytics tasks, shortening model training times, and lowering compute costs.
• Deployed machine learning models on Azure Machine Learning and automated deployment within Docker containers, streamlining the process from development to production.
• Automated model deployment and scaling in Kubernetes, dynamically allocating resources so that models handled fluctuating traffic.
• Integrated Azure data services such as Azure Data Lake and Azure SQL Database to manage large datasets and create scalable solutions for business intelligence and predictive analytics.
• Performed data cleaning, transformation, and preprocessing using Hadoop ecosystem tools and Python, ensuring high-quality datasets for analysis and ML tasks.
• Developed and executed ad-hoc SQL queries, collaborated with analysts, and created reusable Python codebases to improve data accessibility, efficiency, and consistency.
• Automated Tableau reporting processes and implemented data security best practices, using User Filters and Row-Level Security (RLS) to protect sensitive information.
• Performed advanced statistical analyses with SciPy, including hypothesis testing, regression analysis, and analysis of probability distributions, to support data-driven business decisions.
• Manipulated time series data in Pandas, including time-based aggregation and resampling, to improve forecasting accuracy and business planning.
• Managed automated testing frameworks within Jenkins to validate data models, pipelines, and analytics code before production deployment.
• Used Git/GitHub to version-control analytical models and track their evolution, supporting smooth collaboration across teams.
• Provided training to team members on R programming, data manipulation, visualization best practices, and model interpretation to improve overall data literacy.
• Explained model results to non-technical stakeholders by visualizing feature importance and interpreting complex ML models for better business understanding.