Lead Data Scientist with 10+ years of experience developing data-driven strategies and machine learning solutions. Skilled in building data pipelines, developing predictive models, and implementing data governance standards. Proficient in SQL, R, Python, AI/ML, Power BI, and cloud platforms (AWS, Azure, Databricks). Strong leader of cross-functional teams who uses strategic data analysis to optimize business operations.
• Developed a predictive analytics pipeline using Python and Scikit-learn to identify properties at high risk of health and safety violations, based on historical inspection, complaint, and maintenance data.
• Integrated multi-source data using SQL and PySpark in Databricks, ensuring clean, scalable datasets for machine learning model training.
• Designed and published Power BI dashboards for operational managers and field inspectors to visualize risk levels across buildings, zones, and violation types.
• Built a forecasting model using R and AWS SageMaker to predict peak inspection periods and optimize inspector scheduling, improving field coverage and reducing overtime costs by 20%.
• Automated data extraction and cleansing using SQL scripts, improving the timeliness of inspection reports.
• Conducted clustering analysis to group buildings by historical violations, population vulnerability, and inspection history, supporting the development of proactive inspection routes.
• Designed and implemented a financial forecasting system using predictive modeling (Random Forest, Linear Regression) to simulate various budget scenarios.
• Used Azure Machine Learning to deploy models and monitor their performance in real time.