
Rizwan Mohammed

Fremont, USA

Summary

Senior Data Engineer with over 8 years of experience in data engineering, metadata management, and data governance. Specialized in Alation, Databricks, Spark, Python, SQL, Azure, and Starburst, with a proven ability to design and implement data catalog solutions, governance frameworks, and scalable ETL pipelines. Strong background in metadata onboarding, data lineage, data quality, and catalog configuration, including integrating metadata from SQL Server, Oracle, Azure, and Snowflake across cloud environments (Azure, AWS, Snowflake). Hands-on experience with Starburst for data federation and for optimizing distributed queries across cloud data platforms.

Demonstrated ability to collaborate with cross-functional teams, optimize big data solutions, and drive data governance best practices. Skilled in supporting data visualization and preparation tools such as Power BI, Tableau, and Alteryx, ensuring clean, reliable, and actionable data pipelines for analytical consumption. Extensive experience in Python scripting for automating data workflows, building transformation logic, and orchestrating tasks with Airflow and DBT; proficient in PySpark for processing large-scale distributed datasets in Databricks and AWS Glue environments, with a focus on performance and resource usage. Practical experience with Airtable for managing datasets, automating workflows, and integrating external data sources. Working knowledge of Palantir Foundry, including data modeling and workflow design in Workshop and Quiver, with exposure to Slate for user interface components. Built ADF pipelines using Linked Services, Datasets, and Pipelines to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and developed JSON definitions for deploying ADF pipelines that process large-scale data with SQL activities.

Overview

8
years of professional experience
6
certifications

Work History

Senior Data Engineer

American International Group (AIG)
01.2022 - Current
  • Configured and maintained Alation Data Catalog, ensuring seamless metadata governance and compliance.
  • Developed and optimized data ingestion pipelines using Azure Data Factory (ADF), Databricks, and Snowflake.
  • Led metadata onboarding efforts, integrating metadata from SQL Server, Oracle, Azure Cloud, and Snowflake.
  • Built CI/CD pipelines for Databricks notebooks, jobs, and workflows using Git-based version control and automated deployments across dev, test, and prod environments.
  • Built and maintained complex PySpark jobs in Databricks to support real-time and batch data transformation for enterprise-level analytics.
  • Developed Python-based automation scripts for metadata validation, ETL job triggering, and integration with data catalog APIs.
  • Designed and deployed scalable data pipelines on Azure using Databricks, Azure Data Factory, and Azure Storage services.
  • Designed and implemented large-scale data warehouses on AWS Redshift, optimizing schema design, distribution keys, and sort keys for performance and scalability.
  • Implemented data lake architecture on Azure Data Lake Storage (ADLS) with fine-grained access controls.
  • Designed and maintained Airtable bases for project tracking and data integration, ensuring seamless collaboration across teams.
  • Automated workflows in Airtable using scripts and API integrations to streamline reporting and reduce manual effort.
  • Worked with Starburst Enterprise to enable data federation and optimize distributed queries across data lakes and cloud data warehouses like Snowflake and AWS S3.
  • Assisted in Starburst configuration and query performance tuning to improve data accessibility for analytics teams.
  • Integrated data pipelines in Databricks and Snowflake with downstream visualization tools like Tableau and data prep solutions like Alteryx.
  • Created parameterized notebooks and modular code in Python to streamline data processing and improve reusability.
  • Built and optimized end-to-end ETL/ELT workflows using Python and SQL for large, multi-source datasets.
  • Managed Databricks Repos for source control and enforced branching strategies to support team-based development.
  • Integrated Databricks deployments with CI tools (GitHub Actions / Jenkins) for code validation and controlled releases.
  • Designed and optimized AWS Glue ETL jobs to transform large-scale data sets; integrated with DynamoDB, Lambda, and Redshift for analytics workloads.
  • Built data ingestion pipelines incorporating streaming (Kafka) and batch patterns with robust error handling and replay mechanisms.
  • Collaborated with stakeholders working on Palantir Foundry to design data workflows using Workshop, participate in metadata governance discussions, and contribute to integration efforts between Foundry and existing pipelines.
  • Designed and validated data lineage diagrams, improving data traceability across multiple platforms.
  • Automated ETL workflows using Apache Airflow and DBT, reducing manual intervention by 60%.
  • Automated Redshift cluster maintenance tasks like vacuuming, snapshot backups, and cost optimization through scheduling and monitoring scripts.
  • Developed custom dashboards and API integrations to enhance data catalog usability and governance.
  • Developed interactive dashboards and reports in Power BI to support business decision-making and KPI tracking.
  • Spearheaded data quality checks and governance policies to ensure accuracy and consistency in metadata.
  • Collaborated with cross-functional teams to implement scalable data governance solutions.
  • Provided technical leadership in designing data models and stewardship workflows.
  • Optimized Alation configurations for business and technical lineage enhancements.
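The metadata-validation automation described above can be sketched as a small Python routine; the required fields, naming rule, and sample entries below are hypothetical, not AIG's actual catalog template:

```python
import re

# Hypothetical required attributes for a catalog entry; real Alation
# templates vary by installation.
REQUIRED_FIELDS = {"name", "owner", "description", "source_system"}
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")  # snake_case table names

def validate_metadata(entry: dict) -> list[str]:
    """Return a list of validation issues for one catalog entry."""
    issues = []
    missing = REQUIRED_FIELDS - entry.keys()
    if missing:
        issues.append(f"missing fields: {sorted(missing)}")
    name = entry.get("name", "")
    if name and not NAME_PATTERN.match(name):
        issues.append(f"name '{name}' violates snake_case convention")
    if len(entry.get("description", "")) < 20:
        issues.append("description shorter than 20 characters")
    return issues

entries = [
    {"name": "claims_fact", "owner": "data-eng", "source_system": "SQLServer",
     "description": "Daily grain of claim transactions."},
    {"name": "PolicyDim", "owner": "data-eng", "source_system": "Oracle",
     "description": "short"},
]
for e in entries:
    print(e["name"], "->", validate_metadata(e) or "OK")
```

In practice a script like this would run against entries pulled from the catalog API before onboarding, flagging failures for stewards rather than blocking ingestion outright.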

Senior Data Engineer

GE Oil & Gas
06.2020 - 12.2021
  • Led Alation implementation and configuration, enabling enterprise-wide metadata management.
  • Designed and developed metadata pipelines for seamless data cataloging and governance.
  • Created custom Alation workflows for metadata validation, improving catalog accuracy by 40%.
  • Collaborated with DevOps and platform teams to align Databricks CI/CD workflows with enterprise deployment standards.
  • Developed data security policies and access control mechanisms within Alation and Azure Data Lake.
  • Automated recurring data workflows, reducing processing time and improving reliability.
  • Implemented data quality validation and auditing processes across Redshift tables to ensure consistent and reliable data for analytics.
  • Migrated on-premises and vendor data sources to Azure cloud infrastructure, improving reliability and performance.
  • Built and optimized PySpark and SQL workflows in Azure Databricks for batch and streaming data.
  • Tracked Databricks environment changes using Git and release versioning to support auditability and rollback when needed.
  • Designed, administered, and supported BI platforms including Tableau, Cognos, and Alteryx, ensuring high availability and performance.
  • Implemented infrastructure provisioning and automation with Terraform, ensuring consistency and repeatability across environments.
  • Developed and tuned Elasticsearch indexes to support low-latency search and analytics.
  • Collaborated with BI teams to prepare and model data optimized for Tableau dashboards, ensuring data pipelines aligned with visualization requirements.
  • Supported Alteryx workflows by enabling seamless data integration and ensuring clean, structured datasets for advanced data prep and analytics tasks.
  • Built real-time Kafka pipelines to integrate metadata with Snowflake for better data traceability.
  • Managed technical lineage and metadata onboarding across multiple cloud and on-prem data sources.
  • Troubleshot BI platform, system, and data issues, working directly with vendors when needed.
  • Developed and deployed Python-based automation scripts to streamline metadata ingestion.
  • Integrated Power BI with SQL databases and cloud sources to deliver real-time insights to stakeholders.
  • Leveraged PySpark and Apache Spark for high-volume data processing; resolved data skew and optimized partitioning strategies for performance.
  • Collaborated with cross-functional teams to design and maintain relational and NoSQL data models, ensuring query efficiency and scalability.
  • Provided training and documentation on best practices for using Alation and metadata tools.
  • Integrated Databricks ETL pipelines for processing large-scale structured and unstructured data.
  • Assisted in designing enterprise data governance policies for compliance and audit readiness.
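A common remedy for the data-skew issues mentioned above is key salting: spreading a hot key across several sub-keys so no single partition carries most of the rows. A minimal pure-Python sketch of the idea (the key values and salt count are illustrative):

```python
import random
from collections import Counter

SALT_BUCKETS = 8  # number of sub-keys a hot key is spread across

def salted_key(key: str, hot_keys: set, rng: random.Random) -> str:
    """Append a random salt to hot keys so one key maps to many partitions."""
    if key in hot_keys:
        return f"{key}#{rng.randrange(SALT_BUCKETS)}"
    return key

rng = random.Random(42)
hot = {"US"}  # a single dominant value causing skew
records = ["US"] * 800 + ["CA"] * 100 + ["MX"] * 100
distribution = Counter(salted_key(k, hot, rng) for k in records)

# The 800 "US" rows now land in 8 salted groups instead of one,
# so no single partition carries 80% of the data.
print(distribution)
```

In a real Spark job the salted keys are aggregated first, then the salt is stripped and a second, much cheaper aggregation combines the partial results.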

Big Data Engineer

Philips Inc.
01.2019 - 05.2020
  • Designed and implemented Hadoop-based ETL pipelines for processing multi-terabyte datasets.
  • Developed PySpark-based transformations for large-scale structured and semi-structured data.
  • Configured Hive, Spark, and MapReduce for high-performance data processing and analytics.
  • Built automation scripts using Python and SQL to improve data ingestion efficiency.
  • Designed and implemented PySpark-based ETL pipelines in Databricks for transforming large-scale datasets.
  • Created CI/CD workflows using GitHub and Jenkins for automated deployment of ETL and data pipeline code.
  • Integrated Azure Key Vault for secrets management and compliance with security standards.
  • Leveraged Azure Monitor and Log Analytics to track pipeline performance, costs, and system health.
  • Implemented data governance and metadata management practices using tools such as Alation and Collibra.
  • Developed automation scripts for monitoring, maintenance, and cleanup of BI systems.
  • Managed security and access control in Redshift, including IAM roles, encryption (KMS), and VPC configuration for secure data access.
  • Collaborated with architects and stakeholders to design cloud-first data strategies aligned with business goals.
  • Applied data governance and security best practices to ensure compliance with organizational standards.
  • Provided troubleshooting support for Starburst, Tableau, and Alteryx users, ensuring data reliability and performance across tools.
  • Utilized Python extensively for building reusable data transformation modules, data validations, and pipeline orchestration.
  • Collaborated on a project utilizing Palantir Foundry for data workflow design and analytics dashboarding, contributing to module development and integration efforts.
  • Led performance tuning initiatives, improving query execution time by 35%.
  • Worked with AWS and Azure cloud environments for scalable data processing.
  • Designed data lakes and governance strategies to ensure efficient data discovery.
  • Integrated Kafka streaming data with data warehouses for real-time analytics.
  • Managed data validation frameworks to ensure data quality and integrity.
  • Developed batch processing workflows using Apache Airflow and DBT.
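The Airflow-style batch workflows above reduce to running tasks in dependency order. A minimal pure-Python sketch of that topological ordering, using the standard library rather than Airflow itself (the task names are illustrative):

```python
from graphlib import TopologicalSorter

# Illustrative batch pipeline: two extracts feed a transform, then a load,
# then a DBT model refresh.
dag = {
    "extract_devices": set(),
    "extract_claims": set(),
    "transform": {"extract_devices", "extract_claims"},
    "load_warehouse": {"transform"},
    "refresh_dbt_models": {"load_warehouse"},
}

def run(dag: dict) -> list:
    """Execute tasks in an order that respects every dependency edge."""
    order = list(TopologicalSorter(dag).static_order())
    for task in order:
        print(f"running {task}")  # real tasks would invoke Spark jobs or DBT
    return order

order = run(dag)
```

Airflow adds scheduling, retries, and backfills on top of exactly this dependency model, which is why DAG definitions translate naturally to and from a structure like the dictionary above.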

Data Engineer

AgFirst
10.2017 - 12.2018
  • Designed and built ETL pipelines for structured and unstructured data ingestion.
  • Created and optimized SQL queries for data warehousing and reporting in AWS Redshift.
  • Designed data governance frameworks leveraging Collibra and Alation.
  • Developed data validation scripts to ensure data quality before onboarding into catalogs.
  • Worked closely with business analysts and stakeholders to gather requirements and deliver scalable BI solutions.
  • Managed cloud infrastructure and BI workloads on AWS (EC2, ALB, security groups, IAM).
  • Built Python-based automation solutions for metadata tagging and transformation.
  • Integrated data lineage tools with Alation to enhance data cataloging.
  • Provided technical support and troubleshooting for metadata ingestion issues.
  • Managed API integrations between Alation and enterprise data sources.
  • Assisted in migration projects, moving legacy data pipelines to AWS.
  • Conducted knowledge-sharing sessions to train teams on data governance best practices.
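The metadata-tagging automation mentioned above can be sketched as a rule-based pass over column names; the tag rules and columns here are hypothetical examples, not AgFirst's actual policy:

```python
# Hypothetical rule set: substrings in a column name imply governance tags.
TAG_RULES = {
    "ssn": "PII",
    "email": "PII",
    "dob": "PII",
    "balance": "Financial",
    "loan": "Financial",
}

def tag_columns(columns: list) -> dict:
    """Map each column name to the sorted governance tags its name implies."""
    tagged = {}
    for col in columns:
        tags = sorted({tag for needle, tag in TAG_RULES.items()
                       if needle in col.lower()})
        tagged[col] = tags
    return tagged

print(tag_columns(["member_ssn", "loan_balance", "branch_id"]))
```

A script like this typically proposes tags for steward review and pushes the approved ones into the catalog via its API, rather than writing tags blindly.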

Education

Master of Science - Computer Information Systems

Trine University
Indiana
01.2016

Skills

  • Languages: Python, SQL, Java, Scala
  • Big Data & ETL: Spark, Hadoop, Hive, Kafka, Alation, SSIS, DBT, DataStage, Informatica
  • Databases: PostgreSQL, SQL Server, Oracle, Snowflake, Redshift
  • Cloud Platforms: Azure (Data Factory, Data Lake, Synapse), AWS (Glue, Redshift, S3), GCP
  • Tools & Technologies: Alation, Collibra, Airflow, Palantir Foundry (Workshop, Quiver), Terraform, Kubernetes, Docker, Power BI, Git

Certification

  • Databricks Certified Data Engineer Professional – 2025
  • Databricks Certified Machine Learning Professional – 2025
  • Databricks Certified Data Engineer Associate – 2025
  • Databricks Fundamentals Accreditation – 2025
  • Oracle Cloud Infrastructure 2021 Certified Architect Associate
  • Oracle Cloud Infrastructure Foundations 2021 Certified Associate

Key Strengths

  • Expertise in Alation, data governance, and metadata management.
  • Strong proficiency in SQL, Python, Spark, and cloud data engineering.
  • Deep understanding of data lineage, quality frameworks, and catalog optimization.
  • Proven track record of configuring and maintaining large-scale data catalogs.
  • Experienced in cloud-based data engineering solutions and automation.
  • Ability to work independently and collaboratively to drive data governance initiatives.