Senior Data Quality Engineer
Worked on large-scale enterprise data platforms supporting Marketing, Identity Resolution, and Data Privacy initiatives across household, demographic, and transactional datasets.
Supported production data operations across 30+ upstream pipelines processing high-volume enterprise data.
Monitored, triaged, and resolved production pipeline failures by performing root cause analysis and coordinating with Development and DevOps teams.
Led validation for the migration of 100+ StreamSets pipelines across multiple enterprise applications, completing the cutover with minimal operational impact.
Supported StreamSets server migration activities including application onboarding, server configuration updates, wallet path management, database credential updates, and environment validation.
Worked closely with DevOps teams to troubleshoot infrastructure, access-control, and sudo-permission issues during migration and production support activities.
Built reusable SQL-based validation frameworks for post-load validation, data reconciliation, operational monitoring, and production data quality verification.
Performed source-to-target validation across Hive, Impala, Oracle, and distributed data platforms to ensure data accuracy and completeness.
Supported CCPA compliance initiatives by validating retention-based deactivation and removal of production data using SQL-driven verification processes.
Investigated and resolved PySpark job failures in production environments by analyzing execution logs, identifying failure patterns, and supporting pipeline recovery.
Supported production deployments, release validation, and operational readiness activities across multiple applications and data domains.
Collaborated with Data Engineering, Product, Business, and Infrastructure teams to ensure reliable delivery of critical enterprise data assets.
Contributed to operational documentation, migration runbooks, deployment procedures, and support knowledge repositories to improve platform reliability and team efficiency.
Technologies:
SQL | Python | Pandas | Hive | Impala | PySpark | Hadoop | Oracle | StreamSets | Git | Jira | Confluence | Linux/Unix | Data Quality | ETL Validation | Production Support | Agile
