
Skilled Data Engineer with a strong focus on Python programming, SQL optimization, and data quality assurance. Proven track record in orchestrating reliable data pipelines and developing robust data solutions.
· Designed Redshift schemas, staging layers, and SQL-based transformation patterns to support curated reporting datasets, downstream business intelligence workflows, and historical data processing across enterprise reporting systems.
· Developed PyArrow-based data conversion utilities to transform raw text, CSV, and other delimited source files into compressed Parquet datasets, standardizing storage formats and preparing structured data for downstream querying and analytical processing.
· Developed Python scripts using Pandas and NumPy to clean, parse, normalize, and transform source datasets for downstream processing.
· Orchestrated batch data pipelines with Apache Airflow, configuring DAGs, schedules, and retry logic to ensure reliable integration workflows.
· Optimized SQL queries, joins, and stored procedures to support historical database consolidation and downstream reporting across data systems.
· Implemented quality checks with Great Expectations to validate schema compliance and data integrity during processing.
· Managed source control using Git and GitHub, supporting versioning, branching, code reviews, and collaborative development.