19+ years of experience in Data Engineering, Data Analytics, and Cloud Technologies. Proven expertise in designing and optimizing data pipelines, ETL workflows, and analytical solutions that drive business insights. Skilled in Python, PySpark, AWS, and Hadoop, with a strong background in data warehousing, cloud migration, and DevOps. Adept at managing large-scale data processing, stakeholder engagement, and vendor relationships while delivering high-performance, scalable, and cost-effective solutions. Passionate about leveraging data to enable decision-making and innovation.
• Designed and deployed end-to-end ETL workflows leveraging AWS Glue, S3, and Athena to support real-time and batch data processing, improving data availability for analytics and reporting (a simplified Glue job sketch follows this list).
• Developed reusable ETL scripts and automation logic to reduce manual intervention, streamline data ingestion processes, and ensure consistent data quality.
• Led AWS Glue job optimization and Spark performance-tuning initiatives, reducing ETL processing time by 40% and improving job reliability (a representative tuning sketch follows this list).
• Collaborated with data analysts and data scientists to deliver curated datasets, enabling advanced analytics and machine learning model development.
• Implemented data quality checks and monitoring frameworks to proactively identify anomalies and bottlenecks in data pipelines (an illustrative check sketch follows this list).
• Delivered performance dashboards and operational KPIs using Amazon QuickSight and Amazon Redshift, enabling data-driven decision-making for senior leadership.
• Conducted root cause analysis for data pipeline failures, driving the resolution of recurring issues and implementing long-term improvements in data architecture.
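The sketches below are simplified, hypothetical illustrations of the kinds of work summarized above, not excerpts from any employer's codebase; all database, table, and S3 names are assumed. This first sketch shows the general shape of a Glue ETL job like those referenced in the first bullet: read from an assumed Data Catalog table, standardize columns, and write Parquet to an assumed S3 path so Athena can query it.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Resolve the job name passed in by the Glue runtime.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw records from a Glue Data Catalog table (assumed database/table names).
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_zone",   # assumed catalog database
    table_name="orders",   # assumed source table
)

# Standardize column names before landing the data in the curated zone.
curated = ApplyMapping.apply(
    frame=raw,
    mappings=[
        ("order_id", "string", "order_id", "string"),
        ("order_ts", "string", "order_ts", "string"),
        ("amount", "double", "amount", "double"),
    ],
)

# Write Parquet to S3 (assumed bucket) so Athena can query it directly.
glue_context.write_dynamic_frame.from_options(
    frame=curated,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/orders/"},
    format="parquet",
)

job.commit()
```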
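This second sketch illustrates the style of Spark performance tuning mentioned in the fourth bullet: right-sizing shuffles, enabling adaptive query execution, pruning early, and coalescing output files. The configuration values and paths are illustrative assumptions; real settings depend on data volume and executor/DPU sizing.

```python
from pyspark.sql import SparkSession

# Illustrative tuning values only; actual settings depend on workload size.
spark = (
    SparkSession.builder
    .appName("etl-tuning-sketch")
    .config("spark.sql.shuffle.partitions", "200")       # right-size shuffles for joins/aggregations
    .config("spark.sql.adaptive.enabled", "true")        # let AQE coalesce small or skewed partitions
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

events = spark.read.parquet("s3://example-raw-bucket/events/")  # assumed input path

# Filter early and repartition on the aggregation key to reduce shuffle volume.
recent = events.where("event_date >= '2024-01-01'").repartition("customer_id")

daily = recent.groupBy("event_date", "customer_id").count()

# Coalesce before writing to avoid producing thousands of small S3 objects.
daily.coalesce(50).write.mode("overwrite").parquet(
    "s3://example-curated-bucket/daily_counts/"
)
```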
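This final sketch shows a minimal version of the pipeline data quality checks referenced in the sixth bullet: verify the dataset is non-empty, required columns contain no nulls, and the key column is unique, then fail fast so the orchestrator can alert before bad data reaches downstream consumers. Column names and the input path are assumptions for illustration.

```python
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def run_quality_checks(df: DataFrame, key_col: str, required_cols: list) -> list:
    """Return a list of human-readable failures for basic pipeline checks."""
    failures = []

    # 1. Non-empty dataset.
    row_count = df.count()
    if row_count == 0:
        failures.append("dataset is empty")

    # 2. Required columns must not contain nulls.
    for col in required_cols:
        null_count = df.filter(F.col(col).isNull()).count()
        if null_count > 0:
            failures.append(f"{null_count} null values in required column '{col}'")

    # 3. Key column must be unique.
    duplicate_count = row_count - df.select(key_col).distinct().count()
    if duplicate_count > 0:
        failures.append(f"{duplicate_count} duplicate values in key column '{key_col}'")

    return failures


if __name__ == "__main__":
    spark = SparkSession.builder.appName("dq-checks-sketch").getOrCreate()
    orders = spark.read.parquet("s3://example-curated-bucket/orders/")  # assumed path
    problems = run_quality_checks(orders, key_col="order_id", required_cols=["order_id", "order_ts"])
    if problems:
        # Fail fast so the scheduler surfaces the issue before downstream jobs run.
        raise ValueError("Data quality checks failed: " + "; ".join(problems))
```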