
Detail-oriented Data Analyst with 3+ years of experience analyzing large datasets, optimizing SQL queries, and building data visualization dashboards. Adept at translating business requirements into data-driven insights and collaborating with cross-functional teams. Skilled in Python, SQL, PySpark, AWS Glue, and Snowflake for building scalable data pipelines and analytics solutions. Strong focus on data quality, performance optimization, and actionable reporting that supports strategic decision-making.
• Designed and implemented an end-to-end data pipeline using AWS Glue and PySpark for customer and transaction datasets stored in Amazon S3.
• Developed PySpark transformation logic to standardize formats, eliminate duplicates, and apply business rules for revenue and customer segmentation metrics.
• Built automated ETL pipelines to load transformed datasets into Snowflake, enabling scalable cloud data warehousing.
• Wrote optimized SQL queries in Snowflake to support BI dashboards and business performance reporting.
• Implemented data validation checks to ensure integrity across ingestion, transformation, and warehouse layers.
• Reduced manual reporting workload by 40% through automation of cloud-based data pipelines.
• Designed SQL data models and transformation logic to support marketing and operational reporting across multiple business units.
• Developed Python-based data processing scripts to automate the ingestion and cleaning of large datasets from multiple data sources.
• Implemented data quality validation checks on over 1 million records, identifying anomalies, missing values, and duplicate transactions to improve reporting accuracy.
• Built automated ETL workflows to transform raw data into analytics-ready datasets, reducing manual reporting effort by 35%.
• Collaborated with marketing and operations teams to define 12+ business KPIs, enabling better campaign performance tracking and decision-making.