Aspiring Data Engineer with hands-on training in cloud and big data technologies, including AWS and Databricks. Skilled in designing and optimizing scalable data pipelines using Spark and Delta Lake. Strong foundation in SQL, Python, and data modeling. Currently pursuing AWS Cloud Practitioner and Databricks Data Engineer certifications. Committed to building production-grade data engineering workflows and delivering business-ready insights.
Responsive database professional experienced in monitoring performance, troubleshooting issues, and optimizing database environments. Possesses strong analytical skills, excellent problem-solving abilities, and a deep understanding of database technologies and systems. Equally confident working independently or collaboratively, with excellent communication skills.
Coursework: Business Intelligence, Data Visualization, Financial Fluency, Statistical Methods, Machine Learning
Coursework: Advanced Financial Accounting, Corporate Finance, Auditing and Assurance Services
Retail ETL Pipeline with PySpark
• Built an end-to-end ETL pipeline using PySpark on Databricks
• Extracted transactional retail data from CSV, transformed it with joins, filters, and aggregates, and wrote clean data to Delta Lake
• Partitioned and cached data for optimized reads; visualized KPIs such as total sales, region-wise revenue, and customer segments (see the sketch below)
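A minimal sketch of the extract-transform-load flow described above, assuming hypothetical paths (/mnt/raw/orders.csv, /mnt/raw/customers.csv, /mnt/curated/retail_kpis) and column names (customer_id, status, amount, order_id, region, customer_segment) that stand in for the project's actual schema:

# Illustrative only; paths, column names, and filter conditions are assumed.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("retail-etl").getOrCreate()

# Extract: transactional orders and a customer dimension from CSV
orders = spark.read.option("header", True).option("inferSchema", True).csv("/mnt/raw/orders.csv")
customers = spark.read.option("header", True).option("inferSchema", True).csv("/mnt/raw/customers.csv")

# Transform: join, filter out cancelled orders, aggregate revenue by region and segment
kpis = (
    orders.join(customers, on="customer_id", how="inner")
          .filter(F.col("status") != "CANCELLED")
          .groupBy("region", "customer_segment")
          .agg(F.sum("amount").alias("total_sales"),
               F.countDistinct("order_id").alias("order_count"))
)

# Load: write curated data to Delta Lake, partitioned by region for faster reads
(kpis.write.format("delta")
     .mode("overwrite")
     .partitionBy("region")
     .save("/mnt/curated/retail_kpis"))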
NYC Taxi Data Pipeline and Analysis
• Processed 10M+ rows of NYC Yellow Taxi data using Spark
• Cleaned and enriched data with calculated columns (trip duration, fare per mile), then persisted it in Delta format
• Used Z-Ordering, partitioning, and caching to tune performance (see the sketch below)
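A minimal sketch of the enrichment and tuning steps, assuming a hypothetical source path and table name; column names follow the public NYC Yellow Taxi schema but the derived columns and Z-Order keys are illustrative choices, not the project's exact ones:

# Illustrative only; source path and table name are assumed.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("nyc-taxi").getOrCreate()

trips = spark.read.parquet("/mnt/raw/yellow_tripdata/")  # hypothetical raw data location

# Enrich: trip duration in minutes, fare per mile, and a date column for partitioning
enriched = (
    trips.withColumn(
            "trip_duration_min",
            (F.col("tpep_dropoff_datetime").cast("long")
             - F.col("tpep_pickup_datetime").cast("long")) / 60.0)
         .withColumn(
            "fare_per_mile",
            F.when(F.col("trip_distance") > 0,
                   F.col("fare_amount") / F.col("trip_distance")))
         .withColumn("pickup_date", F.to_date("tpep_pickup_datetime"))
)

# Cache for repeated analysis (materialized by the write below), then persist as a
# partitioned Delta table
enriched.cache()
(enriched.write.format("delta")
         .mode("overwrite")
         .partitionBy("pickup_date")
         .saveAsTable("yellow_trips_enriched"))

# Z-Order on frequently filtered, non-partition columns to co-locate related rows (Databricks)
spark.sql("OPTIMIZE yellow_trips_enriched ZORDER BY (PULocationID, tpep_pickup_datetime)")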