
Senior Data Engineer with over 7 years of experience architecting, developing, and optimizing high-performance ETL pipelines for real-time and batch processing using Apache Spark (Scala and PySpark), SQL, and Python.
Proven success in migrating complex enterprise-scale data workflows from Scala Spark to PySpark, improving modularity, reducing latency, and enhancing maintainability.
Adept at handling diverse data sources and formats, including structured, semi-structured (JSON, Parquet), and unstructured data, using tools like Hive, Delta Lake, and Apache Hudi for efficient data management.
Strong command of relational and NoSQL databases including Oracle, MySQL, PostgreSQL, SQL Server, MongoDB, CouchDB, Amazon Redshift, and Snowflake, leveraging advanced SQL techniques, stored procedures, indexing strategies, and query optimization for large-scale data processing.
Extensive experience across cloud platforms such as AWS (S3, Glue, Lambda, Athena, Redshift, Secrets Manager) and Microsoft Azure (Data Factory, Blob Storage, Synapse Analytics, Azure SQL), with secure integration using IAM, Vault, and Secrets Manager.
Skilled in real-time streaming and event-driven architecture using Apache Kafka and Spark Structured Streaming, building low-latency data flows for business-critical applications.
Proficient in orchestrating workflows using Airflow, Rundeck, and Databricks Workflows, and integrating CI/CD pipelines through Jenkins, GitHub Actions, and Concourse, ensuring automation and seamless deployment across data engineering processes.
Collaborates closely with cross-functional stakeholders to gather requirements, analyze source data from various upstream systems, and translate complex business rules into scalable Spark/SQL logic.
Experienced in version control and code lifecycle management using Git, GitHub, and Bitbucket.
Passionate about data quality, security, and cost-efficient data architecture, consistently delivering reliable and performant data products that power enterprise reporting, analytics, and ML pipelines.