Data engineer with 5 years of experience designing, building, and optimizing scalable data pipelines with Hadoop, Spark, Kafka, Airflow, and Hive. Skilled in Python, PySpark, and Scala for developing robust ETL workflows and real-time stream processing for mission-critical applications with Spark Streaming and Kafka. Proficient across AWS, Azure, and GCP, using services and platforms such as S3, Redshift, Glue, ADF, Synapse, and Snowflake to architect scalable, cost-effective data solutions. Strong background in SQL and NoSQL databases (PostgreSQL, Oracle, DynamoDB, Cassandra), including advanced query optimization and data modeling for efficient storage and retrieval. Hands-on experience with CI/CD pipelines, infrastructure automation with Terraform, and workflow orchestration with Airflow, including automated pipeline testing and monitoring via custom Python scripts to safeguard data integrity throughout the ETL process. Experienced in tuning Hadoop and Spark workloads for improved performance, scalability, and cost efficiency. Adept at building interactive Power BI dashboards that deliver real-time insights for data-driven decision-making. Familiar with data versioning and lineage tracking using Apache Atlas and the Glue Data Catalog to keep pipelines transparent, traceable, and audit-ready for compliance.