As a Senior Data Engineer, I currently lead the design and development of end-to-end data pipelines on Databricks using PySpark, integrating AWS services such as S3, Lambda, and Redshift, and orchestrating complex workflows with Airflow to ensure efficient, scalable data processing. I have led cross-functional teams in implementing advanced data solutions on Snowflake and AWS, improving data accessibility and performance through optimized ELT workflows and robust data modeling practices.

With prior experience as both a Data Engineer and a Data Analyst, I specialize in building data pipelines with the Hadoop ecosystem (Spark, Hive, HDFS, MapReduce, YARN, Sqoop, Kafka, and Oozie) as well as Teradata, and I work with cloud platforms including AWS and Google Cloud. I have deep technical expertise in developing Spark applications using PySpark and Spark SQL, creating Hive tables with custom UDFs, and building reports in visualization tools such as Tableau and Amazon QuickSight. I have used cluster monitoring tools from Cloudera and Hortonworks and have hands-on experience with real-time data streaming via Kafka. My skill set includes advanced data manipulation with partitions, joins, and window functions, along with designing, testing, and maintaining data management systems using Spark, Hadoop, AWS, and shell scripting. I am proficient in Python, Core Java, SQL, and object-oriented design, with a strong background in creating stored procedures, triggers, and views for reliable data operations.

I also work closely with business users, product owners, and engineering teams in Agile environments to deliver data-driven features, translating technical outcomes into actionable business insights and aligning data strategies with organizational goals.