Results-oriented IT professional with extensive experience in all phases of the Software Development Life Cycle (SDLC), including data analysis, design, development, testing, and deployment of software systems. Strong expertise in the Hadoop ecosystem, including HDFS, MapReduce, YARN, Spark, Hive, Pig, Oozie, Sqoop, Flume, and Kafka, as well as NoSQL databases such as HBase and Cassandra. Proven ability to use Spark and its Scala APIs for efficient data processing, and to compare Spark processing against Hive and SQL. Well-versed in data migration projects and proficient in both on-premises and cloud environments. Adept at creating and orchestrating data pipelines with Oozie and Airflow. Experienced with cloud services including Amazon EMR, S3, EC2, Redshift, and Athena, as well as Google Cloud Storage (GCS), Dataproc clusters, BigQuery, and Cloud Logging. Solid understanding of distributed systems design, HDFS architecture, and the MapReduce and Spark processing frameworks. Skilled in developing highly scalable data transformations using Spark RDDs, DataFrames, Spark SQL, and Spark Streaming. Proficient in troubleshooting Spark failures and tuning long-running Spark applications. Excellent knowledge of SQL and hands-on experience with databases including Oracle, MySQL, and Teradata. Proficient in Core Java and experienced with JIRA for project management, Git for source control, Jenkins for continuous integration, and Crucible for code reviews. Strong team player with exceptional communication, analytical, presentation, and interpersonal skills.