- 8+ years of strong experience in application development using PySpark, Java, Python, Scala, and R, with an in-depth understanding of distributed systems architecture and parallel processing frameworks.
- Strong experience using PySpark, HDFS, MapReduce, Hive, Pig, Spark, Sqoop, Oozie, and HBase.
- Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.
- Experienced with various Hadoop distributions (Cloudera, Hortonworks, MapR, Amazon EMR), implementing and leveraging new Hadoop features.
- Experience developing Spark applications using the Spark RDD, Spark SQL, and DataFrame APIs (see the first sketch after this summary).
- Worked with real-time data processing and streaming using Spark Streaming and Kafka (streaming sketch below).
- Experience moving data between HDFS and relational database systems (RDBMS) using Apache Sqoop (import sketch below).
- Expertise in working with Hive data warehouse infrastructure: creating tables, distributing data through partitioning and bucketing, and developing and tuning HQL queries (partitioning sketch below).
- Significant experience writing custom UDFs in Hive (UDF sketch below) and custom InputFormats in MapReduce.
- Involved in creating Hive tables, loading them with data, and writing ad-hoc Hive queries that run internally on MapReduce and Tez.
- Replaced existing MapReduce jobs and Hive scripts with Spark SQL and Spark data transformations for more efficient data processing.
- Experience developing Kafka producers and consumers for streaming millions of events per second (producer/consumer sketch below).
- Strong understanding of the real-time streaming technologies Spark and Kafka.
- Knowledge of job workflow management and coordination tools such as Oozie.
- Strong experience building end-to-end data pipelines on the Hadoop platform.
- Experience working with NoSQL database technologies, including MongoDB, Cassandra, and HBase.
- Strong understanding of logical and physical database models and entity-relationship modeling.
- Experience with software development tools such as JIRA, Play, and Git.
- Good understanding of dimensional and relational data modeling concepts, including star-schema modeling, snowflake-schema modeling, and fact and dimension tables.
- Experience manipulating and analyzing large datasets and finding patterns and insights in structured and unstructured data.
- Strong understanding of the Java Virtual Machine and multi-threaded processing.
- Experience writing complex SQL queries and creating reports and dashboards.
- Proficient with Unix-based command-line interfaces.
- Expertise in handling ETL tools such as Informatica.
- Excellent analytical, communication, and interpersonal skills.
- Experienced with Agile methodologies, including Extreme Programming, Scrum, and Test-Driven Development (TDD).
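The following is a minimal sketch of the three Spark APIs named in the summary (RDD, DataFrame, Spark SQL); paths, column names, and the `events` view are hypothetical placeholders, not taken from any specific project.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("api-sketch").getOrCreate()

# RDD API: low-level transformations on raw text (hypothetical HDFS path).
lines = spark.sparkContext.textFile("hdfs:///data/events.txt")
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# DataFrame API: the same kind of data with a schema, filtered declaratively.
df = spark.read.option("header", "true").csv("hdfs:///data/events.csv")
recent = df.filter(df["year"] == "2023").select("user_id", "event_type")

# Spark SQL: register the DataFrame as a view and query it with SQL.
df.createOrReplaceTempView("events")
top = spark.sql(
    "SELECT event_type, COUNT(*) AS n FROM events "
    "GROUP BY event_type ORDER BY n DESC")
top.show()
```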
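A minimal sketch of Spark-plus-Kafka streaming, written against the current Structured Streaming API (the work summarized above may equally have used the older DStream-based integration). The broker address and topic name are hypothetical, and the `spark-sql-kafka` connector package is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

# Subscribe to a Kafka topic as an unbounded streaming DataFrame.
events = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical
               .option("subscribe", "events")                     # hypothetical
               .load())

# Kafka delivers key/value as binary; cast to strings before processing.
parsed = events.select(col("key").cast("string"),
                       col("value").cast("string"))

# Console sink chosen for illustration only; a real pipeline would write
# to HDFS, Hive, or another durable sink.
query = (parsed.writeStream
               .format("console")
               .outputMode("append")
               .start())
query.awaitTermination()
```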
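Sqoop itself is driven from the command line, e.g. `sqoop import --connect jdbc:mysql://db:3306/shop --table orders --target-dir /warehouse/orders` (hypothetical connection details). To keep the sketches in one language, here is the analogous RDBMS-to-HDFS move expressed through Spark's JDBC reader instead, a deliberate substitution rather than Sqoop itself; a matching JDBC driver is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-sketch").getOrCreate()

# Pull a relational table into Spark over JDBC (hypothetical database).
orders = (spark.read.format("jdbc")
               .option("url", "jdbc:mysql://db:3306/shop")
               .option("dbtable", "orders")
               .option("user", "etl")
               .option("password", "secret")
               .load())

# Land the result on HDFS, the same direction a Sqoop import would take.
orders.write.mode("overwrite").parquet("hdfs:///warehouse/orders")
```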
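A minimal sketch of the Hive warehouse work described above: a table partitioned by date and bucketed by user, plus a query that prunes partitions. Table and column names are hypothetical; the statements are standard HiveQL and could equally be run in beeline instead of through a Hive-enabled SparkSession.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("hive-sketch")
                     .enableHiveSupport().getOrCreate())

spark.sql("""
    CREATE TABLE IF NOT EXISTS clicks (
        user_id BIGINT,
        url     STRING
    )
    PARTITIONED BY (dt STRING)              -- distribute data by date
    CLUSTERED BY (user_id) INTO 32 BUCKETS  -- bucket for joins/sampling
    STORED AS ORC
""")

-- is the SQL comment marker above; restricting the query to one partition
# lets Hive/Spark skip every other partition's files entirely.
spark.sql("""
    SELECT user_id, COUNT(*) AS hits
    FROM clicks
    WHERE dt = '2023-01-01'
    GROUP BY user_id
""").show()
```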
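Custom Hive UDFs proper are typically written in Java against Hive's UDF classes; as a sketch in this document's dominant language, here is the closely related technique of registering a custom function for SQL use through Spark. The function name and cleaning logic are illustrative only.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

def normalize_url(url):
    """Strip scheme and trailing slash so URLs compare consistently."""
    if url is None:
        return None
    return url.replace("https://", "").replace("http://", "").rstrip("/")

# Make the Python function callable from SQL, as a Hive UDF would be.
spark.udf.register("normalize_url", normalize_url, StringType())

spark.createDataFrame([("https://example.com/",)], ["url"]) \
     .createOrReplaceTempView("pages")
spark.sql("SELECT normalize_url(url) AS clean_url FROM pages").show()
```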
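A minimal sketch of a Kafka producer and consumer pair, using the third-party kafka-python client as one representative library (the original work may have used a different client). The broker address and topic are hypothetical.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer: serialize each event as JSON and publish it to a topic.
producer = KafkaProducer(
    bootstrap_servers="broker:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("events", {"user_id": 42, "event_type": "click"})
producer.flush()  # block until buffered records are delivered

# Consumer: read the topic from the beginning and decode each record.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="broker:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)  # one decoded event per record
```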