- 9+ years of Big Data experience building highly scalable data analytics applications.
- Strong experience with Hadoop ecosystem components such as HDFS, MapReduce, Spark, HBase, Oozie, Hive, Sqoop, Pig, Flume, and Kafka.
- Good hands-on experience with multiple Hadoop distributions: Cloudera (CDH), Hortonworks (HDP), and Amazon EMR.
- Good understanding of distributed systems architecture and the design principles behind parallel computing.
- Expertise in developing production-ready Spark applications using the Spark Core, DataFrame, Spark SQL, Spark ML, and Spark Streaming APIs.
- Experience with Azure cloud services, including Azure Data Factory (ADF), Azure Synapse Analytics, and Azure Databricks, as well as Snowflake and ETL/ELT workloads.
- Experience with Amazon Web Services (AWS), including EC2, S3, RDS, EMR, Glue, Athena, VPC, IAM, Elastic Load Balancing, Lambda, Redshift, ElastiCache, Auto Scaling, CloudFront, CloudWatch, Data Pipeline, DMS, and Aurora, along with ETL and other AWS services.
- Deep understanding of performance tuning and partitioning for building scalable data lakes.
- Built real-time data workflows using Kafka, Spark Streaming, and HBase.
- Extensive knowledge of NoSQL databases such as HBase.
- Solid experience working with CSV, text, SequenceFile, Parquet, ORC, and JSON data formats.
- Extensive experience performing ETL on structured and semi-structured data using Pig Latin scripts.
- Designed and implemented Hive and Pig UDFs in Java for evaluating, filtering, loading, and storing data.
- Development experience with RDBMS, including writing SQL queries, views, stored procedures, and triggers.
- Strong understanding of the Software Development Lifecycle (SDLC) and methodologies such as Waterfall and Agile.