Around 5 years of experience in Information Technology, with expertise in Data Analytics, Data Design, Development, Implementation, Testing, and Deployment of software applications in the Finance and Insurance domains.
Working experience in designing and implementing complete end-to-end Hadoop infrastructure using HDFS, MapReduce, Hive, HBase, Kafka, Sqoop, Spark, NoSQL, Postman, and Python.
Created DataFrames and performed analysis using Spark SQL.
Strong knowledge of Spark Streaming and Spark machine learning libraries.
Experienced in writing automated scripts for monitoring file systems and key MapR services.
Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs.
Good knowledge of Cloudera distributions and of AWS services, including Amazon Simple Storage Service (Amazon S3), Amazon Redshift, AWS Lambda, Amazon EC2, and Amazon EMR.
Performed transformations on imported data and exported the results back to the RDBMS.
Worked on Amazon Web Services (AWS) to integrate EMR with Spark 2, S3 storage, and Snowflake.
Experience in writing queries in HQL (Hive Query Language) to perform data analysis.
Created Hive External and Managed Tables.
Implemented partitioning and bucketing on Hive tables for query optimization.
Integrated Flume with Kafka, using Flume as both a producer and a consumer (the Flafka pattern).
Good exposure to creating dashboards in reporting tools such as SAS, Tableau, Power BI, BO, and QlikView, using various filters and sets while dealing with huge volumes of data.
Experience with various databases such as Oracle, Teradata, Informix, and DB2.
Experience with NoSQL databases such as MongoDB and HBase, and with PostgreSQL-based databases such as Greenplum.