
15+ years of overall experience as an IT developer, including 5+ years as a Big Data/Hadoop Developer.
Good knowledge of the Hadoop Distributed File System (HDFS) and ecosystem components such as Spark, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Storm, ZooKeeper, and Flume.
Detailed understanding of Hadoop's internal architecture and the role of components such as JobTracker, TaskTracker, NameNode, DataNode, ApplicationMaster, ResourceManager, and NodeManager, as well as the MapReduce programming paradigm.
Experience with Apache Spark, Spark Streaming, Spark SQL, and NoSQL databases such as Cassandra and HBase.
Used CQL to retrieve data from Cassandra.
Experience using Hive Query Language (HiveQL) for data analytics and loading data into partitioned and bucketed Hive tables.
Experience with the Cloudera and Hortonworks distributions, and with Cloudera Manager for managing and monitoring Hadoop clusters.
Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
Implemented Spark scripts in Scala and Spark SQL to read Hive tables into Spark for faster data processing.
Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism, and memory tuning.
Developed Pig Latin scripts for transformations and used Hive Query Language for data analytics.
Experienced in using Sqoop to import data from databases such as MySQL, Oracle, and Teradata into HDFS and to export it back.
Developed various cross-platform products while working with Hadoop file formats such as SequenceFile, RCFile, ORC, Avro, and Parquet.
Experience with shell scripting and with Scala and Python, used extensively with Spark for data processing.
Hands-on experience with batch processing of data sources using Apache Spark.
Implemented Spark RDD transformations and actions to carry out business analysis.
Used Flume to collect, aggregate, and store web log data in HDFS.
Used ZooKeeper for various types of centralized configuration.
Experienced in loading large volumes of data from the local file system and HDFS into Hive and writing complex queries to load data into internal tables.
Experience loading and transforming large sets of structured, semi-structured, and unstructured data.
Used Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS and in databases such as HBase, using Python/PySpark and Scala.
Imported the required data from the server into HDFS using Sqoop and bulk-loaded the cleaned data into HBase using MapReduce.
Promoted a full-cycle approach including request analysis, creating/pulling datasets, report creation and implementation, and delivering the final analysis to the requestor.
Very good understanding of SQL, ETL, and data warehousing technologies.
Designed and created ETL jobs in Talend to load large volumes of data into the Hadoop ecosystem and relational databases.
Developed numerous applications using Java, J2EE, JSP, Spring, Hibernate, XML, HTML, PL/SQL, JavaScript, and jQuery.
Database development experience using SQL and PL/SQL with relational databases such as Oracle, Sybase, PostgreSQL, and SQL Server, and with NoSQL databases such as MongoDB.
Developed a website using RESTful APIs to fetch data from the web server.
Java developer with extensive experience in various Java libraries, APIs, frameworks, and front-end and back-end development.
Used the Log4j package for logging, and CVS and Subversion for version control.
Strong ability to understand new concepts and applications.
Excellent verbal and written communication skills, proven highly effective in interfacing across business and technical groups.
Results-driven lead technologist with extensive experience in AWS, big data analytics, and ETL development.
Proven ability to lead complex projects and implement innovative big data solutions.