● 11+ years of experience in Data Engineering, Data Pipeline Design, Development, and Implementation as a Sr. Data Engineer/Data Developer.
● Configured ZooKeeper to coordinate and support Kafka, Spark, Spark Streaming, HBase, and HDFS.
● Set up Azure infrastructure such as storage accounts, integration runtimes, service principal IDs, and app registrations to enable scalable, optimized support for business users' analytical requirements in Azure.
● Worked on generating JSON scripts and wrote UNIX shell scripts to invoke Sqoop import/export jobs.
● Performed exploratory data analysis and data cleaning with Python.
● Good knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
● Extensive experience in IT data analytics projects, with hands-on experience migrating on-premises ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer.
● Implemented a log producer in Scala that watches application logs, transforms incremental log entries, and sends them to a Kafka- and ZooKeeper-based log collection platform.
● Experience importing and exporting data with Sqoop between HDFS and relational database systems and loading it into partitioned Hive tables. Good knowledge of streaming applications using Apache Kafka.
● Experience in writing complex MapReduce jobs and Hive data modeling.
● Work experience with cloud infrastructure such as Amazon Web Services (AWS) and Azure.
● Expertise with AWS cloud services such as EMR, S3, Redshift, AWS Glue, and CloudWatch for big data development.
● Experience in fine-tuning MapReduce jobs for better scalability and performance and converting them to Spark.
● Experience working with Spark RDDs, DataFrames, and Datasets using file formats such as JSON, Avro, and Parquet, along with compression techniques.
● Worked extensively on enrichment/ETL in real-time streaming jobs using PySpark Streaming and Spark SQL, loading results into HBase.
● Experienced with big data technologies such as Spark Core and Spark SQL.
● Wrote Lambda functions for pre-processing data and post-processing Glue job results.
● Configured S3 events to trigger the Step Functions workflow when new data arrives in a specific S3 bucket.
● Implemented Step Functions state machines and started the execution of Glue jobs from them.
● Orchestrated workflows using AWS Step Functions to chain multiple AWS services such as Lambda and Glue in a defined sequence.
● Experience working with Apache Flink and supporting jobs that consume JSON data from Apache Pulsar.
● Experience working with Apache Flink DataStreams and loading the data into PostgreSQL.
● Experience with GCP Dataproc, GCS, Cloud Functions, and BigQuery.
● Worked with relational SQL and NoSQL databases and related tools, including Oracle, Hive, Sqoop, and HBase.
● Designed and executed Oozie workflows to schedule Sqoop and Hive job actions that extract, transform, and load data.
● Migrated databases to the SQL Azure cloud platform and performed the associated performance tuning.
● Experienced with Hadoop/Hive on AWS, using both EMR and non-EMR Hadoop on EC2.
● Experience developing Kafka producers and consumers for streaming millions of events per second.
● Developed ETL/ELT pipelines using Apache Spark on Azure Databricks, including data cleaning, data enrichment, and data aggregation using Spark SQL and Spark DataFrames.
● Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
● Implemented sentiment analysis and text analytics on Twitter social media feeds and market news using Scala and Python.
● Provided production support for ETL and reporting systems, investigating and resolving issues, and maintaining system stability and availability.
● Very keen on learning the newer technology stack that Google Cloud Platform (GCP) adds.
● Hands-on experience installing and configuring Cloudera Hadoop ecosystem components such as Flume, HBase, ZooKeeper, Oozie, Hive, Sqoop, and Pig.
● Installed Hadoop, MapReduce, HDFS, and AWS components and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
● Worked with real-time data processing and streaming techniques using Spark Streaming and Kafka.
● Pipeline development skills with Apache Airflow, Kafka, and NiFi.
● Extensive use of open-source languages including Python, Scala, and Java.
● Migrated projects from Cloudera Hadoop Hive storage to Azure Data Lake Store to support the Confidential transformation strategy.
● Performed data synchronization between EC2 and S3, Hive stand-up, and AWS profiling.
● Used the Spark DataFrame API in Scala to analyze data.