
Raj Bharat G

Coppell, TX

Summary

• Results-driven Hadoop Administrator with 5 years of experience managing and optimizing Hadoop clusters.
• Proficient in installing, configuring, and maintaining Hadoop clusters in both on-premises and cloud environments.
• Skilled in monitoring cluster performance, conducting capacity planning, and troubleshooting issues to ensure optimal resource utilization and maximum uptime.
• Experienced in implementing security measures such as Kerberos authentication, SSL encryption, and role-based access control (RBAC) to safeguard data integrity.
• Proven ability to optimize cluster performance through performance tuning techniques, job scheduling, and fine-tuning of Hadoop configuration parameters.
• Proficient in managing Vertica, including installation, configuration, and maintenance of the Vertica database.
• Skilled in migrating data from Hadoop to Vertica, ensuring seamless transitions and minimal downtime.
• Expertise in optimizing Vertica database performance, fine-tuning queries, and enhancing overall system efficiency.
• Proficient in implementing backup strategies and recovery procedures for Vertica, ensuring data integrity and minimal data loss in case of failures.
• Experienced in implementing and managing security measures within Vertica to protect sensitive data.
• Proven ability to diagnose and resolve issues related to Vertica, ensuring continuous system functionality.
• Proficiency in scripting languages to automate routine tasks and enhance operational efficiency in Vertica.
• Strong skills in documenting Vertica configurations, procedures, and creating reports for performance analysis and system monitoring.
• Knowledgeable in implementing backup and recovery strategies to minimize downtime and ensure data integrity in the event of system failures.
• Accomplished in resolving performance issues and implementing complex business rules by creating reusable transformations and robust mappings.
• Experienced in managing Hadoop infrastructure with Cloudera Manager and Ambari.
• Strong experience in installation and configuration of Cloudera Distribution Hadoop CDH 5.x, CDH 6.x, and CDP 7.x.
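The configuration-parameter tuning mentioned above typically starts from simple per-node sizing arithmetic. A minimal sketch in Python, assuming the common community heuristic of taking the minimum of 2× cores, 1.8× disks, and usable RAM divided by the minimum container size; the node specs below are illustrative, not from any cluster described here.

```python
def yarn_container_settings(cores, disks, ram_gb, min_container_gb=2, reserved_gb=8):
    """Estimate YARN container count and per-container memory for one node.

    Heuristic (illustrative): containers = min(2 * cores, 1.8 * disks,
    usable RAM / minimum container size), with some RAM reserved for the
    OS and Hadoop daemons.
    """
    usable_gb = ram_gb - reserved_gb                 # headroom for OS/daemons
    containers = int(min(2 * cores, 1.8 * disks, usable_gb / min_container_gb))
    mem_per_container_gb = round(usable_gb / containers, 1)
    return containers, mem_per_container_gb

# Example node: 16 cores, 12 disks, 128 GB RAM
print(yarn_container_settings(16, 12, 128))          # → (21, 5.7)
```

The resulting numbers would then be fed into parameters such as `yarn.nodemanager.resource.memory-mb` and the scheduler's container minimums, with the heuristic serving only as a starting point before empirical tuning.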

Overview

9 years of professional experience

Work History

Hadoop Platform Admin

Gainwell Technologies
Remote
04.2023 - Current
  • Built and installed multi-node Cloudera clusters for multiple environments: Production, Test, Dev, and DR
  • Built and migrated clusters from CDH 6.x to CDP 7.1.7 and CM 7.4.4
  • Built and migrated CFM 2.2.1 with NiFi 1.12 and 1.15
  • Migrated Sentry roles to Ranger
  • Proficient in Cloudera Hadoop Administration, specifically with CDP (Cloudera Data Platform), including installation, migration, and ongoing management
  • Demonstrated success in integrating Hadoop clusters with Active Directory, enabling streamlined user authentication and access management
  • Extensive experience in data replication using Cloudera Manager, ensuring efficient data distribution and replication across Hadoop clusters
  • In-depth understanding and hands-on experience with Kerberized Hadoop clusters, including integration with Active Directory for robust security measures
  • Proven track record in setting up and maintaining high availability configurations in Hadoop clusters, ensuring continuous and reliable operations
  • Skilled in performance tuning of Hadoop clusters, employing strategies to enhance system performance and optimize resource utilization
  • Proficiency in diagnosing and resolving issues within CDP clusters, leveraging troubleshooting skills to ensure cluster stability and performance
  • Strong experience upgrading CDH and CDP clusters as well as HDF and CDF NiFi clusters
  • Expert in setting up and installing CDF NiFi 1.12 and 1.14 on CDH 6.3.2 and CDH 6.3.3 secured clusters
  • Set up TLS integration for data-in-flight encryption and node-to-node communication in NiFi
  • Built numerous workflows to extract data from Oracle, Teradata, Db2, and SQL Server and load it to Hadoop
  • Integrated NiFi with Cloudera; all Hadoop workloads are driven through NiFi
  • Integrated NiFi CDF with the CDH cluster to orchestrate Hive, Spark, and Sqoop jobs
  • Integrated NiFi CDF with the CDH cluster to land data files in the HDFS landing zone
  • Expert in capacity and resource planning to grow cluster size.
  • Environment: CDP PC Runtime 7.1.7, CM 7.4.4, Ranger, Ranger KMS, Ranger RMS, CFM 2.1.4 SP1, CDH 5.4, CDH 5.9, CDH 5.15, CDH 5.16, CDH 6.3.2, CDH 6.3.3, CFM NiFi 1.12.1, NiFi 1.14, Hadoop, MapReduce, HDFS, Hive, Pig, SQL, Sqoop, Flume, Spark, Kafka, Shell Scripting, Cloudera Manager, Apache NiFi, Ambari, Hortonworks, Paxata, Navigator, Data Lake, Confluent Enterprise 5.4 Kafka, Kafka Connect, KSQL, Control Center, Confluent Cloud, AWS Cloud setup of NiFi, CDP
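The capacity and resource planning noted above largely reduces to arithmetic over ingest rate, retention, and replication factor. A minimal sketch; all figures (ingest volume, node capacity, utilization target) are chosen purely for illustration and are not from any cluster described here.

```python
import math

def datanodes_needed(daily_ingest_tb, retention_days, replication=3,
                     node_capacity_tb=48, target_utilization=0.70):
    """Estimate DataNode count for a growth plan (illustrative figures).

    Raw storage need = daily ingest * retention * replication factor;
    divide by the usable capacity per node (raw capacity capped at a
    target utilization to leave operational headroom).
    """
    raw_tb = daily_ingest_tb * retention_days * replication
    usable_per_node_tb = node_capacity_tb * target_utilization
    return math.ceil(raw_tb / usable_per_node_tb)

# Example: 2 TB/day ingest, 90-day retention, 3x replication,
# 48 TB nodes kept under 70% utilization
print(datanodes_needed(2, 90))           # → 17
```

In practice the estimate would also budget for intermediate/temp space and compression ratios, but the replication-times-retention core of the calculation stays the same.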

Hadoop & Nifi Administrator

Shutterfly
Remote
02.2020 - 04.2023
  • Deployed and managed multi-node development, testing, and production Hadoop clusters, including various Hadoop components (Hive, Pig, Sqoop, Oozie, HBase, Zookeeper), using Cloudera Manager
  • Installed, configured, and maintained Apache Hadoop clusters for application development, utilizing tools such as Hive, Pig, HBase, Zookeeper, and Sqoop.
  • Over four years of hands-on experience as a Kafka Administrator, demonstrating proficiency in configuring, managing, and optimizing Kafka clusters for efficient and reliable data streaming
  • Implemented robust, fault-tolerant strategies and disaster recovery plans, enhancing system resilience and minimizing downtime in critical scenarios.
  • Leveraged automation tools and scripting languages to streamline administrative tasks, increasing operational efficiency and reducing manual intervention
  • Proficient in setting up comprehensive monitoring systems for proactive issue detection, coupled with a strong troubleshooting skill set to quickly address and resolve any operational challenges.
  • Collaborated effectively with cross-functional teams and maintained detailed documentation, ensuring seamless knowledge transfer and compliance with best practices
  • Successfully executed Kafka version upgrades and data migration projects, demonstrating adaptability to new technologies and a commitment to staying current with industry trends.
  • Implemented robust security measures, including access controls and encryption, to safeguard Kafka environments and ensure data privacy and compliance with industry standards.
  • Provided training to team members and actively participated in knowledge-sharing sessions, contributing to the professional development of the team and fostering a collaborative work environment
  • Performed benchmarking and executed backup and recovery procedures for NameNode metadata and data residing in the cluster
  • Conducted minor and major upgrades, as well as data node commissioning and decommissioning, within the Hadoop cluster.
  • Led the installation and upgrade process from CDH4 to CDH5, including setting up the Oozie workflow engine for multiple Hive and Pig jobs.
  • Configured NameNode High Availability for improved cluster reliability.
  • Established 24x7 automated monitoring and escalation infrastructure using Nagios and Ganglia for comprehensive Hadoop cluster supervision
  • Analyzed existing Hadoop infrastructure, identified performance bottlenecks, and implemented performance tuning solutions.
  • Implemented rack-aware configuration to enhance data availability and processing efficiency.
  • Proficient in troubleshooting production-level issues within the Hadoop cluster and ensuring optimal functionality
  • Monitored cluster performance using tools like Ambari and Ganglia, identifying and resolving performance bottlenecks for optimized cluster operation
  • Implemented robust security measures, including Kerberos authentication and SSL encryption, to ensure data protection and compliance with industry standards
  • Provided on-call support and troubleshooting expertise, responding to critical issues promptly and ensuring minimal impact on production systems.
  • Environment: Hadoop, Map Reduce, HDFS, Hive, Pig, SQL, Cloudera Manager, Sqoop, Flume, Oozie, HBase, Kafka, Drill, Spark & Streaming, Python Scripting.
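Monitoring and benchmarking work like the above often involves scraping cluster status reports. A minimal Python sketch that computes HDFS utilization from `hdfs dfsadmin -report`-style text; the sample output embedded below is fabricated for illustration, not real cluster output.

```python
import re

def dfs_utilization(report_text):
    """Return DFS used capacity as a percentage, parsed from the
    'Configured Capacity' and 'DFS Used' lines of an
    `hdfs dfsadmin -report`-style dump (byte counts assumed)."""
    def grab(label):
        match = re.search(rf"{label}:\s*(\d+)", report_text)
        return int(match.group(1))
    return round(100 * grab("DFS Used") / grab("Configured Capacity"), 2)

# Fabricated sample in the dfsadmin report format
sample = """\
Configured Capacity: 1000000000000 (1 TB)
Present Capacity: 950000000000
DFS Used: 250000000000 (250 GB)
DFS Remaining: 700000000000
"""
print(dfs_utilization(sample))           # → 25.0
```

A cron-driven script built on this kind of parsing can feed thresholds into an alerting tool (Nagios checks, for instance), which is the usual glue behind 24x7 escalation setups.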

Hadoop Developer

UnitedHealth Group
03.2017 - 06.2018
  • Analyzed big data sets with analytical tools such as R and Python libraries like SciPy and NumPy.
  • Ensured high availability of services running on the Hadoop cluster through frequent testing, maintenance, and upgrades.
  • Deployed Apache Spark applications on YARN clusters for distributed computing tasks.
  • Developed Spark-Streaming applications to consume the data from Kafka topics and to insert the processed streams to HBase
  • Modified and rewrote existing Informatica ETL jobs in Spark SQL
  • Implemented SCD Type II logic in the target warehouse to maintain history
  • Utilized Spark in-memory capabilities to handle large datasets
  • Worked as Airflow platform admin and monitored 600+ DAGs on a daily basis
  • Identified change data capture (CDC) using PowerExchange and fed it into the target warehouse
  • Created Hive tables and loaded and analyzed data using Hive scripts
  • Implemented partitioning, dynamic partitions, and buckets in Hive
  • Involved in continuous integration of the application using Jenkins
  • Interacted with the infrastructure, network, database, application, and BA teams to ensure data quality and availability
  • Followed Agile Methodologies while working on the project
  • Environment: AWS, Scala, Hive, HDFS, Apache Spark, Apache Airflow, Oozie, Sqoop, Cassandra, Shell Scripting, Power BI, MongoDB, Jenkins, UNIX, JIRA, Git.
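The SCD Type II pattern mentioned above (expire the old row, insert a new current row when a tracked attribute changes) can be sketched in plain Python. The table layout and column names (`customer_id`, `city`, `valid_from`, `valid_to`, `is_current`) are illustrative assumptions, not the actual warehouse schema.

```python
from datetime import date

def scd2_merge(current_rows, incoming, today=date(2018, 1, 1)):
    """Minimal SCD Type II sketch over a list-of-dicts 'dimension table'.

    For each incoming record: if the tracked attribute (city) changed,
    close out the active row (set valid_to, clear is_current) and append
    a fresh active row; unchanged records leave history untouched.
    """
    rows = [dict(r) for r in current_rows]            # don't mutate input
    active = {r["customer_id"]: r for r in rows if r["is_current"]}
    for rec in incoming:
        cur = active.get(rec["customer_id"])
        if cur and cur["city"] == rec["city"]:
            continue                                   # no change
        if cur:                                        # expire old version
            cur["valid_to"] = today
            cur["is_current"] = False
        rows.append({"customer_id": rec["customer_id"], "city": rec["city"],
                     "valid_from": today, "valid_to": None, "is_current": True})
    return rows

history = [{"customer_id": 1, "city": "Dallas",
            "valid_from": date(2017, 6, 1), "valid_to": None, "is_current": True}]
updated = scd2_merge(history, [{"customer_id": 1, "city": "Austin"}])
print(len(updated), updated[0]["is_current"], updated[1]["city"])
```

In a Spark SQL rewrite the same logic would typically become a join-then-union (or `MERGE` where supported) between the dimension table and the CDC feed.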

AWS Data Engineer

Wipro
Hyderabad, Telangana
08.2015 - 02.2017
  • Built fault tolerant applications that leveraged multiple Availability Zones within an AWS region.
  • Executed migration strategies between different versions of HiveQL queries running over HDFS clusters hosted in EMR.
  • Developed and maintained data pipelines to ingest, store, process and analyze large datasets in AWS S3 buckets.
  • Enforced security policies through encryption at rest and in transit using the KMS service, alongside IAM roles and policies.
  • Developed Spark applications on top of Hadoop clusters running on EMR for performing complex analytics operations.
  • Implemented automated monitoring of data flows using Cloudwatch and Lambda functions.
  • Built custom dashboards for visualizing data stored in Amazon Elasticsearch Service clusters.
  • Created ETL processes using Python scripts to move data from various sources into the target databases on AWS Redshift or RDS.
  • Optimized query performance by creating indexes and materialized views in Amazon Redshift clusters.
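Automated monitoring with CloudWatch and Lambda, as described above, usually comes down to a small handler that logs pipeline activity from S3 event notifications. A hedged sketch: the event shape follows the standard S3 put-notification format, but the bucket and key names are hypothetical.

```python
import json
import urllib.parse

def lambda_handler(event, context):
    """Sketch of a monitoring Lambda: summarize objects from an S3 put
    event so the printed JSON lands in CloudWatch Logs, where metric
    filters and alarms can pick it up. Bucket/key names are hypothetical."""
    summaries = []
    for rec in event.get("Records", []):
        bucket = rec["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(rec["s3"]["object"]["key"])
        size = rec["s3"]["object"].get("size", 0)
        summaries.append({"bucket": bucket, "key": key, "size_bytes": size})
    print(json.dumps(summaries))                       # → CloudWatch Logs
    return {"processed": len(summaries)}

# Illustrative S3 event payload (hypothetical bucket/key)
sample_event = {"Records": [{"s3": {
    "bucket": {"name": "data-landing"},
    "object": {"key": "raw/2017/orders.csv", "size": 1024}}}]}
print(lambda_handler(sample_event, None))              # → {'processed': 1}
```

Wiring this up assumes an S3 event notification (or EventBridge rule) targeting the function; the design choice of logging structured JSON keeps downstream CloudWatch metric filters simple.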

Education

Masters - Business Analytics

Southern Arkansas University
08.2018

Bachelors - Computer Science

JNTU Hyderabad
11.2015

Skills

Technical Skills

Languages: SQL, PL/SQL, UNIX shell, and Python scripting

Operating systems: UNIX/Linux, Windows

Databases: MySQL, Postgres, and Teradata

Tools: TOAD, SQL Navigator

Big Data technologies: Hadoop, HDFS, Oozie, Impala, Sqoop, Hive, Kafka, Hue, Spark, Spark Streaming, Hortonworks NiFi, Ambari

NoSQL database: HBase

  • Implementing and managing security measures within Vertica
  • Amazon Web Services (S3, EC2, IAM, Route53, Databases, VPC, Lambda, EBS, EFS, Glue, Athena, SQS, SNS, API Gateway, Kinesis)
  • Working with AWS Databases (Elastic Cache, NoSQL databases)
  • Medallion/lakehouse architecture
  • Using DBT for transformations
  • Using Databricks as a datalake
  • Spark Architecture
  • MPP Architecture
  • Productionizing models in a Cloud environment
  • Visualization tools (Power BI, Microsoft Excel)
  • Automation tools and continuous integration workflows
  • Data Analysis, Data Profiling, Data Integration, Migration, Data governance, Metadata Management, Master Data Management, Configuration Management
  • Creating Docker containers
  • Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR, DynamoDB
  • Handling Python and Spark context for ETL
  • SQL, Python, R, Scala, UNIX Shell Script, PowerShell, YAML
  • Azure, AWS
  • WebLogic, Apache Tomcat
  • Hortonworks, Cloudera Hadoop
  • HDFS, Hive, Sqoop, Yarn, Spark, Spark SQL
  • Azure Storage, Azure Data Factory, Azure Analysis Services, Azure Database, Map Reduce, AWS
  • Hadoop, MapReduce, Pig, Hive/impala, YARN, Kafka, Flume, Oozie, Zookeeper, Spark, Airflow
  • Cloudera distribution and Hortonworks
  • AWS: Amazon EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda, EMR, Redshift, DynamoDB
  • Azure: Azure Cloud Services, Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis services, Application Insights, Azure Monitoring, Key Vault, Azure Data Lake, Azure HDInsight, GCP, OpenStack
  • DBT and Databricks
  • Informatica, Data Studio
  • Power BI, Tableau, SSRS
  • Citrix, VDI, VMware
