
Raj Bharat G

Coppell, TX

Summary

• Results-driven Hadoop Administrator with 5 years of experience managing and optimizing Hadoop clusters.
• Proficient in installing, configuring, and maintaining Hadoop clusters in both on-premises and cloud environments.
• Skilled in monitoring cluster performance, conducting capacity planning, and troubleshooting issues to ensure optimal resource utilization and maximum uptime.
• Experienced in implementing security measures such as Kerberos authentication, SSL encryption, and role-based access control (RBAC) to safeguard data integrity.
• Proven ability to optimize cluster performance through performance tuning techniques, job scheduling, and fine-tuning of Hadoop configuration parameters.
• Proficient in managing Vertica, including installation, configuration, and maintenance of the Vertica database.
• Skilled in migrating data from Hadoop to Vertica, ensuring seamless transitions and minimal downtime.
• Expertise in optimizing Vertica database performance, fine-tuning queries, and enhancing overall system efficiency.
• Proficient in implementing backup strategies and recovery procedures for Vertica, ensuring data integrity and minimal data loss in case of failures.
• Experienced in implementing and managing security measures within Vertica to protect sensitive data.
• Proven ability to diagnose and resolve issues related to Vertica, ensuring continuous system functionality.
• Proficiency in scripting languages to automate routine tasks and enhance operational efficiency in Vertica.
• Strong skills in documenting Vertica configurations, procedures, and creating reports for performance analysis and system monitoring.
• Knowledgeable in implementing backup and recovery strategies to minimize downtime and ensure data integrity in the event of system failures.
• Accomplished in resolving performance issues and implementing complex business rules by creating reusable transformations and robust mappings.
• Experienced in managing Hadoop infrastructure with Cloudera Manager and Ambari.
• Strong experience in installation and configuration of Cloudera Distribution Hadoop CDH 5.x, CDH 6.x, and CDP 7.x.
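The configuration-parameter tuning mentioned above typically starts from simple per-node sizing arithmetic. A minimal sketch in Python, assuming the common community heuristic of taking the minimum of 2× cores, 1.8× disks, and usable RAM divided by the minimum container size; the node specs below are illustrative, not from any cluster described here.

```python
def yarn_container_settings(cores, disks, ram_gb, min_container_gb=2, reserved_gb=8):
    """Estimate YARN container count and per-container memory for one node.

    Heuristic (illustrative): containers = min(2 * cores, 1.8 * disks,
    usable RAM / minimum container size), with some RAM reserved for the
    OS and Hadoop daemons.
    """
    usable_gb = ram_gb - reserved_gb                 # headroom for OS/daemons
    containers = int(min(2 * cores, 1.8 * disks, usable_gb / min_container_gb))
    mem_per_container_gb = round(usable_gb / containers, 1)
    return containers, mem_per_container_gb

# Example node: 16 cores, 12 disks, 128 GB RAM
print(yarn_container_settings(16, 12, 128))          # → (21, 5.7)
```

The resulting numbers would then be fed into parameters such as `yarn.nodemanager.resource.memory-mb` and the scheduler's container minimums, with the heuristic serving only as a starting point before empirical tuning.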

Overview

9 years of professional experience

Work History

Hadoop Platform Admin

Gainwell Technologies
Remote
04.2023 - Current
  • Built and installed multi-node Cloudera clusters for multiple environments: Production, Test, Dev, and DR
  • Built and migrated clusters from CDH 6.x to CDP 7.1.7 and CM 7.4.4
  • Built and migrated CFM 2.2.1 with NiFi 1.12 and 1.15
  • Migrated Sentry roles to Ranger
  • Proficient in Cloudera Hadoop Administration, specifically with CDP (Cloudera Data Platform), including installation, migration, and ongoing management
  • Demonstrated success in integrating Hadoop clusters with Active Directory, enabling streamlined user authentication and access management
  • Extensive experience in data replication using Cloudera Manager, ensuring efficient data distribution and replication across Hadoop clusters
  • In-depth understanding and hands-on experience with Kerberized Hadoop clusters, including integration with Active Directory for robust security measures
  • Proven track record in setting up and maintaining high availability configurations in Hadoop clusters, ensuring continuous and reliable operations
  • Skilled in performance tuning of Hadoop clusters, employing strategies to enhance system performance and optimize resource utilization
  • Proficiency in diagnosing and resolving issues within CDP clusters, leveraging troubleshooting skills to ensure cluster stability and performance
  • Strong experience upgrading CDH and CDP clusters as well as HDF and CDF NiFi clusters
  • Expert in setting up and installing CDF NiFi 1.12 and 1.14 on CDH 6.3.2 and CDH 6.3.3 secured clusters
  • Set up TLS integration for data-in-flight encryption and node-to-node communication in NiFi
  • Built numerous workflows to extract data from Oracle, Teradata, Db2, and SQL Server and load it to Hadoop
  • Integrated NiFi with Cloudera; all Hadoop workloads are driven through NiFi
  • Integrated NiFi CDF with the CDH cluster to orchestrate Hive, Spark, and Sqoop jobs
  • Integrated NiFi CDF with the CDH cluster to land data files in the HDFS landing zone
  • Expert in capacity and resource planning to grow cluster size.
  • Environment: CDP PC Runtime 7.1.7, CM 7.4.4, Ranger, Ranger KMS, Ranger RMS, CFM 2.1.4 SP1, CDH 5.4, CDH 5.9, CDH 5.15, CDH 5.16, CDH 6.3.2, CDH 6.3.3, CFM NiFi 1.12.1, NiFi 1.14, Hadoop, MapReduce, HDFS, Hive, Pig, SQL, Sqoop, Flume, Spark, Kafka, Shell Scripting, Cloudera Manager, Apache NiFi, Ambari, Hortonworks, Paxata, Navigator, Data Lake, Confluent Enterprise 5.4 Kafka, Kafka Connect, KSQL, Control Center, Confluent Cloud, AWS Cloud setup of NiFi, CDP
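The capacity and resource planning noted above largely reduces to arithmetic over ingest rate, retention, and replication factor. A minimal sketch; all figures (ingest volume, node capacity, utilization target) are chosen purely for illustration and are not from any cluster described here.

```python
import math

def datanodes_needed(daily_ingest_tb, retention_days, replication=3,
                     node_capacity_tb=48, target_utilization=0.70):
    """Estimate DataNode count for a growth plan (illustrative figures).

    Raw storage need = daily ingest * retention * replication factor;
    divide by the usable capacity per node (raw capacity capped at a
    target utilization to leave operational headroom).
    """
    raw_tb = daily_ingest_tb * retention_days * replication
    usable_per_node_tb = node_capacity_tb * target_utilization
    return math.ceil(raw_tb / usable_per_node_tb)

# Example: 2 TB/day ingest, 90-day retention, 3x replication,
# 48 TB nodes kept under 70% utilization
print(datanodes_needed(2, 90))           # → 17
```

In practice the estimate would also budget for intermediate/temp space and compression ratios, but the replication-times-retention core of the calculation stays the same.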

Hadoop & Nifi Administrator

Shutterfly
Remote
02.2020 - 04.2023
  • Deployed and managed multi-node development, testing, and production Hadoop clusters, including various Hadoop components (Hive, Pig, Sqoop, Oozie, HBase, Zookeeper), using Cloudera Manager
  • Installed, configured, and maintained Apache Hadoop clusters for application development, utilizing tools such as Hive, Pig, HBase, Zookeeper, and Sqoop.
  • Over four years of hands-on experience as a Kafka Administrator, demonstrating proficiency in configuring, managing, and optimizing Kafka clusters for efficient and reliable data streaming
  • Implemented robust, fault-tolerant strategies and disaster recovery plans, enhancing system resilience and minimizing downtime in critical scenarios.
  • Leveraged automation tools and scripting languages to streamline administrative tasks, increasing operational efficiency and reducing manual intervention
  • Proficient in setting up comprehensive monitoring systems for proactive issue detection, coupled with a strong troubleshooting skill set to quickly address and resolve any operational challenges.
  • Collaborated effectively with cross-functional teams and maintained detailed documentation, ensuring seamless knowledge transfer and compliance with best practices
  • Successfully executed Kafka version upgrades and data migration projects, demonstrating adaptability to new technologies and a commitment to staying current with industry trends.
  • Implemented robust security measures, including access controls and encryption, to safeguard Kafka environments and ensure data privacy and compliance with industry standards.
  • Provided training to team members and actively participated in knowledge-sharing sessions, contributing to the professional development of the team and fostering a collaborative work environment
  • Performed benchmarking and executed backup and recovery procedures for NameNode metadata and data residing in the cluster
  • Conducted minor and major upgrades, as well as data node commissioning and decommissioning, within the Hadoop cluster.
  • Led the installation and upgrade process from CDH4 to CDH5, including setting up the Oozie workflow engine for multiple Hive and Pig jobs.
  • Configured NameNode High Availability for improved cluster reliability.
  • Established 24x7 automated monitoring and escalation infrastructure using Nagios and Ganglia for comprehensive Hadoop cluster supervision
  • Analyzed existing Hadoop infrastructure, identified performance bottlenecks, and implemented performance tuning solutions.
  • Implemented rack-aware configuration to enhance data availability and processing efficiency.
  • Proficient in troubleshooting production-level issues within the Hadoop cluster and ensuring optimal functionality
  • Monitored cluster performance using tools like Ambari and Ganglia, identifying and resolving performance bottlenecks for optimized cluster operation
  • Implemented robust security measures, including Kerberos authentication and SSL encryption, to ensure data protection and compliance with industry standards
  • Provided on-call support and troubleshooting expertise, responding to critical issues promptly and ensuring minimal impact on production systems.
  • Environment: Hadoop, Map Reduce, HDFS, Hive, Pig, SQL, Cloudera Manager, Sqoop, Flume, Oozie, HBase, Kafka, Drill, Spark & Streaming, Python Scripting.
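Monitoring and benchmarking work like the above often involves scraping cluster status reports. A minimal Python sketch that computes HDFS utilization from `hdfs dfsadmin -report`-style text; the sample output embedded below is fabricated for illustration, not real cluster output.

```python
import re

def dfs_utilization(report_text):
    """Return DFS used capacity as a percentage, parsed from the
    'Configured Capacity' and 'DFS Used' lines of an
    `hdfs dfsadmin -report`-style dump (byte counts assumed)."""
    def grab(label):
        match = re.search(rf"{label}:\s*(\d+)", report_text)
        return int(match.group(1))
    return round(100 * grab("DFS Used") / grab("Configured Capacity"), 2)

# Fabricated sample in the dfsadmin report format
sample = """\
Configured Capacity: 1000000000000 (1 TB)
Present Capacity: 950000000000
DFS Used: 250000000000 (250 GB)
DFS Remaining: 700000000000
"""
print(dfs_utilization(sample))           # → 25.0
```

A cron-driven script built on this kind of parsing can feed thresholds into an alerting tool (Nagios checks, for instance), which is the usual glue behind 24x7 escalation setups.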

Hadoop Developer

UnitedHealth Group
03.2017 - 06.2018
  • Analyzed big data sets with analytical tools such as R and Python libraries like SciPy and NumPy.
  • Ensured high availability of services running on the Hadoop cluster through frequent testing, maintenance, and upgrades.
  • Deployed Apache Spark applications on YARN clusters for distributed computing tasks.
  • Developed Spark-Streaming applications to consume the data from Kafka topics and to insert the processed streams to HBase
  • Modified and rewrote existing Informatica ETL jobs in Spark SQL
  • Implemented SCD Type II logic in the target warehouse to maintain history
  • Utilized Spark in-memory capabilities to handle large datasets
  • Worked as Airflow platform admin and monitored 600+ DAGs on a daily basis
  • Identified change data capture (CDC) using PowerExchange and fed it into the target warehouse
  • Created Hive tables and loaded and analyzed data using Hive scripts
  • Implemented partitioning, dynamic partitions, and buckets in Hive
  • Involved in continuous integration of the application using Jenkins
  • Interacted with the infrastructure, network, database, application, and BA teams to ensure data quality and availability
  • Followed Agile Methodologies while working on the project
  • Environment: AWS, Scala, Hive, HDFS, Apache Spark, Apache Airflow, Oozie, Sqoop, Cassandra, Shell Scripting, Power BI, MongoDB, Jenkins, UNIX, JIRA, Git.
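The SCD Type II pattern mentioned above (expire the old row, insert a new current row when a tracked attribute changes) can be sketched in plain Python. The table layout and column names (`customer_id`, `city`, `valid_from`, `valid_to`, `is_current`) are illustrative assumptions, not the actual warehouse schema.

```python
from datetime import date

def scd2_merge(current_rows, incoming, today=date(2018, 1, 1)):
    """Minimal SCD Type II sketch over a list-of-dicts 'dimension table'.

    For each incoming record: if the tracked attribute (city) changed,
    close out the active row (set valid_to, clear is_current) and append
    a fresh active row; unchanged records leave history untouched.
    """
    rows = [dict(r) for r in current_rows]            # don't mutate input
    active = {r["customer_id"]: r for r in rows if r["is_current"]}
    for rec in incoming:
        cur = active.get(rec["customer_id"])
        if cur and cur["city"] == rec["city"]:
            continue                                   # no change
        if cur:                                        # expire old version
            cur["valid_to"] = today
            cur["is_current"] = False
        rows.append({"customer_id": rec["customer_id"], "city": rec["city"],
                     "valid_from": today, "valid_to": None, "is_current": True})
    return rows

history = [{"customer_id": 1, "city": "Dallas",
            "valid_from": date(2017, 6, 1), "valid_to": None, "is_current": True}]
updated = scd2_merge(history, [{"customer_id": 1, "city": "Austin"}])
print(len(updated), updated[0]["is_current"], updated[1]["city"])
```

In a Spark SQL rewrite the same logic would typically become a join-then-union (or `MERGE` where supported) between the dimension table and the CDC feed.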

AWS Data Engineer

Wipro
Hyderabad, Telangana
08.2015 - 02.2017
  • Built fault tolerant applications that leveraged multiple Availability Zones within an AWS region.
  • Executed migration strategies between different versions of HiveQL queries running over HDFS clusters hosted in EMR.
  • Developed and maintained data pipelines to ingest, store, process and analyze large datasets in AWS S3 buckets.
  • Enforced security policies through encryption at rest and in transit using the KMS service, alongside IAM roles and policies.
  • Developed Spark applications on top of Hadoop clusters running on EMR for performing complex analytics operations.
  • Implemented automated monitoring of data flows using Cloudwatch and Lambda functions.
  • Built custom dashboards for visualizing data stored in Amazon Elasticsearch Service clusters.
  • Created ETL processes using Python scripts to move data from various sources into the target databases on AWS Redshift or RDS.
  • Optimized query performance by creating indexes and materialized views in Amazon Redshift clusters.
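Automated monitoring with CloudWatch and Lambda, as described above, usually comes down to a small handler that logs pipeline activity from S3 event notifications. A hedged sketch: the event shape follows the standard S3 put-notification format, but the bucket and key names are hypothetical.

```python
import json
import urllib.parse

def lambda_handler(event, context):
    """Sketch of a monitoring Lambda: summarize objects from an S3 put
    event so the printed JSON lands in CloudWatch Logs, where metric
    filters and alarms can pick it up. Bucket/key names are hypothetical."""
    summaries = []
    for rec in event.get("Records", []):
        bucket = rec["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(rec["s3"]["object"]["key"])
        size = rec["s3"]["object"].get("size", 0)
        summaries.append({"bucket": bucket, "key": key, "size_bytes": size})
    print(json.dumps(summaries))                       # → CloudWatch Logs
    return {"processed": len(summaries)}

# Illustrative S3 event payload (hypothetical bucket/key)
sample_event = {"Records": [{"s3": {
    "bucket": {"name": "data-landing"},
    "object": {"key": "raw/2017/orders.csv", "size": 1024}}}]}
print(lambda_handler(sample_event, None))              # → {'processed': 1}
```

Wiring this up assumes an S3 event notification (or EventBridge rule) targeting the function; the design choice of logging structured JSON keeps downstream CloudWatch metric filters simple.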

Education

Masters - Business Analytics

Southern Arkansas University
08.2018

Bachelors - Computer Science

JNTU Hyderabad
11.2015

Skills

Technical Skills

Languages: SQL, PL/SQL, UNIX shell, and Python scripting

Operating systems: UNIX/Linux, Windows

Databases: MySQL, Postgres, and Teradata

Tools: TOAD, SQL Navigator

Big Data technologies: Hadoop, HDFS, Oozie, Impala, Sqoop, Hive, Kafka, Hue, Spark, Spark Streaming, Hortonworks NiFi, Ambari

NoSQL database: HBase

  • Implementing and managing security measures within Vertica
  • Amazon Web Services (S3, EC2, IAM, Route53, Databases, VPC, Lambda, EBS, EFS, Glue, Athena, SQS, SNS, API Gateway, Kinesis)
  • Working with AWS Databases (Elastic Cache, NoSQL databases)
  • Medallion/lakehouse architecture
  • Using DBT for transformations
  • Using Databricks as a datalake
  • Spark Architecture
  • MPP Architecture
  • Productionizing models in a Cloud environment
  • Visualization tools (Power BI, Microsoft Excel)
  • Automation tools and continuous integration workflows
  • Data Analysis, Data Profiling, Data Integration, Migration, Data governance, Metadata Management, Master Data Management, Configuration Management
  • Creating Docker containers
  • Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR, DynamoDB
  • Handling Python and Spark context for ETL
  • SQL, Python, R, Scala, UNIX Shell Script, PowerShell, YAML
  • Azure, AWS
  • WebLogic, Apache Tomcat
  • Hortonworks, Cloudera Hadoop
  • HDFS, Hive, Sqoop, Yarn, Spark, Spark SQL
  • Azure Storage, Azure Data Factory, Azure Analysis Services, Azure Database, Map Reduce, AWS
  • Hadoop, MapReduce, Pig, Hive/impala, YARN, Kafka, Flume, Oozie, Zookeeper, Spark, Airflow
  • Cloudera distribution and Hortonworks
  • AWS: Amazon EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda, EMR, Redshift, DynamoDB
  • Azure: Azure Cloud Services, Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis services, Application Insights, Azure Monitoring, Key Vault, Azure Data Lake, Azure HDInsight, GCP, OpenStack
  • DBT and Databricks
  • Informatica, Data Studio
  • Power BI, Tableau, SSRS
  • Citrix, VDI, VMware
