
Paramesh

Minneapolis, USA

Summary

A seasoned Sr Hadoop Engineer with a proven track record at Prime Therapeutics, I excel in deploying and managing Hadoop clusters, enhancing data security and governance with Apache Ranger and Atlas. Skilled in Ansible and Python, my expertise in automating and optimizing data operations significantly boosts system efficiency. Demonstrating strong problem-solving abilities and a commitment to continuous improvement, I consistently deliver high-quality solutions that meet and exceed employer expectations.

Overview

15 years of professional experience
1 Certification

Work History

Sr Hadoop Engineer

Prime Therapeutics
Minneapolis, USA
07.2019 - Current
  • Installed and configured Hadoop clusters with the Cloudera distribution of Hadoop, versions CDH 5.X to CDH 6.X
  • Upgraded clusters from CDH 6.X to CDP 7.1.6
  • Migrated existing clusters from CDH 6.X to CDP 7.1.6 without impacting existing data
  • Experience configuring, installing, and managing Apache Hadoop on both Cloudera and Hortonworks distributions
  • Extensive experience installing new servers and rebuilding existing servers
  • Implemented and managed Apache Ranger policies within Cloudera Data Platform for fine-grained access control
  • Maintained Ansible as the configuration management tool to apply Hadoop config changes across clusters, revert configurations to previous versions, and replace misconfigured components
  • Built Hadoop server pre-checks and pre-installs using DevOps tools such as Ansible
  • Monitored platform health, introduced and implemented a structured, granular reporting and tooling framework, and generated performance reports and KPIs to sustain improvements
  • Integrated services such as Ranger, Atlas, and Zeppelin with Active Directory
  • Created Ranger policies for HDFS, Hive, Atlas, and Kafka services
  • Installed and configured Apache Atlas for metadata tagging to support future attribute-based access control with Ranger, data lineage auditing, and linking business taxonomies to metadata for organizing and visualizing data
  • Enabled HDFS encryption using the CLI and Ranger
  • Configured and managed CDP Replication Manager to synchronize data seamlessly across clusters, enabling real-time or scheduled replication for business-critical datasets
  • Created policies and scheduled jobs in replication manager to replicate the HDFS/HIVE/HBase data between prod clusters
  • Integrated Apache Atlas classifications seamlessly with Apache Ranger tags, establishing a unified framework for data governance and access control
  • Designed and implemented fine-grained access control policies using Cloudera Ranger for Hadoop components such as HDFS, Hive, and HBase
  • Managed and enforced authorization policies to control user access to data
  • Developed and managed security policies through the Cloudera Ranger administration interface
  • Customized policies based on business requirements and compliance standards
  • Created and maintained documentation for configurations, best practices, and troubleshooting guides related to Cloudera Ranger and Apache Atlas
  • Designed, implemented, and maintained complex data workflows and pipelines using Apache Airflow to orchestrate tasks across various components within the Cloudera CDP ecosystem (a representative DAG sketch follows this list)
  • Demonstrated expertise in integrating Apache Airflow with Cloudera components such as Hadoop Distributed File System (HDFS), Hive, Spark, and Impala for seamless data processing and analytics
  • Experienced across various platforms and applications; coordinated with related support and implementation teams, assisted in customizing applications to the client's business requirements, and worked on incidents, problems, and change requests
  • Utilized Airflow's parameterization features to make workflows configurable, allowing for easy adaptation to different environments and datasets within the Cloudera ecosystem
  • Integrated Apache Airflow with Cloudera CDP APIs for automation and management of Cloudera services, enhancing the overall efficiency of data operations
  • Improved system performance by conducting prep and stress tests to fine-tune services
  • Wrote scripts to back up NameNode metadata, MySQL databases, and configurations with retention periods as part of the disaster recovery process
  • Recognized and adopted best practices in data processing, reporting, and analysis with respect to data integrity, test design, validation, and documentation
  • Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades when required
  • Served as point of contact for vendor escalations
  • Deployed and managed Kafka clusters on AWS MSK, configuring Kafka brokers, topics, partitions, and data retention policies based on application needs and ensuring seamless data flow
  • Used Terraform to automate the deployment of Kafka clusters on AWS MSK, creating reusable infrastructure-as-code templates to improve deployment efficiency and accuracy
  • Built data ingestion pipelines using Kafka Connect to integrate with external data sources such as Amazon S3 and HDFS, enabling continuous data streaming for downstream analytics applications
  • Monitored Kafka clusters using Prometheus and Grafana, identified performance bottlenecks, and fine-tuned Kafka configurations (batch size, compression type, broker settings) to improve throughput and latency
  • Conducted root cause analysis and troubleshooting of Kafka performance issues, coordinating with engineering and support teams to ensure timely resolution
  • Created comprehensive documentation on Kafka architecture, cluster management, troubleshooting steps, and best practices
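
Below is a minimal sketch, in Python, of the kind of Airflow DAG referenced above for orchestrating HDFS, Hive, and Spark steps on CDP; the DAG name, paths, JDBC URL, and job scripts are hypothetical placeholders, not the production configuration.

    # Minimal Airflow DAG sketch (assumes Airflow 2.x): ingest to HDFS, refresh Hive, run a Spark job.
    # All paths, endpoints, and names below are hypothetical placeholders.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "owner": "hadoop-platform",
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
    }

    with DAG(
        dag_id="cdp_daily_claims_refresh",        # hypothetical pipeline name
        schedule_interval="@daily",
        start_date=datetime(2023, 1, 1),
        catchup=False,
        default_args=default_args,
    ) as dag:
        # Land raw files into HDFS (placeholder paths).
        ingest = BashOperator(
            task_id="ingest_to_hdfs",
            bash_command="hdfs dfs -put -f /data/staging/claims/*.csv /warehouse/raw/claims/",
        )

        # Refresh a Hive table via beeline (placeholder JDBC URL and HQL script).
        hive_refresh = BashOperator(
            task_id="hive_refresh",
            bash_command=(
                "beeline -u 'jdbc:hive2://hive-host:10000/default' "
                "-f /opt/etl/hql/refresh_claims.hql"
            ),
        )

        # Run a Spark aggregation on YARN (placeholder application file).
        spark_agg = BashOperator(
            task_id="spark_aggregate",
            bash_command="spark-submit --master yarn --deploy-mode cluster /opt/etl/jobs/aggregate_claims.py",
        )

        ingest >> hive_refresh >> spark_agg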

Hadoop Administrator

Visa Inc
Austin, USA
09.2017 - 07.2019
  • Involved in Performance tuning at source, target, mappings, sessions, and system levels
  • Troubleshot and resolved Hadoop cluster-related system problems
  • Implemented the Fair Scheduler on the ResourceManager to share cluster resources among users' MapReduce jobs
  • Set up Sqoop connections for exporting and importing data between DB2 and HDFS (a sample import invocation is sketched after this list)
  • Good understanding of Partitioning concepts and different file formats supported in Hive
  • Involved in cluster capacity planning, deployment, and implementing POCs
  • Integrated HDFS with Active Directory and Cluster security with Kerberos
  • Provided Support for production clusters
  • Performed cluster upgrades / migrations
  • Involved in upgrading the cluster from CDH 5.11 to 5.13
  • Built non-prod clusters using Chef and actively participated in building production clusters
  • Performed various Maintenances in all Visa Hadoop clusters
  • Worked as SME for Production cluster
  • Worked with the UNIX team on remediating Qualys findings
  • Good understanding of fair/capacity schedulers
  • Additional responsibilities included interacting with the offshore team daily, communicating requirements, delegating tasks to offshore/onsite team members, and reviewing their deliverables
  • Experience using ServiceNow as a ticketing tool
  • Environment: Amazon Web Services, Amazon MSK, AWS EMR, HDFS, Map Reduce, YARN, Hive, HBase, Impala, Sqoop, Kafka, Spark and Kubernetes
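
Below is a minimal sketch, in Python, of the kind of Sqoop import referenced above for pulling a DB2 table into HDFS; the JDBC URL, credentials file, table, and target directory are hypothetical placeholders.

    # Minimal sketch: build and run a Sqoop import from DB2 into HDFS.
    # Connection details, table, and target directory are hypothetical placeholders.
    import subprocess

    sqoop_import = [
        "sqoop", "import",
        "--connect", "jdbc:db2://db2-host:50000/TXNDB",      # placeholder DB2 endpoint
        "--username", "etl_user",
        "--password-file", "/user/etl_user/.db2.password",   # password kept in HDFS, not on the CLI
        "--table", "CARD_TRANSACTIONS",                       # placeholder source table
        "--target-dir", "/warehouse/raw/card_transactions",
        "--num-mappers", "4",
    ]

    result = subprocess.run(sqoop_import, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface Sqoop's error output so the failure is visible in job logs.
        raise RuntimeError("Sqoop import failed:\n" + result.stderr)
    print("Sqoop import completed")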

Application Mgmt Sr. Advisor

Dell Inc.
Austin, USA
04.2016 - 06.2017
  • Experience in monitoring and managing the health of the data nodes
  • Performed capacity-planning analysis, monitored and controlled disk space usage on systems
  • Monitored system activities and fine-tuned system parameters and configurations to optimize job performance and ensure security of systems
  • Expertise in Hadoop cluster management, such as adding and removing nodes without affecting running jobs or data
  • In-depth knowledge of Hadoop architecture and components such as HDFS, YARN, MapReduce, ResourceManager, NodeManager, ApplicationMaster, and containers
  • Experience with configuration of Hadoop ecosystem components: Hadoop HDFS, MapReduce, Hive, Impala, Kafka, Storm, Spark, Sqoop, Oozie, HBase, Zookeeper, and Flume
  • Involved in creating AD groups, Hive databases, and the corresponding HDFS paths for each database
  • Created roles and granted access to users on an as-needed basis
  • Kept a close watch on the cluster through Cloudera Manager and Ambari
  • Worked closely with developers to help troubleshoot issues
  • Allocated HDFS space quotas in lab environments for POCs
  • Involved in preparing the process documentation of user and admin guide for Dell Data Reservoir
  • Implemented commissioning/decommissioning of nodes on the existing cluster
  • Experienced in configuring dynamic resource pooling
  • Monitored DataNode health checks and fixed servers with bad hard drives
  • Managed and reviewed Hadoop log files
  • Used Chef as the configuration management tool
  • Created Chef recipes to automate infrastructure and deployment processes
  • Managed nodes, run lists, roles, environments, data bags, cookbooks, and recipes in Chef
  • Set up automated 24x7x365 monitoring and escalation infrastructure for Hadoop cluster using Cloudera Manager and Ambari
  • Experienced in using Cloudera Manager and Ambari as end-to-end tools to manage Hadoop operations
  • Experience configuring Kerberos security, integrating with Active Directory, and managing Knox and Ranger configurations
  • Good knowledge of open-source configuration management tools such as Ansible
  • Experienced with Apache Kafka concepts such as topics, producers, consumers, and brokers
  • Experienced with Apache Storm, including Nimbus, Supervisors, and topologies
  • Implemented shell scripts to pull data from Hive to the local file system (a sample extraction is sketched after this list)
  • Adjusted the number of map and reduce slots per project requirements
  • Set up alerts to monitor cluster health so the team could take necessary action
  • Extensively involved in Cluster Capacity planning, Hardware planning, Installation, troubleshooting and Performance Tuning of the Hadoop Cluster
  • Worked on resolving production issues, documenting root cause analysis, and updating tickets using HP Service Manager
  • Environment: Hortonworks 2.X, HDFS, MapReduce, YARN, Hive, HBASE, Zookeeper, Kafka, Storm, Spark
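
Below is a minimal sketch, in Python, of the kind of Hive-to-local extraction referenced above, run through beeline; the HiveServer2 URL, query, and output path are hypothetical placeholders (the original scripts were written in shell).

    # Minimal sketch: export a Hive query result to the local file system via beeline.
    # The JDBC URL, query, and output path are hypothetical placeholders.
    import subprocess

    HIVE_JDBC_URL = "jdbc:hive2://hive-host:10000/default"   # placeholder HiveServer2 endpoint
    QUERY = "SELECT * FROM sales.daily_summary WHERE ds = '2016-01-15'"
    OUTPUT_FILE = "/tmp/daily_summary_2016-01-15.csv"

    cmd = [
        "beeline",
        "-u", HIVE_JDBC_URL,
        "-e", QUERY,
        "--outputformat=csv2",   # comma-separated output without beeline's table borders
        "--silent=true",
    ]

    with open(OUTPUT_FILE, "w") as out:
        # Stream beeline's stdout straight into the local file.
        subprocess.run(cmd, stdout=out, check=True, text=True)

    print("Wrote Hive extract to " + OUTPUT_FILE)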

Hadoop Administrator

Dell Inc.
Austin, USA
01.2015 - 03.2016
  • Experience in monitoring and managing the health of the data nodes
  • Monitored system activities and fine-tuned system parameters and configurations to optimize job performance and ensure security of systems
  • Expertise in Hadoop cluster management, such as adding and removing nodes without affecting running jobs or data
  • In-depth knowledge of Hadoop architecture and components such as HDFS, YARN, MapReduce, ResourceManager, NodeManager, ApplicationMaster, and containers
  • Experience with configuration of Hadoop ecosystem components: Hadoop HDFS, MapReduce, Hive, Impala, Kafka, Storm, Sqoop, Oozie, HBase, Zookeeper, and Flume
  • Kept a close watch on the cluster through Cloudera Manager and Ambari
  • Worked closely with developers to help troubleshoot issues
  • Allocated HDFS space quotas in lab environments for POCs
  • Implemented commissioning/decommissioning of nodes on the existing cluster
  • Experienced in configuring dynamic resource pooling
  • Monitored DataNode health checks and fixed servers with bad hard drives
  • Manage and review Hadoop Log files
  • Experienced in using Cloudera Manager and Ambari as end-to-end tools to manage Hadoop operations
  • Experience configuring Kerberos security, integrating with Active Directory, and managing Knox and Ranger configurations
  • Implemented shell scripts to pull data from Hive to the local file system
  • Adjusted the number of map and reduce slots per project requirements
  • Set up alerts to monitor cluster health so the team could take necessary action
  • Environment: Hortonworks 2.X, HDFS, MapReduce, YARN, Hive, HBASE, Zookeeper, Kafka, Storm, Spark

Associate Consultant

HSBC GLTM
Kuala Lumpur, Malaysia
07.2013 - 09.2014
  • Designed and coded per requirements
  • Reviewed coded programs
  • Coordinated unit and system testing phases
  • Prepared test scripts
  • Reviewed unit and integration test cases
  • Implemented exception handling using custom exceptions
  • Addressed critical issues and fixed bugs; also involved in code reviews and design discussions
  • Prepared estimates for minor improvements and enhancement work
  • Environment: COBOL, JCL, DB2, REXX, Java Multithreading, JSP, VSAM, CICS, QMF, Expediter, SPUFI, FILE-AID, DB2 utilities

Software Engineer

Polaris Software Labs
Chennai, India
08.2011 - 06.2013
  • Client: Morgan Stanley Smith Barney
  • Involved in designing use cases
  • Involved in design and implementation
  • Involved in writing SQL queries
  • Analysis of the business functionality of the system
  • Designed and coded per requirements
  • Coordinated unit and system testing phases
  • Reviewed unit and integration test cases
  • Member of the Defect Prevention Group
  • Designed and coded per client requirements
  • UTP preparation, unit testing, and system testing
  • Analyzed and coordinated with plant users to resolve problem tickets
  • Solved business-related issues within the system
  • Coordinated offshore work with onsite leads and SDMs
  • Communicated with Visteon business users and end users
  • Environment: COBOL, JCL, DB2, REXX, Java Multithreading, JSP, VSAM, CICS, QMF, Expediter, SPUFI, FILE-AID, Easytrieve TSO/ISPF, DB2 utilities

Systems Engineer

Polaris software labs
Hyderabad, India
02.2010 - 08.2011
  • Designed and coded per requirements
  • Reviewed coded programs
  • Coordinated unit and system testing phases
  • Prepared test scripts
  • Reviewed unit and integration test cases
  • Implemented exception handling using custom exceptions
  • Addressed critical issues and fixed bugs; also involved in code reviews and design discussions
  • Prepared estimates for minor improvements and enhancement work
  • Environment: COBOL, JCL, DB2, VSAM, CICS, QMF, Expediter, SPUFI, FILE-AID, DB2 utilities

Education

Master of Computer Science

Anna University
12.2009

Skills

  • Apache Kafka
  • AWS MSK
  • AWS EMR
  • AWS CloudWatch
  • SNS
  • CloudTrail
  • HDFS
  • YARN
  • Hive
  • Sqoop
  • HBase
  • Zookeeper
  • Storm
  • Spark
  • Solr
  • Impala
  • Hue
  • Atlas
  • Superset
  • Ambari
  • Cloudera Manager
  • DB2
  • MySQL
  • VSAM
  • AWS CLI
  • Terraform
  • Ansible
  • Python
  • Shell Scripting
  • Kerberos
  • Sentry
  • Knox
  • Ranger
  • Kafka Connect
  • Schema Registry
  • Docker
  • Kubernetes
  • Git

Certification

HDP Certified Administrator (HDPCA), http://bcert.me/sdskbprr

Timeline

Sr Hadoop Engineer

Prime Therapeutics
07.2019 - Current

Hadoop Administrator

Visa Inc
09.2017 - 07.2019

Application Mgmt Sr. Advisor

Dell Inc.
04.2016 - 06.2017

Hadoop Administrator

Dell Inc.
01.2015 - 03.2016

Associate Consultant

HSBC GLTM
07.2013 - 09.2014

Software Engineer

Polaris Software Labs
08.2011 - 06.2013

Systems Engineer

Polaris software labs
02.2010 - 08.2011

Master of Computer Science

Anna University