Dynamic Platform Engineer with over 9 years of experience specializing in Hadoop and Big Data administration, complemented by extensive work on cloud and on-premise infrastructures across four prominent organizations. Expertise in installing, configuring, monitoring, and tuning CDP components like HDFS, YARN, Hive, Impala, and Spark to ensure optimal performance.
Skilled in implementing security measures, including Kerberos authentication, SSL encryption, and access control. Proficient in SQL, Linux, and Bash scripting for automation and operational efficiency.
Experienced in Infrastructure as Code (IaC) using Terraform and CloudFormation, streamlining deployment processes and enhancing infrastructure reliability. Demonstrated expertise in configuration management with Ansible and job scheduling using Autosys, ensuring efficient workflow management across environments.
Certified Kubernetes Administrator with extensive knowledge of containerization technologies, particularly Docker and Kubernetes, facilitating the deployment of scalable and resilient data applications. Additionally, AWS Certified Solutions Architect with hands-on experience across a wide range of AWS services, including S3, EC2, RDS, Lambda, VPC, and IAM, enabling seamless integration of cloud-based solutions.
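As a brief illustration of the AWS hands-on work noted above, the following is a minimal Python sketch using boto3 to inventory S3 buckets and EC2 instance states; the region is a placeholder and credentials are assumed to be configured locally.

```python
# Minimal sketch: inventory S3 buckets and EC2 instance states with boto3.
# Region is a placeholder; AWS credentials are assumed to be configured.
import boto3

session = boto3.Session(region_name="us-east-1")  # placeholder region

s3 = session.client("s3")
for bucket in s3.list_buckets()["Buckets"]:
    print(f"S3 bucket: {bucket['Name']}")

ec2 = session.client("ec2")
for reservation in ec2.describe_instances()["Reservations"]:
    for instance in reservation["Instances"]:
        print(f"EC2 {instance['InstanceId']}: {instance['State']['Name']}")
```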
• Monitor overall health of production Hadoop clusters (HDFS, YARN, Spark, Hive, HBase, etc.) using tools such as Cloudera Manager, Autosys, and the Genesis portal
• Conduct thorough incident investigations and provide root cause analysis (RCA) for recurring issues to prevent future occurrences
• Support production ETL pipelines that ingest, transform, and store data in Hadoop using Informatica. Coordinate with stakeholders to plan and communicate any necessary production downtime for maintenance or upgrades
• Maintain up-to-date documentation for processes, configurations, and troubleshooting procedures
• Ensure production systems are designed for high availability and disaster recovery. Regularly test failover and recovery procedures
• Automate routine operational tasks such as job scheduling, monitoring, log analysis, and alerting using Python, Bash, and Ansible (see the monitoring sketch after this role's environment list)
• Work closely with DevOps, developers, and data engineers to resolve issues, deploy changes, and ensure system stability
• Provide timely updates to stakeholders regarding system health, incident status, and upcoming maintenance activities
• Environment: HDFS, YARN, Hive, HBase, Zookeeper, Oozie, Impala, Cloudera, Oracle, Spark, MySQL, Sentry, Ranger, Kerberos and Informatica
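A minimal Python sketch of the kind of health-check automation described in this role, polling the Cloudera Manager REST API and flagging unhealthy services; the host, credentials, cluster name, and API version below are placeholders, and response fields can vary by Cloudera Manager version.

```python
# Minimal sketch: poll Cloudera Manager's REST API and flag unhealthy services.
# Host, credentials, cluster name, and API version are placeholders.
import requests

CM_URL = "http://cm.example.com:7180/api/v41"   # hypothetical CM host / API version
CLUSTER = "ProdCluster"                         # hypothetical cluster name
AUTH = ("monitor_user", "change_me")            # placeholder credentials

resp = requests.get(f"{CM_URL}/clusters/{CLUSTER}/services", auth=AUTH)
resp.raise_for_status()

for service in resp.json().get("items", []):
    health = service.get("healthSummary", "UNKNOWN")
    if health != "GOOD":
        # Hook point for alerting (email, pager, chat) as described above.
        print(f"ALERT: {service.get('name')} health is {health}")
    else:
        print(f"OK: {service.get('name')}")
```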
• Installed, configured, and maintained Apache Hadoop clusters for application development, along with ecosystem tools such as Hive, HBase, Zookeeper, and Sqoop
• Hands-on experience administering large Cloudera Hadoop environments, including cluster build-out and support, performance tuning, and monitoring across enterprise on-premise and hybrid cloud environments
• Set up and configured AWS EMR clusters and used Amazon IAM to grant users fine-grained access to AWS resources (see the EMR sketch after this role's environment list)
• Integrated CDH and CDP clusters with Active Directory and enabled Kerberos for authentication
• Worked on commissioning and decommissioning of DataNodes, NameNode recovery, and capacity planning; installed the Oozie workflow engine to run multiple Hive jobs
• Set up high availability for the major production cluster and designed automatic failover control using ZooKeeper and Quorum Journal Nodes
• Collaborated with application teams to install operating system and Hadoop updates and perform version upgrades when required
• Automated workflows using shell scripts to pull data from various databases into Hadoop
• Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Zookeeper, Oozie, Impala, Cloudera, Oracle, Spark, Sqoop, MySQL, YARN, Sentry, Kerberos and ETL
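A minimal, hedged sketch of EMR provisioning like the work noted in this role, using boto3's run_job_flow; the cluster name, release label, instance types, key pair, and IAM role names are placeholder values, not details from an actual deployment.

```python
# Minimal sketch: provision a small EMR cluster with boto3.
# Cluster name, release label, instance types, key pair, and roles are placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # placeholder region

response = emr.run_job_flow(
    Name="analytics-poc",                      # hypothetical cluster name
    ReleaseLabel="emr-6.10.0",                 # placeholder EMR release
    Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
        "Ec2KeyName": "my-keypair",            # placeholder key pair
    },
    JobFlowRole="EMR_EC2_DefaultRole",         # default EC2 instance profile
    ServiceRole="EMR_DefaultRole",             # default EMR service role
    VisibleToAllUsers=True,
)
print("Started cluster:", response["JobFlowId"])
```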
• Installed, configured, and maintained Hadoop clusters for application development, along with ecosystem tools such as HDFS, YARN, Hive, HBase, Oozie, Impala, Hue, Spark, Zookeeper, Sqoop, and Sentry
• Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration
• Used Sqoop to import and export data between HDFS and relational databases (see the Sqoop sketch after this role's environment list)
• Exported analyzed data to relational databases using Sqoop for data visualization, data loading, and report generation
• Used the Oozie scheduler to automate pipeline workflows and orchestrate the Sqoop, Hive, and Spark jobs that extract data on a schedule
• Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
• Automated workflows using shell scripts to pull data from various databases into Hadoop
• Environment: Cloudera CDH, HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Oozie, Zookeeper, Impala, Cloudera Manager, Cloudera Navigator, Kerberos, Apache Sentry, Talend, Oracle SQL Developer
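A minimal Python sketch of the Sqoop-based ingestion described in this role, wrapping a sqoop import command (RDBMS to HDFS) in a script suitable for scheduling; the JDBC URL, credentials file, table, and HDFS paths are placeholders.

```python
# Minimal sketch: wrap a Sqoop import (RDBMS -> HDFS) in Python.
# JDBC URL, credentials file, table, and target directory are placeholders.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://db.example.com:3306/sales",  # placeholder JDBC URL
    "--username", "etl_user",
    "--password-file", "/user/etl/.db_password",            # placeholder HDFS path
    "--table", "orders",                                    # placeholder table
    "--target-dir", "/data/raw/orders",                     # placeholder HDFS dir
    "--num-mappers", "4",
]

result = subprocess.run(sqoop_cmd, capture_output=True, text=True)
if result.returncode != 0:
    # In production this failure would feed the alerting/monitoring described above.
    raise RuntimeError(f"Sqoop import failed:\n{result.stderr}")
print("Sqoop import completed for table 'orders'")
```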