Jhalak Das

Platform Engineer
South Richmond Hill, NY

Summary

Dynamic Platform Engineer with over 9 years of experience specializing in Hadoop and Big Data administration, complemented by extensive work on cloud and on-premise infrastructures across four prominent organizations. Expertise in installing, configuring, monitoring, and tuning CDP components like HDFS, YARN, Hive, Impala, and Spark to ensure optimal performance.

Skilled in implementing security measures, including Kerberos authentication, SSL encryption, and access control. Proficient in SQL, Linux, and Bash scripting for automation and operational efficiency.
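
A minimal Bash sketch of the kind of Kerberos-backed automation described here (the keytab path and principal are hypothetical placeholders):

    #!/usr/bin/env bash
    # Acquire a Kerberos ticket from a service keytab before running
    # secured Hadoop commands; paths and principal are illustrative.
    set -euo pipefail

    KEYTAB=/etc/security/keytabs/svc_hadoop.keytab   # hypothetical keytab
    PRINCIPAL=svc_hadoop@EXAMPLE.COM                 # hypothetical principal

    # Renew the ticket only if the cache is missing or expired.
    klist -s || kinit -kt "$KEYTAB" "$PRINCIPAL"

    # Subsequent Hadoop CLI calls run as the authenticated principal.
    hdfs dfs -ls /user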

Experienced in Infrastructure as Code (IaC) using Terraform and CloudFormation, streamlining deployment processes and enhancing infrastructure reliability. Demonstrated expertise in configuration management with Ansible and job scheduling using Autosys, ensuring efficient workflow management across environments.
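
As a sketch of that IaC workflow, a Bash wrapper around a plan-then-apply Terraform run (the module directory is a hypothetical placeholder):

    #!/usr/bin/env bash
    # Plan first, persist the plan file, and apply only what was reviewed.
    set -euo pipefail

    cd infra/hadoop-edge-nodes    # hypothetical Terraform module directory

    terraform init -input=false
    terraform plan -input=false -out=tfplan
    terraform apply -input=false tfplan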

Certified Kubernetes Administrator with extensive knowledge of containerization technologies, particularly Docker and Kubernetes, facilitating the deployment of scalable and resilient data applications. Also an AWS Certified Solutions Architect with hands-on experience across a wide range of AWS services, including S3, EC2, RDS, Lambda, VPC, and IAM, enabling seamless integration of cloud-based solutions.
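
A quick health pass over such an environment might look like the following Bash sketch, using only standard kubectl and AWS CLI calls (the bucket and tag names are illustrative):

    #!/usr/bin/env bash
    set -euo pipefail

    kubectl get nodes -o wide    # node readiness at a glance
    kubectl get pods -A --field-selector=status.phase!=Running   # unhealthy pods

    aws s3 ls s3://example-data-lake/    # hypothetical bucket
    aws ec2 describe-instances \
        --filters "Name=tag:role,Values=hadoop-worker" \
        --query 'Reservations[].Instances[].InstanceId'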

Overview

  • 9 years of professional experience
  • 3 certifications

Skills

  • Cloudera, HDFS, YARN, MapReduce, Zookeeper, Kafka, Oozie, Spark, Hue, Impala, AutoSys
  • Oracle, PostgreSQL, RDS, Aurora, Hive, Redshift, HBase, DynamoDB
  • AWS and Azure
  • Docker, Kubernetes, ECS and EKS
  • Terraform, CloudFormation, Ansible
  • SSL/TLS, Kerberos, Ranger
  • SQL, HQL, Bash Scripting
  • Informatica, Glue and Sqoop
  • ITSM, Nexus, JIRA

Work History

Application Support Engineer

Bank of America
04.2022 - Current

• Monitor the overall health of production Hadoop clusters (HDFS, YARN, Spark, Hive, HBase, etc.) using tools like Cloudera Manager, Autosys, or the Genesis portal
• Conduct thorough investigations of incidents and provide RCA for recurring issues to prevent future occurrences
• Support production ETL pipelines that ingest, transform, and store data in Hadoop using Informatica
• Coordinate with stakeholders to plan and communicate any necessary production downtime for maintenance or upgrades
• Maintain up-to-date documentation for processes, configurations, and troubleshooting procedures
• Ensure production systems are designed for high availability and disaster recovery; regularly test failover and recovery procedures
• Automate routine operational tasks such as job scheduling, monitoring, log analysis, and alerting using scripting languages like Python, Bash, or Ansible (see the sketch below)
• Work closely with DevOps, developers, and data engineers to resolve issues, deploy changes, and ensure system stability
• Provide timely updates to stakeholders regarding system health, incident status, and upcoming maintenance activities
• Environment: HDFS, YARN, Hive, HBase, Zookeeper, Oozie, Impala, Cloudera, Oracle, Spark, MySQL, Sentry, Ranger, Kerberos and Informatica
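
A minimal Bash sketch of that log-analysis and alerting automation (the alert address and temp-file handling are illustrative placeholders):

    #!/usr/bin/env bash
    # Scan a finished YARN application's aggregated logs for errors
    # and mail a summary if any are found.
    set -euo pipefail

    APP_ID="$1"                      # e.g. application_1700000000000_0042
    RECIPIENT=oncall@example.com     # hypothetical alert address

    yarn logs -applicationId "$APP_ID" 2>/dev/null \
        | grep -iE 'error|exception' > "/tmp/${APP_ID}.errors" || true

    if [[ -s "/tmp/${APP_ID}.errors" ]]; then
        mail -s "Errors in ${APP_ID}" "$RECIPIENT" < "/tmp/${APP_ID}.errors"
    fi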

Hadoop Administrator

Aetna
01.2020 - 02.2022

• Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools like Hive, HBase, Zookeeper, and Sqoop
• Hands-on experience administering large Cloudera Hadoop environments: cluster setup, builds, performance tuning, and monitoring across enterprise on-premise and hybrid cloud environments
• Set up and configured AWS EMR clusters and used Amazon IAM to grant fine-grained access to AWS resources
• Integrated CDH and CDP clusters with Active Directory and enabled Kerberos for authentication
• Worked on commissioning and decommissioning of DataNodes, NameNode recovery, and capacity planning, and installed the Oozie workflow engine to run multiple Hive jobs
• Set up high availability for a major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes
• Collaborated with application teams to install operating system and Hadoop updates and version upgrades when required
• Automated workflows using shell scripts to pull data from various databases into Hadoop (see the sketch below)
• Environment: Hadoop, HDFS, Map Reduce, Hive, HBase, Zookeeper, Oozie, Impala, Cloudera, Oracle, Spark, Sqoop, MySQL, YARN, Sentry, Kerberos and ETL
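
A sketch of one such shell-script ingestion job (connection string, credentials file, and paths are hypothetical placeholders):

    #!/usr/bin/env bash
    # Pull one table from an RDBMS into a date-partitioned HDFS
    # directory with Sqoop.
    set -euo pipefail

    TABLE=customers                                 # hypothetical source table
    TARGET="/data/raw/${TABLE}/$(date +%Y-%m-%d)"   # landing directory

    sqoop import \
        --connect jdbc:oracle:thin:@//db.example.com:1521/ORCL \
        --username etl_user \
        --password-file /user/etl/.db_password \
        --table "$TABLE" \
        --target-dir "$TARGET" \
        --num-mappers 4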

Platform Engineer

Maybank
04.2017 - 12.2019

• Installed, configured, and maintained Hadoop clusters for application development, along with Hadoop tools like HDFS, YARN, Hive, HBase, Oozie, Impala, Hue, Spark, Zookeeper, Sqoop, and Sentry
• Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration
• Used Sqoop to import and export data between HDFS and RDBMS
• Exported analyzed data to relational databases using Sqoop for data visualization, data loading, and report generation
• Used the Oozie scheduler to automate pipeline workflows and orchestrate the Sqoop, Hive, and Spark jobs that extract data in a timely manner
• Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting (see the sketch below)
• Automated workflows using shell scripts to pull data from various databases into Hadoop
• Environment: Cloudera CDH, HDFS, Map Reduce, YARN, Pig, Hive, Sqoop, Oozie, Zookeeper, Impala, Cloudera Manager, Cloudera Navigator, Kerberos, Apache Sentry, Talend, Oracle SQL Developer
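
A sketch of one such scheduled Hive reporting step, run through beeline (JDBC URL, database, and table names are hypothetical):

    #!/usr/bin/env bash
    # Aggregate yesterday's partition of a partitioned Hive table.
    set -euo pipefail

    DT=$(date -d yesterday +%Y-%m-%d)   # previous day's partition key

    beeline -u "jdbc:hive2://hive.example.com:10000/default" -e "
        SELECT region, COUNT(*) AS txn_count
        FROM transactions
        WHERE dt = '${DT}'
        GROUP BY region;"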

Hadoop Administrator

Digi Key Electronics
09.2015 - 03.2017
• Responsible for cluster maintenance, monitoring, and management, commissioning and decommissioning DataNodes, troubleshooting, reviewing data backups, and managing and reviewing log files
• Added and removed components through Cloudera
• Performed major and minor upgrades and patch updates
• Monitored workload, job performance, and capacity using Cloudera
• Created and managed cron jobs
• Installed the Oozie workflow engine to run multiple Hive jobs
• Built and maintained scalable data pipelines using the Hadoop ecosystem and other open-source components like Hive and HBase
• Installed and configured high availability for Hue and pointed it at the Hadoop cluster in Cloudera Manager
• Deep, thorough understanding of ETL tools and how they apply in a Big Data environment while supporting and managing Hadoop clusters
• Developed data pipelines using Sqoop and Spark to store data in HDFS
• Commissioned DataNodes as data grew and decommissioned them when hardware degraded (see the sketch below)
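
A sketch of the manual decommissioning step in Bash (the exclude-file location follows common defaults but is illustrative; Cloudera Manager can drive the same flow):

    #!/usr/bin/env bash
    # Add a host to the HDFS exclude file and ask the NameNode to drain it.
    set -euo pipefail

    NODE="$1"                               # e.g. worker-17.example.com
    EXCLUDES=/etc/hadoop/conf/dfs.exclude   # hypothetical exclude file path

    grep -qx "$NODE" "$EXCLUDES" || echo "$NODE" >> "$EXCLUDES"

    hdfs dfsadmin -refreshNodes             # start moving blocks off the node

    # Check decommissioning status for the host.
    hdfs dfsadmin -report | grep -A1 "$NODE"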

Certification

  • Certified Kubernetes Administrator (CKA)
  • AWS Certified Solutions Architect - Associate
  • Microsoft Certified Azure Administrator Associate

Education

Bachelor of Business Administration

Leading University, Sylhet
