Dynamic Platform Engineer with over 9 years of experience specializing in Hadoop and Big Data administration, complemented by extensive work on cloud and on-premise infrastructures across four prominent organizations. Expertise in installing, configuring, monitoring, and tuning CDP components like HDFS, YARN, Hive, Impala, and Spark to ensure optimal performance.
Skilled in implementing security measures, including Kerberos authentication, SSL encryption, and access control. Proficient in SQL, Linux, and Bash scripting for automation and operational efficiency.
Experienced in Infrastructure as Code (IaC) using Terraform and CloudFormation, streamlining deployment processes and enhancing infrastructure reliability. Demonstrated expertise in configuration management with Ansible and job scheduling using Autosys, ensuring efficient workflow management across environments.
Certified Kubernetes Administrator with extensive knowledge of containerization technologies, particularly Docker and Kubernetes, facilitating the deployment of scalable and resilient data applications. Additionally, AWS Certified Solutions Architect with hands-on experience across a wide range of AWS services, including S3, EC2, RDS, Lambda, VPC, and IAM, enabling seamless integration of cloud-based solutions.
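As a brief illustration of the AWS hands-on work noted above, the following is a minimal Python sketch using boto3 to inventory S3 buckets and EC2 instance states; the region is a placeholder and credentials are assumed to be configured locally.

```python
# Minimal sketch: inventory S3 buckets and EC2 instance states with boto3.
# Region is a placeholder; AWS credentials are assumed to be configured.
import boto3

session = boto3.Session(region_name="us-east-1")  # placeholder region

s3 = session.client("s3")
for bucket in s3.list_buckets()["Buckets"]:
    print(f"S3 bucket: {bucket['Name']}")

ec2 = session.client("ec2")
for reservation in ec2.describe_instances()["Reservations"]:
    for instance in reservation["Instances"]:
        print(f"EC2 {instance['InstanceId']}: {instance['State']['Name']}")
```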
• Monitor overall health of production Hadoop clusters (HDFS, YARN, Spark, Hive, HBase, etc.) using tools such as Cloudera Manager, Autosys, and the Genesis portal
• Conduct thorough incident investigations and provide root cause analysis (RCA) for recurring issues to prevent future occurrences
• Support production ETL pipelines that ingest, transform, and store data in Hadoop using Informatica. Coordinate with stakeholders to plan and communicate any necessary production downtime for maintenance or upgrades
• Maintain up-to-date documentation for processes, configurations, and troubleshooting procedures
• Ensure production systems are designed for high availability and disaster recovery. Regularly test failover and recovery procedures
• Automate routine operational tasks such as job scheduling, monitoring, log analysis, and alerting using Python, Bash, and Ansible (see the monitoring sketch after this role's environment list)
• Work closely with DevOps, developers, and data engineers to resolve issues, deploy changes, and ensure system stability
• Provide timely updates to stakeholders regarding system health, incident status, and upcoming maintenance activities
• Environment: HDFS, YARN, Hive, HBase, Zookeeper, Oozie, Impala, Cloudera, Oracle, Spark, MySQL, Sentry, Ranger, Kerberos and Informatica
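A minimal Python sketch of the kind of health-check automation described in this role, polling the Cloudera Manager REST API and flagging unhealthy services; the host, credentials, cluster name, and API version below are placeholders, and response fields can vary by Cloudera Manager version.

```python
# Minimal sketch: poll Cloudera Manager's REST API and flag unhealthy services.
# Host, credentials, cluster name, and API version are placeholders.
import requests

CM_URL = "http://cm.example.com:7180/api/v41"   # hypothetical CM host / API version
CLUSTER = "ProdCluster"                         # hypothetical cluster name
AUTH = ("monitor_user", "change_me")            # placeholder credentials

resp = requests.get(f"{CM_URL}/clusters/{CLUSTER}/services", auth=AUTH)
resp.raise_for_status()

for service in resp.json().get("items", []):
    health = service.get("healthSummary", "UNKNOWN")
    if health != "GOOD":
        # Hook point for alerting (email, pager, chat) as described above.
        print(f"ALERT: {service.get('name')} health is {health}")
    else:
        print(f"OK: {service.get('name')}")
```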
• Installed, configured, and maintained Apache Hadoop clusters for application development, along with ecosystem tools such as Hive, HBase, Zookeeper, and Sqoop
• Hands-on experience administering large Cloudera Hadoop environments, including cluster build-out and support, performance tuning, and monitoring across enterprise on-premise and hybrid cloud environments
• Set up and configured AWS EMR clusters and used Amazon IAM to grant users fine-grained access to AWS resources (see the EMR sketch after this role's environment list)
• Integrated CDH and CDP clusters with Active Directory and enabled Kerberos for authentication
• Worked on commissioning and decommissioning of DataNodes, NameNode recovery, and capacity planning; installed the Oozie workflow engine to run multiple Hive jobs
• Set up high availability for the major production cluster and designed automatic failover control using ZooKeeper and Quorum Journal Nodes
• Collaborated with application teams to install operating system and Hadoop updates and perform version upgrades when required
• Automated workflows using shell scripts to pull data from various databases into Hadoop
• Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Zookeeper, Oozie, Impala, Cloudera, Oracle, Spark, Sqoop, MySQL, YARN, Sentry, Kerberos and ETL
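A minimal, hedged sketch of EMR provisioning like the work noted in this role, using boto3's run_job_flow; the cluster name, release label, instance types, key pair, and IAM role names are placeholder values, not details from an actual deployment.

```python
# Minimal sketch: provision a small EMR cluster with boto3.
# Cluster name, release label, instance types, key pair, and roles are placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # placeholder region

response = emr.run_job_flow(
    Name="analytics-poc",                      # hypothetical cluster name
    ReleaseLabel="emr-6.10.0",                 # placeholder EMR release
    Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
        "Ec2KeyName": "my-keypair",            # placeholder key pair
    },
    JobFlowRole="EMR_EC2_DefaultRole",         # default EC2 instance profile
    ServiceRole="EMR_DefaultRole",             # default EMR service role
    VisibleToAllUsers=True,
)
print("Started cluster:", response["JobFlowId"])
```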
• Installed, configured, and maintained Hadoop clusters for application development, along with ecosystem tools such as HDFS, YARN, Hive, HBase, Oozie, Impala, Hue, Spark, Zookeeper, Sqoop, and Sentry
• Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration
• Used Sqoop to import and export data between HDFS and relational databases (see the Sqoop sketch after this role's environment list)
• Exported analyzed data to relational databases using Sqoop for data visualization, data loading, and report generation
• Used the Oozie scheduler to automate pipeline workflows and orchestrate the Sqoop, Hive, and Spark jobs that extract data on a schedule
• Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
• Automated workflows using shell scripts to pull data from various databases into Hadoop
• Environment: Cloudera CDH, HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Oozie, Zookeeper, Impala, Cloudera Manager, Cloudera Navigator, Kerberos, Apache Sentry, Talend, Oracle SQL Developer
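A minimal Python sketch of the Sqoop-based ingestion described in this role, wrapping a sqoop import command (RDBMS to HDFS) in a script suitable for scheduling; the JDBC URL, credentials file, table, and HDFS paths are placeholders.

```python
# Minimal sketch: wrap a Sqoop import (RDBMS -> HDFS) in Python.
# JDBC URL, credentials file, table, and target directory are placeholders.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://db.example.com:3306/sales",  # placeholder JDBC URL
    "--username", "etl_user",
    "--password-file", "/user/etl/.db_password",            # placeholder HDFS path
    "--table", "orders",                                    # placeholder table
    "--target-dir", "/data/raw/orders",                     # placeholder HDFS dir
    "--num-mappers", "4",
]

result = subprocess.run(sqoop_cmd, capture_output=True, text=True)
if result.returncode != 0:
    # In production this failure would feed the alerting/monitoring described above.
    raise RuntimeError(f"Sqoop import failed:\n{result.stderr}")
print("Sqoop import completed for table 'orders'")
```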