Summary
Overview
Work History
Education
Skills
Timeline

MILTON HASAN

Big Data Developer
New York, NY

Summary

  • Over 6 years of professional experience in data modeling, design, and development with Big Data technologies, with an in-depth understanding of the Hadoop distributed architecture and its components such as NodeManager, ResourceManager, NameNode, DataNode, HiveServer2, HBase Master, and RegionServer.
  • Strong proficiency in developing, deploying, and debugging cloud-based applications using AWS.
  • Strong experience creating real-time data streaming solutions using Spark Streaming and Kafka.
  • Understanding of cloud-native application development on AWS, including writing code that uses AWS security features such as IAM roles.
  • Strong knowledge of AWS data migration using DMS, Kinesis, Lambda, EMR, and Athena.
  • Experience with PySpark on Azure Databricks for data cleaning, manipulation, and optimization based on business needs.
  • Expertise in writing end-to-end data processing jobs to analyze data using MapReduce, Spark, and Hive.
  • Experience with Kafka for collecting, aggregating, and moving large volumes of data from various sources such as web servers and telnet sources.
  • Experience in the insurance and banking domains, as well as integrating applications such as Zeppelin and Docker.
  • Developed Sqoop scripts for large dataset transfers between Hadoop and RDBMSs.
  • Experience using AWS service APIs, the AWS CLI, and SDKs to write applications.
  • Strong experience working in UNIX/Linux environments and writing shell scripts.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Extensive experience working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
  • Working knowledge of Azure Data Factory and data processing solutions with Azure Databricks.
  • Good working experience in design and application development using IDEs such as IntelliJ and Eclipse.
  • Understanding of core AWS services, their uses, and AWS architecture best practices.
  • Detailed understanding of the Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
  • Ability to blend cloud service expertise with strong AWS skills: creating and configuring EC2 instances, connecting to Amazon Redshift for faster query execution, using Amazon S3 for storage, and working comfortably with RDS, Athena, Glue, IAM roles for security, Lambda, and Step Functions. (A brief, illustrative SDK sketch follows.)
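
As an illustration of the AWS SDK usage described above, here is a minimal Python sketch using boto3 to run an Athena query over data stored in S3; the region, database, table, and bucket names are illustrative assumptions, not details of any actual project.

# Minimal boto3 sketch: submit an Athena query and poll for completion.
# All resource names below are illustrative placeholders.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

run = athena.start_query_execution(
    QueryString="SELECT region, COUNT(*) AS cnt FROM claims GROUP BY region",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)

query_id = run["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

print(f"Athena query {query_id} finished with state {state}")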

Overview

6+ years of professional experience

Work History

BIG DATA SPARK DEVELOPER

Liberty Mutual Insurance
Roanoke, VA
01.2021 - Current
  • Responsible for building scalable distributed data solutions using Hadoop
  • Designed the best-suited approach for streaming data movement from different sources to HDFS using Apache Kafka (see the sketch following this section)
  • Built custom connectors using Kafka core concepts with the Kafka API and REST Proxy
  • Applied an understanding of AWS cloud-native and serverless application patterns when writing code
  • Used AWS Lambda with Step Functions, Amazon Redshift, and EMR, and handled AWS security such as IAM roles for each use case
  • Imported and exported data into HDFS and Hive using Sqoop and migrated huge amounts of data from different databases to Hadoop
  • Used Hive and Spark SQL to analyze data and extract data sets with meaningful information such as medicines, diseases, symptoms, opinions, and geographic region details
  • Developed data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations
  • Used Scala as an ETL tool to perform transformations, joins, and some pre-aggregations before storing the data in HDFS
  • Worked closely with data scientists to build predictive models using Spark
  • Loaded and transformed large sets of structured and semi-structured data
  • Responsible for managing data coming from different sources
  • Developed data pipeline using Sqoop and MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
  • Used AWS service APIs, the AWS CLI, and SDKs to write applications
  • Applied an understanding of the AWS shared responsibility model and its application to lifecycle management
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data
  • Created Oozie workflow and coordinator jobs to kick off jobs on time based on data availability
  • Developed multiple MapReduce jobs in Hive for data cleaning and pre-processing
  • Involved in defining job flows
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
  • Managed and reviewed Hadoop log files
  • Created stubs for Kafka producers, consumers, and consumer groups to help onboard applications from different languages/platforms
  • Leveraged Hadoop ecosystem knowledge to design and develop capabilities that deliver solutions using Spark, Scala, Python, Hive, Kafka, and other daemons in the Hadoop ecosystem
  • Developed complex Hive queries using joins and partitions for huge data sets per business requirements, loaded the filtered data from source into edge-node Hive tables, and validated the data
  • Performed bucketing and partitioning of data using Apache Hive, which saves processing time and generates proper sample insights
  • Created Kafka topics, set up redundant clusters, deployed monitoring tools and alerts, and applied Kafka knowledge to increase the performance, high availability, and stability of solutions
  • Created workflows in Oozie, managing/coordinating the jobs and combining multiple jobs sequentially into one unit of work
  • Imported and exported data from different RDBMS systems such as Oracle, Teradata, SQL Server, and Netezza, and Linux systems such as SAS Grid
  • Handled semi-structured data such as Excel and CSV files and imported it from SAS Grid to HDFS using an SFTP process
  • Ingested data into Hive tables using Sqoop and the SFTP process
  • Knowledgeable in Elasticsearch for identifying Kafka message failure scenarios and reprocessing failed messages in Kafka using offset IDs
  • Performed data-level transformations in intermediate tables before forming final tables
  • Handled data integrity checks using Hive queries, Hadoop, and Spark
  • Automated daily, monthly, quarterly, and ad hoc data loads in Control-M to run on the scheduled calendar dates
  • Involved in production support, BAU activities, and release management
  • Expertise in writing custom UDFs in Hive
  • Environment: Hadoop, Kafka, Spark, Hive, Shell, Sqoop, Oozie Workflows, Teradata, Netezza, SQL Server, AWS, Oracle, Hue, Impala, Cloudera Manager
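
The Kafka-to-HDFS streaming work referenced in this role might look like the following minimal PySpark Structured Streaming sketch; the broker address, topic name, schema, and HDFS paths are illustrative assumptions rather than details of the actual pipeline.

# Minimal sketch: consume a Kafka topic with Spark Structured Streaming,
# parse the JSON payload, and land the stream on HDFS as Parquet.
# Broker, topic, schema, and path names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("claims-stream").getOrCreate()

schema = StructType([
    StructField("claim_id", StringType()),
    StructField("region", StringType()),
    StructField("amount", DoubleType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "claims-events")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs:///data/claims/enriched")
    .option("checkpointLocation", "hdfs:///checkpoints/claims")
    .outputMode("append")
    .start()
)
query.awaitTermination()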

HADOOP DEVELOPER

PepsiCo
Winston-Salem, North Carolina
04.2019 - 12.2020
  • Worked on analyzing Cloudera and Hortonworks Hadoop clusters and different big data analytic tools including Pig, Hive, and Sqoop
  • Performance-tuned and managed growth of the OS, disk usage, and network traffic
  • Developed Kafka producer and consumer applications on a Kafka cluster set up with the help of ZooKeeper
  • Analyzed the latest big data analytic cloud technologies, including most of the AWS services
  • Developed MapReduce (YARN) jobs for cleaning, accessing and validating the data
  • Created and ran Sqoop jobs with incremental load to populate Hive external tables
  • Developed optimal strategies for distributing the web log data over the cluster; imported and exported the stored web log data into HDFS and Hive using Sqoop
  • Applied an understanding of core AWS services, their uses, and basic AWS architecture, and used AWS service APIs, the AWS CLI, and SDKs to write applications
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website (see the sketch following this section)
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts)
  • Created Hive tables and worked on them using HiveQL
  • Designed and Implemented Partitioning (Static, Dynamic), Buckets in HIVE
  • Worked on cluster coordination services through ZooKeeper
  • Monitored workload, job performance and capacity planning using Cloudera Manager
  • Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team
  • Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solution from disparate sources
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning
  • Environment: Hadoop, Kafka, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, AWS, Hortonworks, Cloudera Manager, Apache YARN, Python
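
The web-log metrics and dynamic partitioning described in this role could be sketched roughly as follows with Spark SQL against Hive tables; the database, table, and column names are hypothetical placeholders, not the actual project schema.

# Illustrative sketch: compute daily web-log metrics with Spark SQL and
# write them into a dynamically partitioned Hive table.
# Database, table, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("weblog-metrics")
    .enableHiveSupport()
    .config("hive.exec.dynamic.partition", "true")
    .config("hive.exec.dynamic.partition.mode", "nonstrict")
    .getOrCreate()
)

spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.daily_visitors (
        unique_visitors BIGINT,
        page_views      BIGINT
    )
    PARTITIONED BY (log_date STRING)
    STORED AS PARQUET
""")

spark.sql("""
    INSERT OVERWRITE TABLE analytics.daily_visitors PARTITION (log_date)
    SELECT COUNT(DISTINCT visitor_id)           AS unique_visitors,
           COUNT(*)                             AS page_views,
           date_format(event_ts, 'yyyy-MM-dd')  AS log_date
    FROM raw.web_logs
    GROUP BY date_format(event_ts, 'yyyy-MM-dd')
""")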

BIG DATA SPECIALIST

BMO Harris Bank
New York, New York
02.2018 - 03.2019
  • Launched and configured Amazon EC2 Cloud Instances and S3 buckets using AWS, Ubuntu Linux and RHEL
  • Installed application on AWS EC2 instances and configured the storage on S3 buckets
  • Implemented and maintained monitoring and alerting for production and corporate servers/storage using AWS CloudWatch
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications
  • Worked closely with the data modelers to model the new incoming data sets
  • Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling of a few jobs)
  • Expertise in designing and deploying Hadoop clusters and different big data analytic tools including Hive, Oozie, Zookeeper, Sqoop, Spark, Impala, and Cassandra with the Hortonworks distribution
  • Involved in creating Hive tables, loading data, and writing Hive queries
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS
  • Imported data from different sources such as HDFS, HBase, or local storage into Spark RDDs (see the sketch following this section)
  • Developed a data pipeline using Kafka and Storm to store data into HDFS
  • Performed real time analysis on the incoming data
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data
  • Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Hive, HBase, Oozie, Scala, Spark, AWS, Linux
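
The RDD-level loading and transformation mentioned in this role might look roughly like the following PySpark sketch; the HDFS paths, record layout, and aggregation are assumptions made purely for illustration.

# Minimal sketch: load text data from HDFS into a Spark RDD, filter and
# transform it, then persist the result back to HDFS.
# Paths, record layout, and parsing logic are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-cleaning").getOrCreate()
sc = spark.sparkContext

raw = sc.textFile("hdfs:///data/raw/transactions")

cleaned = (
    raw.map(lambda line: line.split(","))            # parse CSV-style rows
       .filter(lambda cols: len(cols) == 4)          # drop malformed records
       .map(lambda cols: (cols[0], float(cols[3])))  # (account_id, amount)
       .reduceByKey(lambda a, b: a + b)              # total per account
)

cleaned.map(lambda kv: f"{kv[0]},{kv[1]}").saveAsTextFile(
    "hdfs:///data/curated/account_totals"
)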

Education

Bachelor of Science -

Stamford University

Skills

Python and SQL

Data Engineering

ETL processes

Performance Tuning

Linux Environment

Big Data Processing

Hadoop Ecosystem Knowledge

Streaming data processing

Databricks platform

Agile Methodologies

Timeline

HADOOP DEVELOPER

PepsiCo
04.2019 - 12.2020

BIG DATA SPECIALIST

BMO Harris Bank
02.2018 - 03.2019

BIG DATA SPARK DEVELOPER

Liberty Mutual Insurance
01.2021 - Current

Bachelor of Science -

Stamford University