Summary
Overview
Work History
Education
Skills
Timeline

MILTON HASAN

Big Data Developer
New York, NY

Summary

  • Over 6 years of professional experience in data modeling, design, and development with Big Data technologies, with an in-depth understanding of the Hadoop distributed architecture and its components such as NodeManager, ResourceManager, NameNode, DataNode, HiveServer2, HBase Master, and RegionServer.
  • Strong proficiency in developing, deploying, and debugging cloud-based applications using AWS.
  • Strong experience creating real-time data streaming solutions using Spark Streaming and Kafka.
  • Understanding of cloud-native application development on AWS, including writing code that uses AWS security features such as IAM roles.
  • Strong knowledge of AWS data migration using DMS, Kinesis, Lambda, EMR, and Athena.
  • Experience with PySpark on Azure Databricks for data cleaning, manipulation, and optimization based on business needs.
  • Expertise in writing end-to-end data processing jobs to analyze data using MapReduce, Spark, and Hive.
  • Experience with Kafka for collecting, aggregating, and moving large volumes of data from various sources such as web servers and telnet sources.
  • Experience in the insurance and banking domains, as well as integrating applications such as Zeppelin and Docker.
  • Developed Sqoop scripts for large dataset transfers between Hadoop and RDBMSs.
  • Experience using AWS service APIs, the AWS CLI, and SDKs to write applications.
  • Strong experience working in UNIX/Linux environments and writing shell scripts.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Extensive experience working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
  • Working knowledge of Azure Data Factory and data processing solutions with Azure Databricks.
  • Good working experience in design and application development using IDEs such as IntelliJ and Eclipse.
  • Understanding of core AWS services, their uses, and AWS architecture best practices.
  • Detailed understanding of the Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
  • Ability to blend cloud service expertise with strong AWS skills: creating and configuring EC2 instances, connecting to Amazon Redshift for faster query execution, using Amazon S3 for storage, and working comfortably with RDS, Athena, Glue, IAM roles for security, Lambda, and Step Functions. (A brief, illustrative SDK sketch follows.)
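
As an illustration of the AWS SDK usage described above, here is a minimal Python sketch using boto3 to run an Athena query over data stored in S3; the region, database, table, and bucket names are illustrative assumptions, not details of any actual project.

# Minimal boto3 sketch: submit an Athena query and poll for completion.
# All resource names below are illustrative placeholders.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

run = athena.start_query_execution(
    QueryString="SELECT region, COUNT(*) AS cnt FROM claims GROUP BY region",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)

query_id = run["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

print(f"Athena query {query_id} finished with state {state}")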

Overview

6+ years of professional experience

Work History

BIG DATA SPARK DEVELOPER

Liberty Mutual Insurance
Roanoke, VA
01.2021 - Current
  • Responsible for building scalable distributed data solutions using Hadoop
  • Designed the best-suited approach for streaming data movement from different sources to HDFS using Apache Kafka (see the sketch following this section)
  • Built custom connectors using Kafka core concepts with the Kafka API and REST Proxy
  • Applied an understanding of AWS cloud-native and serverless application patterns when writing code
  • Used AWS Lambda with Step Functions, Amazon Redshift, and EMR, and handled AWS security such as IAM roles for each use case
  • Imported and exported data into HDFS and Hive using Sqoop and migrated huge amounts of data from different databases to Hadoop
  • Used Hive and Spark SQL to analyze data and extract data sets with meaningful information such as medicines, diseases, symptoms, opinions, and geographic region details
  • Developed data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations
  • Used Scala as an ETL tool to perform transformations, joins, and some pre-aggregations before storing the data in HDFS
  • Worked closely with data scientists to build predictive models using Spark
  • Loaded and transformed large sets of structured and semi-structured data
  • Responsible for managing data coming from different sources
  • Developed data pipeline using Sqoop and MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
  • Used AWS service APIs, the AWS CLI, and SDKs to write applications
  • Applied an understanding of the AWS shared responsibility model and its application to lifecycle management
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data
  • Created Oozie workflow and coordinator jobs to kick off jobs on time based on data availability
  • Developed multiple MapReduce jobs in Hive for data cleaning and pre-processing
  • Involved in defining job flows
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
  • Managed and reviewed Hadoop log files
  • Created stubs for Kafka producers, consumers, and consumer groups to help onboard applications from different languages/platforms
  • Leveraged Hadoop ecosystem knowledge to design and develop capabilities that deliver solutions using Spark, Scala, Python, Hive, Kafka, and other daemons in the Hadoop ecosystem
  • Developed complex Hive queries using joins and partitions for huge data sets per business requirements, loaded the filtered data from source into edge-node Hive tables, and validated the data
  • Performed bucketing and partitioning of data using Apache Hive, which saves processing time and generates proper sample insights
  • Created Kafka topics, set up redundant clusters, deployed monitoring tools and alerts, and applied Kafka knowledge to increase the performance, high availability, and stability of solutions
  • Created workflows in Oozie, managing/coordinating the jobs and combining multiple jobs sequentially into one unit of work
  • Imported and exported data from different RDBMS systems such as Oracle, Teradata, SQL Server, and Netezza, and Linux systems such as SAS Grid
  • Handled semi-structured data such as Excel and CSV files and imported it from SAS Grid to HDFS using an SFTP process
  • Ingested data into Hive tables using Sqoop and the SFTP process
  • Knowledgeable in Elasticsearch for identifying Kafka message failure scenarios and reprocessing failed messages in Kafka using offset IDs
  • Performed data-level transformations in intermediate tables before forming final tables
  • Handled data integrity checks using Hive queries, Hadoop, and Spark
  • Automated daily, monthly, quarterly, and ad hoc data loads in Control-M to run on the scheduled calendar dates
  • Involved in production support, BAU activities, and release management
  • Expertise in writing custom UDFs in Hive
  • Environment: Hadoop, Kafka, Spark, Hive, Shell, Sqoop, Oozie Workflows, Teradata, Netezza, SQL Server, AWS, Oracle, Hue, Impala, Cloudera Manager
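
The Kafka-to-HDFS streaming work referenced in this role might look like the following minimal PySpark Structured Streaming sketch; the broker address, topic name, schema, and HDFS paths are illustrative assumptions rather than details of the actual pipeline.

# Minimal sketch: consume a Kafka topic with Spark Structured Streaming,
# parse the JSON payload, and land the stream on HDFS as Parquet.
# Broker, topic, schema, and path names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("claims-stream").getOrCreate()

schema = StructType([
    StructField("claim_id", StringType()),
    StructField("region", StringType()),
    StructField("amount", DoubleType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "claims-events")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs:///data/claims/enriched")
    .option("checkpointLocation", "hdfs:///checkpoints/claims")
    .outputMode("append")
    .start()
)
query.awaitTermination()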

HADOOP DEVELOPER

PepsiCo
Winston-Salem, North Carolina
04.2019 - 12.2020
  • Worked on analyzing Cloudera and Hortonworks Hadoop clusters and different big data analytic tools including Pig, Hive, and Sqoop
  • Performance-tuned and managed growth of the OS, disk usage, and network traffic
  • Developed Kafka producer and consumer applications on a Kafka cluster set up with the help of ZooKeeper
  • Analyzed the latest big data analytic cloud technologies, including most of the AWS services
  • Developed MapReduce (YARN) jobs for cleaning, accessing and validating the data
  • Created and ran Sqoop jobs with incremental load to populate Hive external tables
  • Developed optimal strategies for distributing the web log data over the cluster; imported and exported the stored web log data into HDFS and Hive using Sqoop
  • Applied an understanding of core AWS services, their uses, and basic AWS architecture, and used AWS service APIs, the AWS CLI, and SDKs to write applications
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website (see the sketch following this section)
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts)
  • Created Hive tables and worked on them using HiveQL
  • Designed and Implemented Partitioning (Static, Dynamic), Buckets in HIVE
  • Worked on cluster coordination services through ZooKeeper
  • Monitored workload, job performance and capacity planning using Cloudera Manager
  • Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team
  • Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solution from disparate sources
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning
  • Environment: Hadoop, Kafka, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, AWS, Hortonworks, Cloudera Manager, Apache YARN, Python
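
The web-log metrics and dynamic partitioning described in this role could be sketched roughly as follows with Spark SQL against Hive tables; the database, table, and column names are hypothetical placeholders, not the actual project schema.

# Illustrative sketch: compute daily web-log metrics with Spark SQL and
# write them into a dynamically partitioned Hive table.
# Database, table, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("weblog-metrics")
    .enableHiveSupport()
    .config("hive.exec.dynamic.partition", "true")
    .config("hive.exec.dynamic.partition.mode", "nonstrict")
    .getOrCreate()
)

spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.daily_visitors (
        unique_visitors BIGINT,
        page_views      BIGINT
    )
    PARTITIONED BY (log_date STRING)
    STORED AS PARQUET
""")

spark.sql("""
    INSERT OVERWRITE TABLE analytics.daily_visitors PARTITION (log_date)
    SELECT COUNT(DISTINCT visitor_id)           AS unique_visitors,
           COUNT(*)                             AS page_views,
           date_format(event_ts, 'yyyy-MM-dd')  AS log_date
    FROM raw.web_logs
    GROUP BY date_format(event_ts, 'yyyy-MM-dd')
""")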

BIG DATA SPECIALIST

BMO Harris Bank
New York, New York
02.2018 - 03.2019
  • Launched and configured Amazon EC2 Cloud Instances and S3 buckets using AWS, Ubuntu Linux and RHEL
  • Installed application on AWS EC2 instances and configured the storage on S3 buckets
  • Implemented and maintained monitoring and alerting for production and corporate servers/storage using AWS CloudWatch
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications
  • Worked closely with the data modelers to model the new incoming data sets
  • Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling of a few jobs)
  • Expertise in designing and deploying Hadoop clusters and different big data analytic tools including Hive, Oozie, Zookeeper, Sqoop, Spark, Impala, and Cassandra with the Hortonworks distribution
  • Involved in creating Hive tables, loading data, and writing Hive queries
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS
  • Imported data from different sources such as HDFS, HBase, or local storage into Spark RDDs (see the sketch following this section)
  • Developed a data pipeline using Kafka and Storm to store data into HDFS
  • Performed real time analysis on the incoming data
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data
  • Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Hive, HBase, Oozie, Scala, Spark, AWS, Linux
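
The RDD-level loading and transformation mentioned in this role might look roughly like the following PySpark sketch; the HDFS paths, record layout, and aggregation are assumptions made purely for illustration.

# Minimal sketch: load text data from HDFS into a Spark RDD, filter and
# transform it, then persist the result back to HDFS.
# Paths, record layout, and parsing logic are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-cleaning").getOrCreate()
sc = spark.sparkContext

raw = sc.textFile("hdfs:///data/raw/transactions")

cleaned = (
    raw.map(lambda line: line.split(","))            # parse CSV-style rows
       .filter(lambda cols: len(cols) == 4)          # drop malformed records
       .map(lambda cols: (cols[0], float(cols[3])))  # (account_id, amount)
       .reduceByKey(lambda a, b: a + b)              # total per account
)

cleaned.map(lambda kv: f"{kv[0]},{kv[1]}").saveAsTextFile(
    "hdfs:///data/curated/account_totals"
)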

Education

Bachelor of Science -

Stamford University

Skills

Python and SQL

Data Engineering

ETL processes

Performance Tuning

Linux Environment

Big Data Processing

Hadoop Ecosystem Knowledge

Streaming data processing

Databricks platform

Agile Methodologies

Timeline

HADOOP DEVELOPER

PepsiCo
04.2019 - 12.2020

BIG DATA SPECIALIST

BMO Harris Bank
02.2018 - 03.2019

BIG DATA SPARK DEVELOPER

Liberty Mutual Insurance
01.2021 - Current

Bachelor of Science -

Stamford University