Mike Husein

Staten Island, NY

Summary

  • Over 6 years of IT experience in the analysis, design, development, and implementation of large-scale applications using Big Data and Java/J2EE technologies such as Apache Spark, Hadoop, Hive, Sqoop, Oozie, HBase, Zookeeper, Python, and Scala
  • Strong experience writing Spark Core, Spark SQL, Spark Streaming, and Java MapReduce applications, including Spark applications written in Java
  • Highly skilled in integrating Kafka with Spark Streaming to build long-running real-time applications
  • Solid understanding of RDD operations in Apache Spark: transformations and actions, persistence (caching), accumulators, broadcast variables, and broadcast optimization
  • In-depth knowledge of handling large volumes of data with the Spark DataFrame/Dataset API and case classes (illustrated in the sketch below)
  • Experienced in running queries using Impala and in using BI tools to run ad-hoc queries directly on Hadoop
  • In-depth knowledge of Big Data architecture and the components of Hadoop 1.x and 2.x, including HDFS, JobTracker, TaskTracker, DataNode, and NameNode, and YARN concepts such as ResourceManager and NodeManager
  • Hands-on experience with AWS cloud services (EC2, S3, RDS, Glue, Redshift, Data Pipeline, EMR, WorkSpaces, Lambda)
  • Experienced in writing HiveQL scripts, with a good understanding of MapReduce design patterns and data analysis using Hive
  • Strong knowledge of the Apache Spark Streaming API on Big Data distributions in active cluster environments
  • Very capable with AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop/Spark jobs on AWS
  • Proficient in importing and exporting data between relational database systems and HDFS using Sqoop
  • Good understanding of NoSQL databases such as HBase, Cassandra, and MongoDB in enterprise use cases
  • Very capable of processing large sets of structured, semi-structured, and unstructured data and supporting application architectures on Hadoop, Spark, and SQL databases such as Teradata, MySQL, and DB2
  • Experienced with version control and source code management tools such as Git, SVN, and Bitbucket
  • Experience in Java application development and client/server applications using MVC, J2EE, JDBC, JSP, XML methodologies (XML, XSL, XSD), web services, relational databases, and NoSQL databases
  • Hands-on experience in application development using Java, RDBMS, Linux shell scripting, and Perl
  • Hands-on experience with IDE and build tools such as Eclipse, IntelliJ, NetBeans, Visual Studio, Git, and Maven, and experienced in writing cohesive end-to-end applications on Apache Zeppelin
  • Experience working in Waterfall and Agile (Scrum) methodologies
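
The sketch below is a minimal, generic illustration of the Dataset-with-case-classes and broadcast-variable patterns listed above; it is not code from a specific engagement, and the Order schema, input path, and region lookup are assumptions made for illustration.

    import org.apache.spark.sql.SparkSession

    // Hypothetical record layout; field names are assumed to match the CSV header.
    case class Order(orderId: String, customerId: String, amount: Double, region: String)

    object DatasetSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("dataset-sketch").getOrCreate()
        import spark.implicits._

        // Typed Dataset built from a DataFrame via a case class.
        val orders = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/orders/*.csv")   // assumed input location
          .as[Order]

        // Small lookup table shipped to executors as a broadcast variable.
        val regionNames = spark.sparkContext.broadcast(Map("NE" -> "Northeast", "W" -> "West"))

        // Transformation over the typed Dataset, then an action to materialize the result.
        orders.map(o => (regionNames.value.getOrElse(o.region, "Other"), o.amount))
          .toDF("region", "amount")
          .groupBy("region").sum("amount")
          .show()

        spark.stop()
      }
    }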

Overview

6 years of professional experience

Work History

Data Engineer

Forbes
New York
05.2019 - Current
  • Developed data pipelines using StreamSets Data Collector to store data from Kafka into HDFS, Elasticsearch, HBase, and MapR-DB
  • Implemented event streaming across StreamSets Data Collector stages, running a MapReduce job on event triggers to convert Avro to Parquet
  • Analyzed the Hadoop stack and various big data analytic tools, including Kafka, Hive, HBase, and Sqoop
  • Created project documents such as source-to-target data mapping documents, unit test cases, and a data migration document
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, JVM tuning, and map/reduce slot configuration
  • Designed and implemented Spark test bench application to evaluate quality of recommendations made by the engine
  • Built tooling that monitored log input from several data centers via Spark Streaming; the data was analyzed, parsed, and saved into Cassandra
  • Implemented Cluster balancing
  • Migrated high-volume OLTP transactions from Oracle to Cassandra to reduce the Oracle licensing footprint
  • Handled streaming and complex analytics processing using Spark
  • Implemented test scripts to support test driven development and continuous integration
  • Worked on tuning the performance of Hive
  • Worked with Impala for massively parallel processing of Hive queries
  • Streamed data into Hadoop using Kafka
  • Wrote Java code for custom Partitioner and Writable implementations
  • Worked on the Analytics Infrastructure team to develop a stream filtering system on top of Apache Kafka
  • Simplified jobs by building applications on top of the NoSQL database Cassandra
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS (see the sketch after this role's bullet list)
  • Unit tested and tuned SQLs and ETL Code for better performance
  • Monitored the performance and identified performance bottlenecks in ETL code
  • Used Tableau to generate reports, graphs, and charts summarizing the given data sets
  • Worked on data utilizing a Hadoop, Zookeeper, and Accumulo stack, aiding in the development of specialized indexes for performant queries on big data implementations
  • Created partitioned Hive tables, loaded and analyzed data using Hive queries, and implemented partitioning and bucketing in Hive
  • Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, to evaluate adopting Impala in the project
  • Developed Hive queries to process the data and generate the data cubes for visualizing
  • Implemented schema extraction for Parquet and Avro file formats in Hive
  • Designed ETL jobs for data processing in Talend Open Studio
  • Designed, reviewed, implemented, and optimized data transformation processes in the Hadoop, Talend, and Informatica ecosystems
  • Implemented partitioning, dynamic partitions, and bucketing in Hive
  • Configured Hadoop clusters and coordinated with Big Data Admins for cluster maintenance
  • Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Python, Kafka, Hive, Sqoop, Amazon AWS, Elasticsearch, Impala, Cassandra, Tableau, Informatica, Cloudera, Oracle 10g, Linux
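
A minimal sketch of the Kafka-to-Spark-Streaming-to-HDFS pattern referenced in the bullets above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group, batch interval, and output path are placeholders, not details from the engagement.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToHdfsSketch {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("kafka-to-hdfs-sketch"), Seconds(30))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",            // placeholder broker
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "events-consumer",          // placeholder consumer group
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Persist each non-empty micro-batch to HDFS, keyed by batch time.
        stream.map(_.value).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty()) {
            rdd.saveAsTextFile(s"hdfs:///landing/events/batch-${time.milliseconds}")
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }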

Data Engineer

PayPal
Palo Alto, CA
06.2018 - 05.2019
  • Analyzed the Hadoop cluster and different big data analytic tools, including Hive and Sqoop
  • Developed simple to complex MapReduce jobs using Hive
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms
  • Worked with Senior Engineer on configuring Kafka for streaming data
  • Responsible for building scalable distributed data solutions using Hadoop
  • Worked on a project to retrieve log messages by leveraging Spark Streaming
  • Designed Oozie jobs for the automated processing of similar data
  • Collected data using Spark Streaming
  • Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior
  • Extensively used FORALL and BULK COLLECT to fetch large volumes of data from tables
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs
  • Developed Pig scripts in areas where extensive coding needed to be reduced
  • Worked with Spark Streaming to ingest data into the Spark engine
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS
  • Handled importing of data from various data sources using Sqoop, performed transformations using Hive, MapReduce, and loaded data into HDFS
  • Created HBase tables to store various data formats of PII data coming from different portfolios
  • Configured Sqoop and developed scripts to extract data from MySQL into HDFS
  • Hands-on experience productionalizing Hadoop applications: administration, configuration management, monitoring, debugging, and performance tuning
  • Worked on analyzing the Hadoop cluster and various big data analytic tools, including Pig, HBase, and Sqoop
  • Processed data using Spark
  • Developed Spark scripts in Scala implementing custom RDD transformations and performing actions on the resulting RDDs (see the sketch after this role's bullet list)
  • Parsed high-level design specification to simple ETL coding and mapping standards
  • Provided cluster coordination services through Zookeeper
  • Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data
  • Developed complex Talend job mappings to load data from various sources using different components
  • Designed, developed, and implemented solutions using Talend Integration Suite
  • Partitioned data streams using Kafka
  • Designed and configured a Kafka cluster to accommodate heavy throughput of 1 million messages per second
  • Used Kafka producer 0.8.3 APIs to produce messages
  • Built big data solutions using HBase, handling millions of records across different data trends, and exported the results to Hive
  • Developed scripts in Hive to perform transformations on the data and load to target systems for use by the data analysts for reporting
  • Tested the data coming from the source before processing
  • Became familiar with automated monitoring tools such as Nagios
  • Used Oozie as the workflow engine and Falcon for job scheduling
  • Debugged technical issues and resolved errors
  • Used Apache Kafka for collecting, aggregating, and moving large amounts of data from application servers
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data
  • As part of a POC, set up Amazon Web Services (AWS) to evaluate whether Hadoop was a feasible solution
  • Analyzed user requirements, designed and developed ETL processes to load enterprise data into the Data Warehouse.
  • Identified key use cases and associated reference architectures for market segments and industry verticals.
  • Wrote and coded logical and physical database descriptions, specifying identifiers of database to management systems.
  • Collected, outlined and refined requirements, led design processes and oversaw project progress.
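
A minimal sketch of the custom-RDD-transformation work referenced in the Spark bullet above; the pipe-delimited record layout and the HDFS paths are assumptions for illustration, not artifacts from the project.

    import org.apache.spark.{SparkConf, SparkContext}

    object RddTransformSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("rdd-transform-sketch"))

        // Assumed pipe-delimited input: userId|eventType|durationMs
        val raw = sc.textFile("hdfs:///raw/events/*.txt")

        // Transformations are lazy; nothing executes until an action is invoked.
        val totalsByType = raw
          .map(_.split('|'))
          .filter(_.length == 3)
          .map(fields => (fields(1), fields(2).toLong))   // (eventType, durationMs)
          .reduceByKey(_ + _)                             // total duration per event type

        // Actions trigger the computation.
        totalsByType.saveAsTextFile("hdfs:///curated/event_durations")
        println(s"Event types seen: ${totalsByType.count()}")

        sc.stop()
      }
    }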

Data Engineer

Tapestry
San Francisco
01.2018 - 06.2018
  • Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu/RHEL) and configured the launched instances for specific applications
  • Installed application on AWS EC2 instances and configured the storage on S3 buckets
  • Created S3 buckets and bucket policies, worked on IAM role-based policies, and customized the JSON policy templates
  • Implemented and maintained monitoring and alerting for production and corporate servers/storage using AWS CloudWatch
  • Managed server instances on the Amazon Web Services (AWS) platform using Puppet and Chef configuration management
  • Developed Pig scripts to transform raw data into intelligent data as specified by business users
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications
  • Worked closely with the data modelers to model the new incoming data sets
  • Involved in the end-to-end process of Hadoop jobs that used technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling of a few jobs)
  • Expertise in designing and deploying Hadoop clusters and various Big Data analytic tools, including Pig, Hive, Oozie, Zookeeper, Sqoop, Flume, Spark, Impala, and Cassandra with the Hortonworks Distribution
  • Involved in creating Hive tables and Pig tables, loading data, and writing Hive queries and Pig scripts (see the sketch after this role's bullet list)
  • Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS
  • Worked on tuning Hive and Pig to improve performance and resolve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation and how they map to MapReduce jobs
  • Imported data from different sources such as HDFS and HBase into Spark RDDs
  • Developed a data pipeline using Kafka and Storm to store data into HDFS
  • Performed real time analysis on the incoming data
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing
  • Implemented Spark applications using Scala for faster testing and processing of data
  • Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Linux
  • Optimized existing queries to improve query performance by creating indexes on tables.
  • Managed the development and implementation of infrastructure automation solutions using AWS technologies such as EC2, S3, ECS, EKS, Lambda.
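
A minimal sketch of creating and loading a partitioned Hive table through Spark SQL, as referenced in the Hive-tables bullet above; the database, table, and column names are hypothetical, and the staging table is assumed to already exist.

    import org.apache.spark.sql.SparkSession

    object HivePartitionSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-partition-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Partitioned target table (hypothetical schema).
        spark.sql(
          """CREATE TABLE IF NOT EXISTS analytics.page_views (
            |  user_id STRING,
            |  url     STRING,
            |  views   BIGINT)
            |PARTITIONED BY (view_date STRING)
            |STORED AS PARQUET""".stripMargin)

        // Dynamic-partition load from a staging table assumed to already exist.
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql(
          """INSERT OVERWRITE TABLE analytics.page_views PARTITION (view_date)
            |SELECT user_id, url, views, view_date FROM staging.page_views_raw""".stripMargin)

        spark.stop()
      }
    }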

Education

Bachelor of Science - Information Technology

New York City College of Technology of The City University of New York

Skills

  • Requirements Specifications
  • IBM DB2
  • Apache Hive
  • New Project Development
  • Apache Hadoop
  • Apache Spark
  • Sqoop
  • Data Lakes
  • Software Development Methodologies
  • User Profile
  • Software Solutions
  • Data Pipeline Design
  • Real-time Analytics
  • Data Migration

Personal Information

Citizenship: US Citizen

Timeline

Data Engineer

Forbes
05.2019 - Current

Data Engineer

PayPal
06.2018 - 05.2019

Data Engineer

Tapestry
01.2018 - 06.2018

Bachelor of Science - Information Technology

New York City College of Technology of The City University of New York