
Mahi Islam

Queens, NY

Summary

5 years of IT experience across a variety of environments, tools, and technologies, including the programming languages Python, Java, and SQL. Worked as a Developer and Data Engineer with big data technologies including Hadoop and Spark, and with cloud technologies, including cross-platform integration.



Overview

5 years of professional experience

Work History

Data Engineer

Fannie Mae
New York, NY
07.2020 - Current
  • Explored DAGs, their dependencies, and logs in Airflow pipelines for automation (illustrated in the sketch after this list)
  • Tracked operations with Airflow sensors until specified criteria were met
  • Used Spark Streaming APIs to perform transformations and actions on the fly for the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra
  • Developed Spark scripts using Python shell commands as per requirements
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
  • Developed Python scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop; also developed an enterprise application in Python
  • Tuned Spark application performance by setting the right batch interval time, the correct level of parallelism, and memory configuration
  • Loaded data into Spark RDDs and performed in-memory computation to generate output responses
  • Hands-on experience with Akka and the Lift framework
  • Used PostgreSQL and NoSQL databases integrated with Hadoop to develop datasets on HDFS
  • Created partitioned Hive tables, loaded and analyzed data with Hive queries, and implemented partitioning and bucketing in Hive
  • Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, in order to adopt Impala in the project
  • Developed Hive queries to process data and generate data cubes for visualization
  • Implemented schema extraction for Parquet and Avro file formats in Hive
  • Used Talend Open Studio to design ETL jobs for data processing
  • Designed, reviewed, implemented, and optimized data transformation processes in the Hadoop, Talend, and Informatica ecosystems
  • Implemented partitioning, dynamic partitions, and buckets in Hive
  • Coordinated with admins and technical staff on migrating Teradata to Hadoop and Ab Initio to Hadoop
  • Configured Hadoop clusters and coordinated with Big Data Admins for cluster maintenance
  • Environment: Hadoop, YARN, Spark-Core, Spark-Streaming, Spark-SQL, Python, Kafka, Hive, Sqoop, Amazon AWS, Elasticsearch, Impala, Cassandra, Tableau, Informatica, Cloudera, Oracle 10g, Linux.
  • Generated detailed studies on potential third-party data handling solutions, verifying compliance with internal needs and stakeholder requirements.
  • Designed compliance frameworks for multi-site data warehousing efforts to verify conformity with state and federal data security guidelines.
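
The Airflow bullets above describe DAGs with sensor-based waits. Below is a minimal sketch of that pattern, assuming Airflow 2.x; the DAG id, file path, and callable are hypothetical placeholders, not the actual Fannie Mae pipeline.

    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.sensors.filesystem import FileSensor

    # Hypothetical callable standing in for the real load step
    def load_to_hive(**context):
        print("load staged file into a partitioned Hive table")

    default_args = {"retries": 2, "retry_delay": timedelta(minutes=5)}

    with DAG(
        dag_id="daily_ingest",                 # hypothetical DAG id
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        default_args=default_args,
        catchup=False,
    ) as dag:
        # The sensor blocks the downstream task until its criterion (file present) is met
        wait_for_file = FileSensor(
            task_id="wait_for_landing_file",
            filepath="/data/landing/daily.csv",  # hypothetical path
            poke_interval=300,
            timeout=6 * 60 * 60,
        )
        load = PythonOperator(task_id="load_to_hive", python_callable=load_to_hive)

        wait_for_file >> load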

Big Data Developer

Visa
New York, NY
06.2018 - 05.2020
  • Involved in the software development lifecycle (SDLC): requirements gathering, analysis, detailed design, development, system testing, and user acceptance testing
  • Developed entire frontend and backend modules using Python on the Django web framework
  • Designed interactive web pages for the front end of the application using HTML, JavaScript, AngularJS, jQuery, and AJAX, and implemented CSS for improved look and feel
  • Actively involved in developing the methods for Create, Read, Update, and Delete (CRUD) in Active Record
  • Designed and set up MongoDB environments with shards and replica sets (Dev/Test and Production)
  • Built a private VPN using Ubuntu, Python, Django, Postgres, Redis, Bootstrap, jQuery, Mongo, Fabric, Git, Tenjin, and Selenium
  • Working knowledge of various AWS technologies, including SQS queuing, SNS notifications, S3 storage, Redshift, Data Pipeline, and EMR
  • Developed a fully automated continuous integration system using Git, Jenkins, MySQL and custom tools developed in Python and Bash
  • Implemented the multithreading module and complex networking operations such as traceroute, an SMTP mail server, and a web server using Python
  • Used NumPy for numerical analysis of insurance premiums
  • Implemented and modified various SQL queries, functions, cursors, and triggers as per client requirements
  • Managed code versioning with GitHub, Bitbucket, and deployment to staging and production servers
  • Implemented MVC architecture in developing the web application with the help of Django framework
  • Used Celery as a task queue with RabbitMQ and Redis as message brokers to execute asynchronous tasks (see the sketch after this list)
  • Designed and managed API system deployment using a fast HTTP server and Amazon AWS architecture
  • Involved in code reviews using GitHub pull requests, reducing bugs, improving code quality and increasing knowledge sharing
  • Install and configure monitoring scripts for AWS EC2 instances
  • Implemented task object to interface with data feed framework and invoke database message service setup and update functionality
  • Worked in a UNIX environment developing applications in Python; familiar with its commands
  • Developed remote integration with third-party platforms by using RESTful web services
  • Updated and maintained Jenkins for automatic building jobs and deployment
  • Improved code reuse and performance by making effective use of various design patterns and refactoring code base
  • Updated and maintained Puppet Spec unit/system tests
  • Worked on debugging and troubleshooting programming related issues
  • Worked on the MySQL database, writing simple queries and stored procedures for normalization
  • Deployed the web application on a Linux server
  • Environment: Python 2.7, Django 1.4, HTML5, CSS, XML, MySQL, JavaScript, Backbone JS, jQuery, MongoDB, MS SQL Server, Git, GitHub, AWS, Linux, Shell Scripting, AJAX, Java.
  • Engaged with business representatives, business analysts and developers and delivered comprehensive business-facing analytics solutions.
  • Wrote software that scaled to petabytes of data and supported millions of transactions per second.
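
The Celery/RabbitMQ/Redis bullet above refers to asynchronous task execution. A minimal sketch under those assumptions follows; the broker/backend URLs and the task body are hypothetical.

    from celery import Celery

    # RabbitMQ as the message broker, Redis as the result backend (URLs are placeholders)
    app = Celery(
        "tasks",
        broker="amqp://guest:guest@localhost:5672//",
        backend="redis://localhost:6379/0",
    )

    @app.task
    def send_notification(user_id):
        # Placeholder body; the real task would call the mail/notification service
        return f"notified {user_id}"

    # Usage from the web application side (hypothetical):
    # result = send_notification.delay(42)
    # result.get(timeout=10)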

Big Data Developer

GAP
New York, NY
05.2017 - 05.2018
  • Worked on analyzing Hadoop clusters and different big data analytic tools including Pig, Hive and Sqoop
  • Developed Spark scripts by using Scala shell commands as per the requirement
  • Created Spark jobs to see trends in data usage by users
  • Used Spark and Spark-SQL to read Parquet data and create tables in Hive using the Scala API (see the sketch after this list)
  • Loaded data pipelines from web servers and Teradata using Sqoop with Kafka and Spark Streaming API
  • Developed Kafka pub-sub and Cassandra clients, along with Spark components on HDFS and Hive
  • Populated HDFS and HBase with huge amounts of data using Apache Kafka
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters
  • Developed Pig UDFs to pre-process the data for analysis
  • Developed workflows in Oozie to automate loading data into HDFS and pre-processing with Pig and HiveQL
  • Created Hive tables to store data and wrote Hive queries
  • Extracted the data from Teradata into HDFS using Sqoop
  • Exported the patterns analyzed back to Teradata using Sqoop
  • Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution
  • Developed Spark code in Scala and Spark-SQL for faster processing and testing
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python
  • Used the Spark API over Hadoop YARN as the execution engine for data analytics using Hive
  • Built data pipelines using Kafka and Akka to handle terabytes of data
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
  • Developed Scala scripts to extract the data from the web server output files to load into HDFS
  • Designed and implemented MapReduce jobs to support distributed data processing
  • Processed large data sets utilizing the Hadoop cluster
  • Designed NoSQL schemas in HBase
  • Developed MapReduce ETL in Python/Pig
  • Involved in data validation using Hive
  • Imported and exported data using Sqoop between HDFS and relational database systems
  • Involved in weekly walkthroughs and inspection meetings to verify the status of the testing efforts and the project
  • Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Cloudera Manager, Apache YARN, Python, Machine Learning, NLP (Natural Language Processing)

Additional Skills

  • Proficiency in MS Office Tools
  • Fluent in English, Bengali, Hindi
  • Working knowledge of Java, Python, SQL
  • Experienced at using AWS
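
The GAP bullets above mention reading Parquet with Spark-SQL and creating Hive tables (done there with the Scala API). A minimal PySpark rendering of that pattern is sketched below; the HDFS path, database, table, and partition column are hypothetical.

    from pyspark.sql import SparkSession

    # Hive support lets saveAsTable register tables in the Hive metastore
    spark = (
        SparkSession.builder
        .appName("parquet_to_hive")            # hypothetical app name
        .enableHiveSupport()
        .getOrCreate()
    )

    # Hypothetical HDFS path to Parquet data
    df = spark.read.parquet("hdfs:///data/sales/parquet/")

    # Write as a partitioned Hive table; assumes an existing 'analytics' database
    # and a 'sale_date' column in the data
    (
        df.write
        .mode("overwrite")
        .partitionBy("sale_date")
        .format("parquet")
        .saveAsTable("analytics.sales")
    )

    spark.sql(
        "SELECT sale_date, COUNT(*) AS cnt FROM analytics.sales GROUP BY sale_date"
    ).show()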

Education

Bachelor of Computer Science

Uttara Institute of Business & Technology
Dhaka, Bangladesh
12.2016

Skills

  • SQL, Python, Spark, Hadoop, Hive
  • AWS, Lambda, Glue, EMR
  • Manual Testing
  • Bug Fixes
  • Dashboard Creation
  • Production Work
  • Database Development
  • Data Validation
  • Scrum Methodology

Timeline

Data Engineer

Fannie Mae
07.2020 - Current

Big Data Developer

Visa
06.2018 - 05.2020

Big Data Developer

GAP
05.2017 - 05.2018

Bachelor of Computer Science

Uttara Institute of Business & Technology
12.2016