
Mahi Islam

Queens, NY

Summary

5 years of IT experience across a variety of environments, tools, and technologies, including the programming languages Python, Java, and SQL. Worked as a Developer and Data Engineer with big data technologies including Hadoop and Spark, and with cloud technologies, including cross-platform integration.



Overview

5 years of professional experience

Work History

Data Engineer

Fannie Mae
New York, NY
07.2020 - Current
  • Explored DAGs, their dependencies, and logs in Airflow pipelines for automation (illustrated in the sketch after this list)
  • Tracked operations with Airflow sensors until specified criteria were met
  • Used Spark Streaming APIs to perform transformations and actions on the fly for the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra
  • Developed Spark scripts using Python shell commands as per requirements
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
  • Developed Python scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop; also developed an enterprise application in Python
  • Tuned Spark application performance by setting the right batch interval time, the correct level of parallelism, and memory configuration
  • Loaded data into Spark RDDs and performed in-memory computation to generate output responses
  • Hands-on experience with Akka and the Lift framework
  • Used PostgreSQL and NoSQL databases integrated with Hadoop to develop datasets on HDFS
  • Created partitioned Hive tables, loaded and analyzed data with Hive queries, and implemented partitioning and bucketing in Hive
  • Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, in order to adopt Impala in the project
  • Developed Hive queries to process data and generate data cubes for visualization
  • Implemented schema extraction for Parquet and Avro file formats in Hive
  • Used Talend Open Studio to design ETL jobs for data processing
  • Designed, reviewed, implemented, and optimized data transformation processes in the Hadoop, Talend, and Informatica ecosystems
  • Implemented partitioning, dynamic partitions, and buckets in Hive
  • Coordinated with admins and technical staff on migrating Teradata to Hadoop and Ab Initio to Hadoop
  • Configured Hadoop clusters and coordinated with Big Data Admins for cluster maintenance
  • Environment: Hadoop, YARN, Spark-Core, Spark-Streaming, Spark-SQL, Python, Kafka, Hive, Sqoop, Amazon AWS, Elasticsearch, Impala, Cassandra, Tableau, Informatica, Cloudera, Oracle 10g, Linux.
  • Generated detailed studies on potential third-party data handling solutions, verifying compliance with internal needs and stakeholder requirements.
  • Designed compliance frameworks for multi-site data warehousing efforts to verify conformity with state and federal data security guidelines.
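
The Airflow bullets above describe DAGs with sensor-based waits. Below is a minimal sketch of that pattern, assuming Airflow 2.x; the DAG id, file path, and callable are hypothetical placeholders, not the actual Fannie Mae pipeline.

    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.sensors.filesystem import FileSensor

    # Hypothetical callable standing in for the real load step
    def load_to_hive(**context):
        print("load staged file into a partitioned Hive table")

    default_args = {"retries": 2, "retry_delay": timedelta(minutes=5)}

    with DAG(
        dag_id="daily_ingest",                 # hypothetical DAG id
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        default_args=default_args,
        catchup=False,
    ) as dag:
        # The sensor blocks the downstream task until its criterion (file present) is met
        wait_for_file = FileSensor(
            task_id="wait_for_landing_file",
            filepath="/data/landing/daily.csv",  # hypothetical path
            poke_interval=300,
            timeout=6 * 60 * 60,
        )
        load = PythonOperator(task_id="load_to_hive", python_callable=load_to_hive)

        wait_for_file >> load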

Big Data Developer

Visa
New York, NY
06.2018 - 05.2020
  • Involved in the software development lifecycle (SDLC): requirements gathering, analysis, detailed design, development, system testing, and user acceptance testing
  • Developed entire frontend and backend modules using Python on the Django web framework
  • Designed interactive web pages for the front end of the application using HTML, JavaScript, AngularJS, jQuery, and AJAX, and implemented CSS for improved look and feel
  • Actively involved in developing the methods for Create, Read, Update, and Delete (CRUD) in Active Record
  • Designed and set up MongoDB environments with shards and replica sets (Dev/Test and Production)
  • Built a private VPN using Ubuntu, Python, Django, Postgres, Redis, Bootstrap, jQuery, Mongo, Fabric, Git, Tenjin, and Selenium
  • Working knowledge of various AWS technologies, including SQS queuing, SNS notifications, S3 storage, Redshift, Data Pipeline, and EMR
  • Developed a fully automated continuous integration system using Git, Jenkins, MySQL and custom tools developed in Python and Bash
  • Implemented the multithreading module and complex networking operations such as traceroute, an SMTP mail server, and a web server using Python
  • Used NumPy for numerical analysis of insurance premiums
  • Implemented and modified various SQL queries, functions, cursors, and triggers as per client requirements
  • Managed code versioning with GitHub, Bitbucket, and deployment to staging and production servers
  • Implemented MVC architecture in developing the web application with the help of Django framework
  • Used Celery as a task queue with RabbitMQ and Redis as message brokers to execute asynchronous tasks (see the sketch after this list)
  • Designed and managed API system deployment using a fast HTTP server and Amazon AWS architecture
  • Involved in code reviews using GitHub pull requests, reducing bugs, improving code quality and increasing knowledge sharing
  • Install and configure monitoring scripts for AWS EC2 instances
  • Implemented task object to interface with data feed framework and invoke database message service setup and update functionality
  • Worked in a UNIX environment developing applications in Python; familiar with its commands
  • Developed remote integration with third-party platforms by using RESTful web services
  • Updated and maintained Jenkins for automatic building jobs and deployment
  • Improved code reuse and performance by making effective use of various design patterns and refactoring code base
  • Updated and maintained Puppet Spec unit/system tests
  • Worked on debugging and troubleshooting programming related issues
  • Worked on the MySQL database, writing simple queries and stored procedures for normalization
  • Deployed the web application on a Linux server
  • Environment: Python 2.7, Django 1.4, HTML5, CSS, XML, MySQL, JavaScript, Backbone JS, jQuery, MongoDB, MS SQL Server, Git, GitHub, AWS, Linux, Shell Scripting, AJAX, Java.
  • Engaged with business representatives, business analysts and developers and delivered comprehensive business-facing analytics solutions.
  • Wrote software that scaled to petabytes of data and supported millions of transactions per second.
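
The Celery/RabbitMQ/Redis bullet above refers to asynchronous task execution. A minimal sketch under those assumptions follows; the broker/backend URLs and the task body are hypothetical.

    from celery import Celery

    # RabbitMQ as the message broker, Redis as the result backend (URLs are placeholders)
    app = Celery(
        "tasks",
        broker="amqp://guest:guest@localhost:5672//",
        backend="redis://localhost:6379/0",
    )

    @app.task
    def send_notification(user_id):
        # Placeholder body; the real task would call the mail/notification service
        return f"notified {user_id}"

    # Usage from the web application side (hypothetical):
    # result = send_notification.delay(42)
    # result.get(timeout=10)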

Big Data Developer

GAP
New York, NY
05.2017 - 05.2018
  • Worked on analyzing Hadoop clusters and different big data analytic tools including Pig, Hive and Sqoop
  • Developed Spark scripts by using Scala shell commands as per the requirement
  • Created Spark jobs to see trends in data usage by users
  • Used Spark and Spark-SQL to read Parquet data and create tables in Hive using the Scala API (see the sketch after this list)
  • Loaded data pipelines from web servers and Teradata using Sqoop with Kafka and Spark Streaming API
  • Developed Kafka pub-sub and Cassandra clients, along with Spark components on HDFS and Hive
  • Populated HDFS and HBase with huge amounts of data using Apache Kafka
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters
  • Developed Pig UDFs to pre-process the data for analysis
  • Developed workflows in Oozie to automate loading data into HDFS and pre-processing with Pig and HiveQL
  • Created Hive tables to store data and wrote Hive queries
  • Extracted the data from Teradata into HDFS using Sqoop
  • Exported the patterns analyzed back to Teradata using Sqoop
  • Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution
  • Developed Spark code in Scala and Spark-SQL for faster processing and testing
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python
  • Used the Spark API over Hadoop YARN as the execution engine for data analytics using Hive
  • Built data pipelines using Kafka and Akka to handle terabytes of data
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
  • Developed Scala scripts to extract the data from the web server output files to load into HDFS
  • Designed and implemented MapReduce jobs to support distributed data processing
  • Processed large data sets utilizing the Hadoop cluster
  • Designed NoSQL schemas in HBase
  • Developed MapReduce ETL in Python/Pig
  • Involved in data validation using Hive
  • Imported and exported data using Sqoop between HDFS and relational database systems
  • Involved in weekly walkthroughs and inspection meetings to verify the status of the testing efforts and the project
  • Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Cloudera Manager, Apache YARN, Python, Machine Learning, NLP (Natural Language Processing)

Additional Skills

  • Proficiency in MS Office Tools
  • Fluent in English, Bengali, Hindi
  • Working knowledge of Java, Python, SQL
  • Experienced at using AWS
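
The GAP bullets above mention reading Parquet with Spark-SQL and creating Hive tables (done there with the Scala API). A minimal PySpark rendering of that pattern is sketched below; the HDFS path, database, table, and partition column are hypothetical.

    from pyspark.sql import SparkSession

    # Hive support lets saveAsTable register tables in the Hive metastore
    spark = (
        SparkSession.builder
        .appName("parquet_to_hive")            # hypothetical app name
        .enableHiveSupport()
        .getOrCreate()
    )

    # Hypothetical HDFS path to Parquet data
    df = spark.read.parquet("hdfs:///data/sales/parquet/")

    # Write as a partitioned Hive table; assumes an existing 'analytics' database
    # and a 'sale_date' column in the data
    (
        df.write
        .mode("overwrite")
        .partitionBy("sale_date")
        .format("parquet")
        .saveAsTable("analytics.sales")
    )

    spark.sql(
        "SELECT sale_date, COUNT(*) AS cnt FROM analytics.sales GROUP BY sale_date"
    ).show()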

Education

Bachelor of Computer Science

Uttara Institute of Business & Technology
Dhaka, Bangladesh
12.2016

Skills

  • SQL, Python, Spark, Hadoop, Hive
  • AWS, Lambda, Glue, EMR
  • Manual Testing
  • Bug Fixes
  • Dashboard Creation
  • Production Work
  • Database Development
  • Data Validation
  • Scrum Methodology

Timeline

Data Engineer

Fannie Mae
07.2020 - Current

Big Data Developer

Visa
06.2018 - 05.2020

Big Data Developer

GAP
05.2017 - 05.2018

Bachelor of Computer Science

Uttara Institute of Business & Technology
12.2016