Mahim Islam

Queens, NY

Summary

Data Engineer with 7 years of experience designing and implementing cutting-edge data engineering solutions. Proven track record of quickly adapting to new technologies and cloud platforms, including AWS and Azure, to manage and process large-scale data sets. Skilled in Python, SQL, and Java for building flexible, scalable data pipelines. Experienced with big data technologies such as Hadoop, Spark, and Hive to support real-time data processing and analytics. Strong understanding of data warehousing concepts and data governance best practices. Eager to learn and take on new challenges, and committed to delivering high-quality, impactful data solutions that drive business success.

Overview

6 years of professional experience

Work History

Data Engineer

KNOWiNK
06.2022 - Current
  • Led the end-to-end design and implementation of highly scalable, fault-tolerant data pipelines on AWS and Azure, processing millions of records daily and ensuring timely, accurate data delivery to downstream systems (a minimal pipeline sketch follows the environment list below)
  • Developed and maintained comprehensive data lineage documentation, providing transparency and traceability for data transformations and ensuring compliance with regulatory requirements
  • Played a key role in migrating on-premises data warehouses to Amazon Redshift, leading the design and execution of the migration plan, resulting in a 20% reduction in infrastructure costs and improved query performance
  • Collaborated with data scientists and analysts to understand their data requirements, implemented data transformations to support their analytical models, and ensured seamless integration of data pipelines with machine learning workflows using AWS SageMaker
  • Architected and implemented data engineering solutions on AWS, leveraging services such as AWS Lambda, Amazon Kinesis, AWS Glue, and Amazon Redshift
  • Implemented security best practices in Terraform configurations, such as IAM roles and policies, to ensure least privilege access to cloud resources
  • Applied AWS IAM for fine-grained access control and security management in data engineering solutions
  • Architected and implemented data engineering solutions on Azure, leveraging services such as Azure Data Factory, Azure Databricks, Azure Blob Storage, Azure Data Lake Storage, and Azure SQL Data Warehouse
  • Used Azure Blob Storage and Azure Data Lake Storage for secure, scalable storage of structured and unstructured data, ensuring data availability and accessibility for analytical processing
  • Designed and implemented interactive, visually compelling data visualizations using a diverse range of tools, including Power BI, Tableau, AWS QuickSight, and on-premises solutions
  • Applied data visualization best practices, choosing appropriate chart types, color schemes, and interactivity to enhance data comprehension and engagement
  • Received recognition for outstanding data visualization projects that improved data understanding, data-driven decision-making, and business outcomes across cloud and on-premises environments
  • Developed data visualization dashboards using Python libraries like matplotlib, seaborn, and Plotly, providing intuitive visual representations of complex data
  • Developed scalable data engineering pipelines in Python for AI applications, incorporating data ingestion, preprocessing, feature engineering, and model deployment
  • Successfully deployed and managed cloud resources, including virtual machines, networks, storage, databases, and security groups, using Terraform configurations
  • Utilized Python frameworks such as TensorFlow, PyTorch, and Keras to design, train, and deploy machine learning and deep learning models for AI applications
  • Environment: AWS (Lambda, S3, Glue, EMR, Redshift, Athena, CloudWatch, EC2, QuickSight, IAM), Terraform, Azure (ADF, Databricks, Synapse, Event Hub), and Power BI
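
A minimal PySpark sketch of the daily batch pattern described in the first bullet above, assuming Spark running on EMR or Glue with S3 access configured; the bucket paths and column names are hypothetical placeholders, not the production configuration:

    # Minimal sketch of a daily S3 batch pipeline (hypothetical paths/columns).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("daily-ingest").getOrCreate()

    # Read one day's raw events from S3 (placeholder bucket and prefix)
    raw = spark.read.json("s3://example-raw-bucket/events/2023-01-01/")

    # Basic cleanup: drop malformed rows, normalize timestamps, dedupe on id
    clean = (
        raw.dropna(subset=["event_id", "event_ts"])
           .withColumn("event_ts", F.to_timestamp("event_ts"))
           .dropDuplicates(["event_id"])
    )

    # Partitioned Parquet output for downstream Redshift Spectrum / Athena
    (clean.withColumn("event_date", F.to_date("event_ts"))
          .write.mode("overwrite")
          .partitionBy("event_date")
          .parquet("s3://example-curated-bucket/events/"))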

Data Engineer

MetLife
04.2020 - 05.2022
  • Designed and implemented highly scalable data processing pipelines on the Azure cloud platform, utilizing Azure Data Factory, Stream Analytics, and Databricks
  • Migrated and integrated data from various sources, including databases, APIs, and file systems, into Azure storage services such as SQL Database, Cosmos DB, and Blob Storage
  • Implemented a real-time pipeline for processing incoming IoT device data using Azure Event Hubs and Stream Analytics, ensuring timely and accurate data analysis
  • Constructed data transformations by writing PySpark in Databricks to rename, drop, clean, validate, and reformat data into Parquet files, then loaded them into Azure Blob Storage containers (a minimal sketch follows the environment list below)
  • Developed Azure Data Factory linked services to connect on-premises Oracle, SQL Server, and Apache Hive sources with Azure datasets in the cloud
  • Built ETL pipelines in Azure Data Factory (ADF) to manage and load 1 TB+ of data into Azure SQL Data Warehouse
  • Configured input and output bindings of Azure Functions with an Azure Cosmos DB collection to read and write data from the container whenever the function executes
  • Connected Databricks notebooks with Airflow to schedule and monitor the ETL process
  • Provided user management and support by administering epics, user stories, and tasks in Jira using Agile methodology, and logged process flow documents in Confluence
  • Worked with a team of developers to develop and implement an API for optical character recognition and named entity extraction on medical records
  • Analyzed and extracted relevant information from medical forms to support medical summary reports, compliance, claims settlement, litigation, and predictive analysis
  • Worked with Flask and Werkzeug for web development and REST API implementation
  • Deployed predictive models using the AzureML platform
  • Environment: Azure HDInsight, Databricks, Data Lake, Cosmos DB, MySQL, Azure SQL, Snowflake, Cassandra, Teradata, Ambari, Power BI, Azure Blob Storage, Data Factory, Data Storage Explorer, Scala, Hadoop 2.x (HDFS, MapReduce, YARN), Spark, Git, PySpark, Airflow, Hive, HBase, and AzureML
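
A minimal PySpark sketch of the rename/drop/clean/validate step described above, as it might look in a Databricks notebook (where spark is predefined); the storage account, containers, and column names are hypothetical placeholders:

    # Hypothetical storage account, containers, and columns; wasbs access
    # assumes the cluster is configured with the storage account key.
    from pyspark.sql import functions as F

    df = spark.read.csv(
        "wasbs://raw@examplestorage.blob.core.windows.net/claims/",
        header=True, inferSchema=True,
    )

    validated = (
        df.withColumnRenamed("clm_amt", "claim_amount")       # rename
          .drop("legacy_flag")                                # drop unused
          .filter(F.col("claim_amount") > 0)                  # validate
          .withColumn("claim_date", F.to_date("claim_date"))  # reformat
    )

    # Land the curated Parquet files back in a Blob Storage container
    validated.write.mode("overwrite").parquet(
        "wasbs://curated@examplestorage.blob.core.windows.net/claims/"
    )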

Data Engineer

Fannie Mae
04.2018 - 03.2020
  • Designed, built, and managed data pipelines for real-time and batch processing using AWS Glue and Amazon Kinesis (an ingestion sketch follows the environment list below)
  • Implemented a scalable data warehousing solution using Amazon Redshift
  • Designed and optimized Redshift tables and ensured high performance through appropriate distribution styles, sort keys, and compression encodings
  • Worked with large datasets using Apache Spark and Amazon EMR
  • Developed Spark scripts to perform complex data transformations and aggregations, and optimized Spark job performance for faster processing times
  • Built scalable NoSQL databases using Amazon DynamoDB, Amazon DocumentDB and Amazon Neptune to store and retrieve semi-structured and unstructured data
  • Developed serverless architectures using AWS Lambda and Amazon SNS to process real-time data and trigger event-driven workflows
  • Used Apache NiFi and AWS Glue to build custom data ingestion and transformation workflows
  • Automated data validation and quality checks using AWS Glue jobs and Apache NiFi processes
  • Implemented security measures to protect sensitive data stored in S3 and managed data access using Amazon IAM and Amazon VPC
  • Monitored pipeline performance and security events using Amazon CloudWatch and AWS CloudTrail
  • Applied strong SQL, data warehousing, and ETL experience on traditional databases
  • Worked across major Amazon Web Services (AWS) components, including EC2, S3, RDS, ELB, ElastiCache, VPC, and IAM
  • Explored DAGs, their dependencies, and logs using Airflow pipelines for automation
  • Developed Spark scripts in Python as per requirements
  • Created dynamic dashboards using parameters, filters, and calculated fields in Tableau and Power BI
  • Used Power BI's data visualization options, such as charts, tables, and KPIs
  • Applied Tableau's data blending and joining options, including join calculations
  • Created and managed data sources in Tableau and Power BI, including Excel spreadsheets and SQL databases
  • Migrated on-premises databases to the cloud using AWS DMS and Amazon EC2
  • Monitored the migration process, ensured data consistency and resolved any issues during migration
  • Deployed machine learning models using Amazon SageMaker and integrated them with existing data pipelines
  • Developed automated processes to retrain models on a regular basis and monitor model performance using Amazon CloudWatch
  • Environment: Hadoop 2.x, Hive, HDFS, Python, Spark, Sqoop, Oozie, AWS S3, CloudWatch, Redshift, MySQL, PostgreSQL, EMR, AWS Glue, and Athena
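
A minimal boto3 sketch of the Kinesis ingestion path referenced in the first bullet above; the stream name, region, and payload shape are hypothetical placeholders, and it assumes AWS credentials are configured in the environment:

    # Hypothetical stream and payload; requires configured AWS credentials.
    import json
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")

    def publish_event(event: dict) -> None:
        # Kinesis routes records to shards by partition key
        kinesis.put_record(
            StreamName="example-loan-events",
            Data=json.dumps(event).encode("utf-8"),
            PartitionKey=str(event["loan_id"]),
        )

    publish_event({"loan_id": 12345, "status": "funded"})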

Data Engineer

VISA
04.2017 - 03.2018
  • Managed and optimized data processing pipelines in a Hadoop ecosystem, leveraging components such as HDFS, MapReduce, Hive, and HBase to handle large-scale datasets efficiently
  • Developed robust end-to-end data processing pipelines utilizing Hadoop components, including HDFS for data storage, MapReduce for parallel data processing, and Hive for structured data querying
  • Implemented Python-based APIs for data collection, transformation, and integration, ensuring efficient data communication between systems
  • Loaded data from web servers and Teradata using Sqoop with Kafka and the Spark Streaming API (a minimal streaming sketch follows this list)
  • Developed Kafka pub/sub and Cassandra clients, along with Spark components on HDFS and Hive
  • Populated HDFS and HBase with large volumes of data using Apache Kafka
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters
  • Developed Pig UDFs to pre-process the data for analysis
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig and HiveQL
  • Created Hive tables to store data and wrote Hive queries
  • Extracted the data from Teradata into HDFS using Sqoop
  • Exported the patterns analyzed back to Teradata using Sqoop
  • Installed and configured the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution
  • Developed Spark code using Scala and Spark SQL for faster processing and testing
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python
  • Participated in weekly walkthrough and inspection meetings to verify the status of testing efforts and the project
  • Utilized Spark Streaming to process and analyze real-time data streams, enabling timely decision-making and event-driven processing
  • Developed and maintained containerized data engineering applications using Docker and Kubernetes for consistent deployment and scalability
  • Managed data processing and storage using Hadoop components, including Sqoop for seamless data import/export
  • Integrated machine learning pipelines with data engineering workflows using Apache Spark MLlib or TensorFlow
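
A minimal Structured Streaming sketch of the Kafka-to-Spark path described above; the broker address, topic, and HDFS paths are hypothetical placeholders, and it assumes the spark-sql-kafka connector is on the classpath:

    # Hypothetical brokers, topic, and paths; needs the spark-sql-kafka package.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("txn-stream").getOrCreate()

    stream = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")
             .option("subscribe", "transactions")
             .load()
    )

    # Kafka delivers raw bytes; decode the value column before processing
    decoded = stream.select(F.col("value").cast("string").alias("payload"))

    # Persist each micro-batch to HDFS as Parquet for later Hive queries
    query = (
        decoded.writeStream.format("parquet")
               .option("path", "hdfs:///data/transactions/")
               .option("checkpointLocation", "hdfs:///checkpoints/transactions/")
               .start()
    )
    query.awaitTermination()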

Education

Bachelor of Science - Computer Science

Uttara Institute of Business And Technology
2016

Skills

  • Python, SQL, Java, Scala, R
  • AWS, Azure, Snowflake
  • Hadoop, Spark, Hive
  • Machine Learning
  • Tableau, Power BI
  • Airflow, Git, Jenkins, Jira
  • Business Intelligence
  • Data Modeling, Data Analysis
  • Data Visualization, Production Work
  • Agile, Scrum
