Mahim Islam

Queens, NY

Summary

Data Engineer with 7 years of experience designing and implementing cutting-edge data engineering solutions. Proven track record of quickly adapting to new technologies and cloud platforms, including AWS and Azure, to manage and process large-scale data sets. Skilled in Python, SQL, and Java for building flexible, scalable data pipelines. Experienced with big data technologies such as Hadoop, Spark, and Hive to support real-time data processing and analytics. Strong understanding of data warehousing concepts and data governance best practices. Eager to learn and take on new challenges, and committed to delivering high-quality, impactful data solutions that drive business success.

Overview

6 years of professional experience

Work History

Data Engineer

KNOWiNK
06.2022 - Current
  • Led the end-to-end design and implementation of highly scalable, fault-tolerant data pipelines on AWS and Azure, processing millions of records daily and ensuring timely, accurate data delivery to downstream systems (a minimal pipeline sketch follows the environment list below)
  • Developed and maintained comprehensive data lineage documentation, providing transparency and traceability for data transformations and ensuring compliance with regulatory requirements
  • Played a key role in migrating on-premises data warehouses to Amazon Redshift, leading the design and execution of the migration plan, resulting in a 20% reduction in infrastructure costs and improved query performance
  • Collaborated with data scientists and analysts to understand their data requirements, implemented data transformations to support their analytical models, and ensured seamless integration of data pipelines with machine learning workflows using AWS SageMaker
  • Architected and implemented data engineering solutions on AWS, leveraging services such as AWS Lambda, Amazon Kinesis, AWS Glue, and Amazon Redshift
  • Implemented security best practices in Terraform configurations, such as IAM roles and policies, to ensure least privilege access to cloud resources
  • Applied AWS IAM for fine-grained access control and security management in data engineering solutions
  • Architected and implemented data engineering solutions on Azure, leveraging services such as Azure Data Factory, Azure Databricks, Azure Blob Storage, Azure Data Lake Storage, and Azure SQL Data Warehouse
  • Used Azure Blob Storage and Azure Data Lake Storage for secure, scalable storage of structured and unstructured data, ensuring data availability and accessibility for analytical processing
  • Designed and implemented interactive, visually compelling data visualizations using a diverse range of tools, including Power BI, Tableau, AWS QuickSight, and on-premises solutions
  • Applied data visualization best practices, choosing appropriate chart types, color schemes, and interactivity to enhance data comprehension and engagement
  • Received recognition for outstanding data visualization projects that improved data understanding, data-driven decision-making, and business outcomes across cloud and on-premises environments
  • Developed data visualization dashboards using Python libraries like matplotlib, seaborn, and Plotly, providing intuitive visual representations of complex data
  • Developed scalable data engineering pipelines in Python for AI applications, incorporating data ingestion, preprocessing, feature engineering, and model deployment
  • Successfully deployed and managed cloud resources, including virtual machines, networks, storage, databases, and security groups, using Terraform configurations
  • Utilized Python frameworks such as TensorFlow, PyTorch, and Keras to design, train, and deploy machine learning and deep learning models for AI applications
  • Environment: AWS (Lambda, S3, Glue, EMR, Redshift, Athena, CloudWatch, EC2, QuickSight, IAM), Terraform, Azure (ADF, Databricks, Synapse, Event Hub), and Power BI
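
A minimal PySpark sketch of the daily batch pattern described in the first bullet above, assuming Spark running on EMR or Glue with S3 access configured; the bucket paths and column names are hypothetical placeholders, not the production configuration:

    # Minimal sketch of a daily S3 batch pipeline (hypothetical paths/columns).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("daily-ingest").getOrCreate()

    # Read one day's raw events from S3 (placeholder bucket and prefix)
    raw = spark.read.json("s3://example-raw-bucket/events/2023-01-01/")

    # Basic cleanup: drop malformed rows, normalize timestamps, dedupe on id
    clean = (
        raw.dropna(subset=["event_id", "event_ts"])
           .withColumn("event_ts", F.to_timestamp("event_ts"))
           .dropDuplicates(["event_id"])
    )

    # Partitioned Parquet output for downstream Redshift Spectrum / Athena
    (clean.withColumn("event_date", F.to_date("event_ts"))
          .write.mode("overwrite")
          .partitionBy("event_date")
          .parquet("s3://example-curated-bucket/events/"))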

Data Engineer

MetLife
04.2020 - 05.2022
  • Designed and implemented highly scalable data processing pipelines on the Azure cloud platform, utilizing Azure Data Factory, Stream Analytics, and Databricks
  • Migrated and integrated data from various sources, including databases, APIs, and file systems, into Azure storage services such as SQL Database, Cosmos DB, and Blob Storage
  • Implemented a real-time pipeline for processing incoming IoT device data using Azure Event Hubs and Stream Analytics, ensuring timely and accurate data analysis
  • Constructed data transformations by writing PySpark in Databricks to rename, drop, clean, validate, and reformat data into Parquet files, then loaded them into Azure Blob Storage containers (a minimal sketch follows the environment list below)
  • Developed Azure Data Factory linked services to connect on-premises Oracle, SQL Server, and Apache Hive sources with Azure datasets in the cloud
  • Built ETL pipelines in Azure Data Factory (ADF) to manage and load 1 TB+ of data into Azure SQL Data Warehouse
  • Configured input and output bindings of Azure Functions with an Azure Cosmos DB collection to read and write data from the container whenever the function executes
  • Connected Databricks notebooks with Airflow to schedule and monitor the ETL process
  • Provided user management and support by administering epics, user stories, and tasks in Jira using Agile methodology, and logged process flow documents in Confluence
  • Worked with a team of developers to develop and implement an API for optical character recognition and named entity extraction on medical records
  • Analyzed and extracted relevant information from medical forms to support medical summary reports, compliance, claims settlement, litigation, and predictive analysis
  • Worked with Flask and Werkzeug for web development and REST API implementation
  • Deployed predictive models using the AzureML platform
  • Environment: Azure HDInsight, Databricks, Data Lake, Cosmos DB, MySQL, Azure SQL, Snowflake, Cassandra, Teradata, Ambari, Power BI, Azure Blob Storage, Data Factory, Data Storage Explorer, Scala, Hadoop 2.x (HDFS, MapReduce, YARN), Spark, Git, PySpark, Airflow, Hive, HBase, and AzureML
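
A minimal PySpark sketch of the rename/drop/clean/validate step described above, as it might look in a Databricks notebook (where spark is predefined); the storage account, containers, and column names are hypothetical placeholders:

    # Hypothetical storage account, containers, and columns; wasbs access
    # assumes the cluster is configured with the storage account key.
    from pyspark.sql import functions as F

    df = spark.read.csv(
        "wasbs://raw@examplestorage.blob.core.windows.net/claims/",
        header=True, inferSchema=True,
    )

    validated = (
        df.withColumnRenamed("clm_amt", "claim_amount")       # rename
          .drop("legacy_flag")                                # drop unused
          .filter(F.col("claim_amount") > 0)                  # validate
          .withColumn("claim_date", F.to_date("claim_date"))  # reformat
    )

    # Land the curated Parquet files back in a Blob Storage container
    validated.write.mode("overwrite").parquet(
        "wasbs://curated@examplestorage.blob.core.windows.net/claims/"
    )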

Data Engineer

Fannie Mae
04.2018 - 03.2020
  • Designed, built, and managed data pipelines for real-time and batch processing using AWS Glue and Amazon Kinesis (an ingestion sketch follows the environment list below)
  • Implemented a scalable data warehousing solution using Amazon Redshift
  • Designed and optimized Redshift tables and ensured high performance through appropriate distribution styles, sort keys, and compression encodings
  • Worked with large datasets using Apache Spark and Amazon EMR
  • Developed Spark scripts to perform complex data transformations and aggregations, and optimized Spark job performance for faster processing times
  • Built scalable NoSQL databases using Amazon DynamoDB, Amazon DocumentDB and Amazon Neptune to store and retrieve semi-structured and unstructured data
  • Developed serverless architectures using AWS Lambda and Amazon SNS to process real-time data and trigger event-driven workflows
  • Used Apache NiFi and AWS Glue to build custom data ingestion and transformation workflows
  • Automated data validation and quality checks using AWS Glue jobs and Apache NiFi processes
  • Implemented security measures to protect sensitive data stored in S3 and managed data access using Amazon IAM and Amazon VPC
  • Monitored pipeline performance and security events using Amazon CloudWatch and AWS CloudTrail
  • Applied strong SQL, data warehousing, and ETL experience on traditional databases
  • Worked across major Amazon Web Services (AWS) components, including EC2, S3, RDS, ELB, ElastiCache, VPC, and IAM
  • Explored DAGs, their dependencies, and logs using Airflow pipelines for automation
  • Developed Spark scripts in Python as per requirements
  • Created dynamic dashboards using parameters, filters, and calculated fields in Tableau and Power BI
  • Used Power BI's data visualization options, such as charts, tables, and KPIs
  • Applied Tableau's data blending and joining options, including join calculations
  • Created and managed data sources in Tableau and Power BI, including Excel spreadsheets and SQL databases
  • Migrated on-premises databases to the cloud using AWS DMS and Amazon EC2
  • Monitored the migration process, ensured data consistency and resolved any issues during migration
  • Deployed machine learning models using Amazon SageMaker and integrated them with existing data pipelines
  • Developed automated processes to retrain models on a regular basis and monitor model performance using Amazon CloudWatch
  • Environment: Hadoop 2.x, Hive, HDFS, Python, Spark, Sqoop, Oozie, AWS S3, CloudWatch, Redshift, MySQL, PostgreSQL, EMR, AWS Glue, and Athena
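
A minimal boto3 sketch of the Kinesis ingestion path referenced in the first bullet above; the stream name, region, and payload shape are hypothetical placeholders, and it assumes AWS credentials are configured in the environment:

    # Hypothetical stream and payload; requires configured AWS credentials.
    import json
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")

    def publish_event(event: dict) -> None:
        # Kinesis routes records to shards by partition key
        kinesis.put_record(
            StreamName="example-loan-events",
            Data=json.dumps(event).encode("utf-8"),
            PartitionKey=str(event["loan_id"]),
        )

    publish_event({"loan_id": 12345, "status": "funded"})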

Data Engineer

VISA
04.2017 - 03.2018
  • Managed and optimized data processing pipelines in a Hadoop ecosystem, leveraging components such as HDFS, MapReduce, Hive, and HBase to handle large-scale datasets efficiently
  • Developed robust end-to-end data processing pipelines utilizing Hadoop components, including HDFS for data storage, MapReduce for parallel data processing, and Hive for structured data querying
  • Implemented Python-based APIs for data collection, transformation, and integration, ensuring efficient data communication between systems
  • Loaded data from web servers and Teradata using Sqoop with Kafka and the Spark Streaming API (a minimal streaming sketch follows this list)
  • Developed Kafka pub/sub and Cassandra clients, along with Spark components on HDFS and Hive
  • Populated HDFS and HBase with large volumes of data using Apache Kafka
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters
  • Developed Pig UDFs to pre-process the data for analysis
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig and HiveQL
  • Created Hive tables to store data and wrote Hive queries
  • Extracted the data from Teradata into HDFS using Sqoop
  • Exported the patterns analyzed back to Teradata using Sqoop
  • Installed and configured the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution
  • Developed Spark code using Scala and Spark SQL for faster processing and testing
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python
  • Participated in weekly walkthrough and inspection meetings to verify the status of testing efforts and the project
  • Utilized Spark Streaming to process and analyze real-time data streams, enabling timely decision-making and event-driven processing
  • Developed and maintained containerized data engineering applications using Docker and Kubernetes for consistent deployment and scalability
  • Managed data processing and storage using Hadoop components, including Sqoop for seamless data import/export
  • Integrated machine learning pipelines with data engineering workflows using Apache Spark MLlib or TensorFlow
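
A minimal Structured Streaming sketch of the Kafka-to-Spark path described above; the broker address, topic, and HDFS paths are hypothetical placeholders, and it assumes the spark-sql-kafka connector is on the classpath:

    # Hypothetical brokers, topic, and paths; needs the spark-sql-kafka package.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("txn-stream").getOrCreate()

    stream = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")
             .option("subscribe", "transactions")
             .load()
    )

    # Kafka delivers raw bytes; decode the value column before processing
    decoded = stream.select(F.col("value").cast("string").alias("payload"))

    # Persist each micro-batch to HDFS as Parquet for later Hive queries
    query = (
        decoded.writeStream.format("parquet")
               .option("path", "hdfs:///data/transactions/")
               .option("checkpointLocation", "hdfs:///checkpoints/transactions/")
               .start()
    )
    query.awaitTermination()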

Education

Bachelor of Science - Computer Science

Uttara Institute of Business And Technology
2016

Skills

  • Python, SQL, Java, Scala, R
  • AWS, Azure, Snowflake
  • Hadoop, Spark, Hive
  • Machine Learning
  • Tableau, Power BI
  • Airflow, Git, Jenkins, Jira
  • Business Intelligence
  • Data Modeling, Data Analysis
  • Data Visualization, Production Work
  • Agile, Scrum
