
Md Jahid Hasan Bhuyian

Queens, NY

Summary

Data Engineer with 9 years of experience designing and implementing big data solutions. Proven track record of quickly adapting to new technologies and cloud platforms, including AWS and Azure, to manage and process large-scale data sets. Skilled in Python, SQL, and Java for building flexible, scalable data pipelines, and experienced with big data technologies such as Hadoop, Spark, and Hive for real-time data processing and analytics. Strong understanding of data warehousing concepts and data governance best practices. Eager to learn and take on new challenges, and committed to delivering high-quality, impactful data solutions that drive business success.

Overview

9 years of professional experience

Work History

Senior Data Engineer

MetLife
01.2022 - Current
  • Designed and implemented highly scalable data processing pipelines on the Azure cloud platform, utilizing Azure Data Factory, Stream Analytics, and Databricks
  • Migrated and integrated data from various sources, including databases, APIs, and file systems, into Azure data storage such as SQL Database, Cosmos DB, and Blob Storage
  • Implemented a real-time pipeline for processing incoming IoT device data using Azure Event Hubs and Stream Analytics, ensuring timely and accurate data analysis
  • Developed predictive models using Azure Machine Learning and deployed them into production, resulting in a 15% increase in efficiency
  • Solid experience in advanced SQL, with a proven track record of designing and implementing complex queries, stored procedures, functions, and triggers in a fast-paced, data-intensive environment
  • Utilized advanced SQL techniques to optimize database performance and enhance data quality, resulting in improved business insights and operational efficiencies
  • Proficient in data visualization, created interactive dashboards using Power BI for business stakeholders, resulting in clearer understanding of business performance
  • Mounted Azure Data Lake containers to Databricks, creating service principals, access keys, and tokens to access the Azure Data Lake Gen2 storage account (see the sketch following this section)
  • Imported raw CSV and JSON files into Azure Data Lake Gen2, performing data ingestion by writing PySpark to extract the flat files
  • Built data transformations in PySpark on Databricks to rename, drop, clean, validate, and reformat data into Parquet files, then loaded them into an Azure Blob Storage container
  • Developed Azure linked services to connect on-premises Oracle Database, SQL Server, and Apache Hive with Azure datasets in the cloud
  • Built ETL data pipelines in Azure Data Factory (ADF) to manage and process more than 1B rows into Azure SQL DW
  • Configured Input & Output bindings of Azure Function with Azure Cosmos DB collection to read and write data from the container whenever the function executes
  • Connected Databricks notebooks with Airflow to schedule and monitor the ETL process
  • Trained NLP question-answering models using BERT transfer learning to answer domain questions and expedite the named-entity recognition process
  • Provided user management and support by administering epics, user stories, and tasks in Jira using Agile methodology; logged process flow documents in Confluence
  • Worked with a team of developers to develop and implement an API for optical character recognition and named entity extraction on medical records
  • Analyzed and extracted relevant information from medical forms to support medical summary reports, compliance, claims settlement, litigation, and predictive analysis
  • Utilized azure.cognitiveservices.vision.computervision, pytesseract, spacy, and pytextrank for NER and OCR technologies
  • Worked with Flask and Werkzeug for web development and REST API implementation
  • Deployed predictive models using the AzureML platform
  • Environment: Azure HDInsight, Databricks, Data Lake, Cosmos DB, MySQL, Azure SQL, Snowflake, Cassandra, Teradata, Ambari, PowerBI, Azure Blob Storage, Data Factory, Data Storage Explorer, Scala, Hadoop 2.x (HDFS, MapReduce, YARN), Spark, Git, PySpark, Airflow, Hive, HBase, AzureML
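
A minimal sketch of the mount-and-ingest pattern above, assuming it runs in a Databricks notebook (where spark and dbutils are predefined) and that a secret scope named adls-scope holds the service-principal credentials; the storage account, tenant, paths, and column names are hypothetical placeholders:

    # Mount the ADLS Gen2 container through a service principal
    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": dbutils.secrets.get("adls-scope", "sp-client-id"),
        "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("adls-scope", "sp-client-secret"),
        "fs.azure.account.oauth2.client.endpoint":
            "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    }
    dbutils.fs.mount(
        source="abfss://raw@<storage-account>.dfs.core.windows.net/",
        mount_point="/mnt/raw",
        extra_configs=configs,
    )

    from pyspark.sql import functions as F

    # Ingest raw CSV, apply rename/drop/clean/validate steps, land as Parquet
    df = spark.read.option("header", "true").csv("/mnt/raw/claims/*.csv")
    clean = (
        df.withColumnRenamed("clm_id", "claim_id")  # hypothetical columns
          .dropDuplicates()
          .na.drop(subset=["claim_id"])
          .filter(F.col("amount").cast("double") > 0)
    )
    clean.write.mode("overwrite").parquet("/mnt/curated/claims/")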

Data Engineer

Fannie Mae
04.2018 - 12.2021
  • Designed, built and managed data pipelines for real-time processing and batch processing using AWS Glue and Amazon Kinesis
  • Implemented data storage in S3 with appropriate partitioning and compression techniques to optimize data retrieval
  • Implemented a scalable data warehousing solution using Amazon Redshift
  • Designed and optimized Redshift tables and ensured high performance through appropriate distribution styles, sort keys, and compression encodings
  • Worked with large datasets using Apache Spark and Amazon EMR
  • Developed Spark scripts to perform complex data transformations and aggregations, and optimized Spark job performance for faster processing times
  • Built scalable NoSQL databases using Amazon DynamoDB, Amazon DocumentDB and Amazon Neptune to store and retrieve semi-structured and unstructured data
  • Developed serverless architectures using AWS Lambda and Amazon SNS to process real-time data and trigger event-driven workflows
  • Used Apache NiFi and AWS Glue to build custom data ingestion and transformation workflows
  • Automated data validation and quality checks using AWS Glue jobs and Apache NiFi processes
  • Implemented security measures to protect sensitive data stored in S3 and managed data access using Amazon IAM and Amazon VPC
  • Monitored pipeline performance and security events using Amazon CloudWatch and AWS CloudTrail
  • Strong SQL, data warehousing, and ETL experience on traditional databases
  • Advanced knowledge of Amazon Web Services and its major components (EC2, S3, RDS, VPC, IAM, etc.)
  • Worked on Amazon Web Services (AWS) technologies including EC2, S3, RDS, ELB, and Elasticache
  • Explored DAGs, their dependencies, and logs using Airflow pipelines for automation (a DAG sketch follows this section)
  • Tracked operations with Airflow sensors until specified criteria were met
  • Developed Spark scripts in Python as per requirements
  • Experience with creating dynamic dashboards, using parameters, filters, and calculated fields in Tableau and PowerBI
  • Experience with using PowerBI's data visualization options, such as charts, tables, and KPIs
  • Knowledge of Tableau's data blending and joining options, including blend and join calculations
  • Experience with creating and managing data sources in Tableau and PowerBI, including Excel spreadsheets and SQL databases
  • Migrated on-premise databases to the cloud using AWS DMS and Amazon EC2
  • Monitored the migration process, ensured data consistency and resolved any issues during migration
  • Deployed machine learning models using Amazon SageMaker and integrated them with existing data pipelines
  • Developed automated processes to retrain models on a regular basis and monitor model performance using Amazon CloudWatch
  • Environment: Hadoop 2.x, Hive, HDFS, Python, Spark, Sqoop, Oozie, AWS S3, Amazon Redshift, MySQL, PostgreSQL, Amazon EMR, AWS Glue
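
A minimal sketch of one such Airflow DAG, assuming Airflow 2.x with the Amazon provider package installed (operator import paths vary across provider versions); the bucket, key pattern, and Glue job name are hypothetical:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
    from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

    with DAG(
        dag_id="loans_s3_to_redshift",  # hypothetical pipeline name
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        # Sensor polls S3 until the day's landing file appears
        wait_for_file = S3KeySensor(
            task_id="wait_for_landing_file",
            bucket_key="s3://landing-bucket/loans/{{ ds }}/*.csv",
            wildcard_match=True,
            poke_interval=300,    # check every 5 minutes
            timeout=6 * 60 * 60,  # give up after 6 hours
        )

        # Glue job transforms the landed file and loads it into Redshift
        run_glue_transform = GlueJobOperator(
            task_id="run_glue_transform",
            job_name="loans_nightly_transform",
        )

        wait_for_file >> run_glue_transform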

Data Engineer

Visa
03.2016 - 03.2018
  • Worked on analyzing Hadoop clusters and different big data analytic tools including Pig, Hive and Sqoop
  • Developed Spark scripts by using Scala shell commands as per the requirement
  • Created Spark jobs to see trends in data usage by users
  • Used Spark and Spark SQL to read Parquet data and create Hive tables using the Scala API
  • Loaded data pipelines from web servers and Teradata using Sqoop with Kafka and the Spark Streaming API (a streaming sketch follows this section)
  • Developed Kafka pub-sub, Cassandra clients and Spark along with components on HDFS and Hive
  • Populated HDFS and HBase with large volumes of data using Apache Kafka
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters
  • Developed Pig UDFs to pre-process the data for analysis
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig and HiveQL
  • Created Hive tables to store data and wrote Hive queries
  • Extracted the data from Teradata into HDFS using Sqoop
  • Exported the patterns analyzed back to Teradata using Sqoop
  • Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution
  • Developed Spark code in Scala with Spark SQL for faster processing and testing
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python
  • Used Spark API over Hadoop YARN as execution engine for data analytics using Hive
  • Built data pipelines using Kafka and Akka to handle terabytes of data
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
  • Developed Scala scripts to extract the data from the web server output files to load into HDFS
  • Designed and implemented MapReduce jobs to support distributed data processing
  • Processed large data sets on the Hadoop cluster
  • Designed NoSQL schemas in HBase
  • Developed MapReduce ETL in Python/Pig
  • Performed data validation using Hive
  • Importing and exporting the data using Sqoop from HDFS to Relational Database systems and vice-versa
  • Involved in weekly walkthrough and inspection meetings, to verify the status of the testing efforts and the project
  • Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Red Hat Linux, Spark, Scala, Hortonworks, Cloudera Manager, Apache YARN, Python, Machine Learning, NLP (Natural Language Processing)
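
A minimal sketch of the Kafka-to-HDFS ingestion path above, written with PySpark Structured Streaming rather than the Scala DStream API used on the project; broker addresses, the topic, and paths are hypothetical, and the spark-sql-kafka package must be on the classpath:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("weblog-kafka-ingest")  # hypothetical job name
        .enableHiveSupport()
        .getOrCreate()
    )

    # Subscribe to the web-log topic; Kafka values arrive as bytes
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
        .option("subscribe", "web-logs")
        .option("startingOffsets", "latest")
        .load()
        .selectExpr("CAST(value AS STRING) AS line", "timestamp")
    )

    # Land the stream on HDFS as Parquet so Hive external tables can query it
    query = (
        events.writeStream
        .format("parquet")
        .option("path", "hdfs:///data/raw/web_logs")
        .option("checkpointLocation", "hdfs:///checkpoints/web_logs")
        .trigger(processingTime="1 minute")
        .start()
    )
    query.awaitTermination()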

Education

Bachelor of Science - Computer Science

North South University
Dhaka, Bangladesh

Skills

  • Python, R, SQL, Java, Scala
  • Hadoop, Spark, Hive
  • AWS, Azure
  • Machine Learning
  • Tableau, PowerBI
  • Apache Airflow, GIT, Jenkins
  • Business Intelligence
  • Data Modeling
  • Data Analysis
  • Production Work
  • Agile methodologies: Scrum, Kanban

Timeline

Senior Data Engineer

MetLife
01.2022 - Current

Data Engineer

Fannie Mae
04.2018 - 12.2021

Data Engineer

Visa
03.2016 - 03.2018

Bachelor of Science - Computer Science

North South University