Hamza Al - US CITIZEN

Staten Island, NY

Summary

Results-driven Senior Data Engineer with 7+ years of experience in big data, distributed systems, and cloud computing. Expertise in designing and implementing scalable ETL pipelines, data migrations, and real-time data processing using AWS, Azure, Hadoop, and Spark.

Proficient in AWS Database Migration Service (DMS), Azure Data Factory, Apache Spark, MapReduce, and SQL-based data transformations. Strong background in RDBMS, NoSQL databases (HBase, MarkLogic), and data integration tools like Sqoop and Flume. Skilled in optimizing performance, query tuning, and data modeling for large-scale datasets.

Experienced in CI/CD pipelines, automation, and workflow orchestration using Maven, Ant, Oozie, Zookeeper, and Terraform. Adept at building Scala and Python-based applications for structured and unstructured data processing in AWS and Azure environments.

Follows Agile and Scrum methodologies, ensuring efficient collaboration and iterative development.

Overview

years of professional experience

Work History

Data Engineer

Delta Air Lines

09.2020 - Current

Developed and maintained serverless applications using AWS Lambda and AWS Step Functions, improving application performance and scalability.
Managed data infrastructure on AWS, including EC2 instances, RDS databases, and S3 buckets, ensuring high availability and reliability.
Designed and implemented data pipelines using AWS services such as S3, Glue, EMR, and Redshift, reducing data processing time by 50%.
Designed and implemented complex dashboards and reports to meet specific business requirements.
Integrated Flume with Kafka for reliable event streaming and seamless data flow to downstream applications.
Conducted performance tuning on Flume agents, optimizing throughput and minimizing latency.
Developed Spark applications in Scala using the DataFrame and Spark SQL APIs for faster data processing.
Implemented custom Lambda layers for sharing code and dependencies across multiple functions.
Created Hive Tables, loaded transactional data from Teradata using Sqoop, and worked with highly unstructured and semi-structured data of 2 Petabytes in size.
Utilized Snowflake's data sharing features for securely sharing data across different Snowflake accounts.
Configured Snowflake stages and storage policies for efficient data storage and retrieval.
Developed MapReduce jobs for cleaning, accessing, and validating data; created and ran Sqoop jobs with incremental loads to populate Hive external tables.
Developed strategies for distributing web log data across the cluster, importing and exporting the stored data to and from HDFS and Hive using Sqoop.
Designed and optimized PostgreSQL queries for efficient data retrieval and improved performance.
Extracted and processed PostgreSQL data using Spark and PySpark, integrating structured datasets into AWS Redshift and Snowflake.
Developed ETL pipelines that ingested transactional data from PostgreSQL into Amazon S3 and Databricks for further transformations.
Implemented PostgreSQL stored procedures and functions to handle complex data processing logic before integration with analytics platforms.
Performed database indexing and partitioning in PostgreSQL to enhance query performance for airline operational data.
Automated PostgreSQL data ingestion workflows using AWS Glue and Step Functions, ensuring seamless data movement across AWS services.
Built scalable distributed data solutions on Cloudera Hadoop and designed and developed automated test scripts in Python.
Analyzed SQL scripts and designed equivalent solutions in Spark; implemented Hive generic UDFs to incorporate business logic into Hive queries.
Executed DBT tests for validating data integrity and ensuring the correctness of transformations. Utilized DBT snapshots for capturing historical changes in dimensions and facts.
Created Hive tables and worked with them using HiveQL; designed and implemented static and dynamic partitioning and bucketing in Hive.
Implemented materialized views in data warehousing environments for pre-aggregated data summaries. Utilized data warehousing features like clustering keys to improve query performance.
Employed Terraform workspaces for managing multiple environments with varying configurations.
Implemented GitLab CI/CD pipeline triggers based on specific branch activities for automated testing and deployment.
Developed Bash scripts for log rotation and retention policies in data processing environments. Implemented Ruby scripts for data validation and integrity checks in ETL workflows.
Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.

Data Engineer

Fannie Mae

09.2019 - 08.2020

Developed and maintained serverless applications using AWS Lambda and AWS Step Functions, improving application performance and scalability.
Developed ETL workflows using Python and Apache Spark, ensuring data quality and consistency across multiple data sources.
Worked on Hive/SQL queries, performing Spark transformations using Spark RDDs and Python (PySpark).
Implemented custom interceptors in Flume for preprocessing log data before ingestion.
Integrated Snowflake with Snowpipe for real-time data ingestion from external sources.
Utilized Snowflake stored procedures for encapsulating complex data manipulation logic.
Developed custom visualizations using Databricks notebooks for data exploration and analysis.
Utilized Databricks Jobs API for programmatically managing and scheduling ETL workflows.
Implemented Azure Data Factory pipelines for orchestrating complex data workflows.
Leveraged Azure Logic Apps connectors for integrating with various Azure and external services.
Integrated Flume with Elasticsearch for real-time log indexing and search capabilities.
Utilized Redshift federated queries to join data across Redshift and external databases.
Automated Redshift snapshots for regular backups and point-in-time recovery.
Implemented RESTful APIs using Flask for seamless communication between web applications.
Developed serverless applications using AWS Lambda for cost-effective and scalable solutions.
Utilized Lambda environment variables for dynamic configuration and parameterization.
Created a serverless data ingestion pipeline on AWS using Lambda functions.
Developed Apache Spark applications in Scala and Python, and implemented an Apache Spark data processing module to handle data from various RDBMS and streaming sources.
Developed and scheduled Spark Streaming and batch jobs using Python (PySpark) and Scala.
Developed Spark code in PySpark, applying transformations and actions for faster data processing.
Achieved high-throughput, scalable, fault-tolerant stream processing of live data streams using Apache Spark Streaming.
Used Spark Streaming in Scala to load data into memory, created RDDs and DataFrames, and applied transformations and actions.
Created Sqoop jobs and Hive queries to ingest data from relational databases for historical data analysis.
Worked with Amazon Elastic MapReduce (EMR) and set up environments on AWS EC2 instances.
Ran Hive queries through Spark SQL, integrated with the Spark environment.
Executed Hadoop/Spark jobs on AWS EMR using programs stored in S3 buckets.
Loaded structured and semi-structured data into Spark clusters using Spark SQL and the DataFrame API.
Used AWS CloudWatch to monitor environment instances for operational and performance metrics during load testing.
Scripted Hadoop package installation and configuration to support fully automated deployments.
Installed and configured Hive on the Hadoop cluster and helped business users and application teams fine-tune their HiveQL for better performance and efficient use of cluster resources.
Implemented Oozie workflows for the ETL process for critical data feeds across the platform.

Data Engineer

Toyota

01.2018 - 08.2019

Implemented event-driven architectures using AWS services such as S3, Kinesis, and Lambda, enabling real-time data processing and analysis.
Launched Amazon EC2 instances (Linux/Ubuntu/RHEL) on Amazon Web Services and configured them for specific applications.
Imported data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
Designed and implemented Snowflake data sharing for secure cross-account data collaboration.
Utilized Snowflake stored procedures for encapsulating complex data manipulation logic.
Implemented Redshift workload management (WLM) to prioritize and optimize query execution.
Utilized Redshift federated queries to join data across Redshift and external databases.
Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
Moved files between HDFS and AWS S3 and worked extensively with S3 buckets.
Created custom Python/shell scripts to import data from Oracle databases via Sqoop.
Monitored and troubleshot Hadoop jobs using the YARN ResourceManager, and EMR job logs using Genie and Kibana.
Converted Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala.
Channeled log data collected from web servers into HDFS using Flume and Spark Streaming.
Developed Spark jobs in Scala in a test environment for faster data processing and used Spark SQL for querying.
Designed efficient Spark code for loading and transforming data using Python and Spark SQL, which can be forward engineered by the code generation developers.
Worked with large sets of structured, semi-structured, and unstructured data.
Created big data workflows to ingest data from various sources into Hadoop using Oozie; these workflows comprise heterogeneous jobs such as Hive, Sqoop, and Python scripts.

Education

Bachelor's in Engineering Science

College of Staten Island

Staten Island, NY

Skills

Flume
Spark (Java/Scala/Python)
PySpark
PyTorch
Fivetran
AWS Kinesis
ETL Development
SSIS (SQL Server Integration Services)
Matillion
AWS Glue
Data Warehousing

Cloud Services:

AWS (Redshift, Lambda, EC2, EMR, S3, Athena)
Azure (Data Factory)

Data Storage and Databases:

PostgreSQL
Snowflake
Amazon DynamoDB
HDFS
S3

Version Control and Collaboration:

Git
GitHub
GitLab
Bitbucket

Web Frameworks:

Flask
Django

Automation and Orchestration:

Ansible
Jenkins
Bamboo

Real-Time Streaming:

Amazon Kinesis
Flume

Big Data Technologies:

Hadoop
Databricks
Hive
HBASE

Code Infrastructure:

Terraform
Bash Scripting
Ruby Scripting

Business Intelligence and Visualization:

Tableau
DBT (Data Build Tool)
Pandas

Machine Learning:

PyTorch

Other Tools and Services:

AWS Services (Redshift, Lambda, EC2, EMR, S3)
SSIS
Azure Data Factory
AWS Athena
AWS EMR
Snowflake
