
MUDASSIR MOHAMMED ABDUL

MANCHESTER, NH

Summary

Over 9 years of diversified IT experience as a Data Engineer, specializing in requirements gathering, design, development, testing, and maintenance of databases, cloud technologies, data pipelines, and data warehouse applications.

Overview

9 years of professional experience

Work History

Sr Data Engineer

BCBS
Chicago, IL
08.2023 - Current
  • Worked on data ingestion, cleansing, transformation using AWS Lambda, AWS Glue, and Step Functions
  • Implemented serverless architecture with API Gateway, Lambda, and DynamoDB; deployed Lambda code from S3
  • Developed Glue ETL jobs for data processing, including transformations and loads into S3, Redshift, and RDS
  • Automated data storage from streaming sources to AWS data lakes (S3, Redshift, RDS) using AWS Kinesis (Data Firehose)
  • Conducted architecture assessments of AWS services like Amazon EMR, Redshift, and S3
  • Configured CloudWatch for monitoring Lambda functions and Glue Jobs
  • Set up S3 event notifications, SNS topics, SQS queues, and Slack message notifications via Lambda
  • Worked with Spark Streaming and Kafka for real-time processing; combined batch and streaming data (see the streaming sketch after this list)
  • Created Kinesis Data Streams, Firehose, and Analytics for data capture, processing, and storage in DynamoDB and Redshift
  • Built data lakes, data hubs, and data pipelines using Hadoop, Cloudera, HDFS, MapReduce, Spark, YARN, Delta Lake, and Hive
  • Optimized algorithms in Hadoop using Spark Context, Spark-SQL, Spark MLlib, Data Frames, Pair RDDs
  • Utilized Apache Nifi for automated data movement; created Hive Load Queries for data loading from HDFS to Netezza
  • Leveraged Snowflake as a cloud-based data warehousing platform to efficiently store and manage large volumes of structured and semi-structured data
  • Utilized Snowflake's scalable architecture to accommodate growing data volumes and diverse data sources as processing and analytics demands increased
  • Integrated Snowflake with various data sources and systems, including databases, data lakes, and third-party applications, to consolidate and centralize data for analysis and reporting
  • Utilized Snowflake's features such as clustering, partitioning, and materialized views to optimize query performance and reduce processing times, ensuring efficient data retrieval for analytical queries
  • Leveraged Snowflake's data-sharing functionality to securely share data with external partners, clients, or other teams within the organization, enabling collaborative data-driven decision-making without the need for data duplication
  • Implemented Dimensional Data Modeling with Star Schema, Snowflake Schema, Fact and Dimension Tables, and Lambda Architecture concepts
  • Developed PL/SQL packages, database triggers, and user procedures; prepared user manuals for new programs
  • Developed Terraform scripts for automated AWS resource deployment, including EC2, S3, EFS, IAM Roles
  • Implemented CI/CD tooling with Jenkins and Bitbucket for the Python codebase repository, build, and deployment
  • Utilized Zookeeper for Spark job scheduling and backup
  • Collaborated with data analysts and business users to gather requirements and design Power BI reports that effectively address business needs
  • Used Parquet files and ORC format with PySpark and Spark Streaming with Data Frames
  • Automated Oozie workflows to manage jobs, including MapReduce, Hive, and Sqoop
  • Environment: Python, AWS, Lambda, Glue, Step Functions, API Gateway, S3, DynamoDB, CloudWatch, SQS, Redshift, RDS, Glue Catalog, Glue Studio, ETL, EC2, EFS, EBS, IAM Roles, Snapshots, Jenkins Server, PL/SQL, Spark, Kafka, Data Lakes, Data Hubs, Data Pipelines, Apache Hadoop, Cloudera, HDFS, MapReduce, YARN, Delta Lake, Hive, Spark-SQL, Spark MLlib, Data Frames, Pair RDDs, Hive Load Queries, Sqoop, Oozie, Data Modeling, Star Schema, Snowflake, Snowflake Schema, Dimension Tables, Lambda Architecture, CI/CD, Jenkins, Bitbucket
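
Illustrative sketch only (not project code) of the Kafka-to-S3 streaming pattern referenced above, using Spark Structured Streaming with a Parquet sink; the broker address, topic, schema, and bucket paths are hypothetical placeholders:

```python
# Hypothetical sketch: stream JSON events from Kafka and land them in S3 as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("claims-stream").getOrCreate()

# Assumed payload schema for the incoming JSON messages.
schema = StructType([
    StructField("claim_id", StringType()),
    StructField("member_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the raw Kafka stream and parse the JSON value into columns.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "claims-events")                # placeholder topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("claim"))
    .select("claim.*")
)

# Land parsed records in the S3 data lake for downstream Glue/Redshift loads.
(events.writeStream.format("parquet")
    .option("path", "s3a://example-data-lake/claims/")                   # placeholder bucket
    .option("checkpointLocation", "s3a://example-data-lake/_chk/claims/")
    .start()
    .awaitTermination())
```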

Data Engineer

U.S. Bank
Minneapolis, Minnesota
11.2021 - 07.2023
  • Developed and administered data storage solutions, leveraging BigQuery, Cloud Storage, and Cloud SQL
  • Conducted ETL and data engineering using Cloud Dataflow, Dataproc, and Google BigQuery
  • Architected and implemented data pipelines with Dataflow, Dataproc, and Pub/Sub; processed and loaded data from a Pub/Sub topic to BigQuery using Cloud Dataflow and Python
  • Utilized Cloud Functions with Python to load incoming CSV files from GCS into BigQuery (see the load sketch after this list)
  • Hosted applications using Compute Engine, App Engine, Cloud SQL, Kubernetes Engine, and Cloud Storage
  • Monitored and resolved data pipeline and storage issues using Stackdriver and Cloud Monitoring
  • Ensured data security and access controls through GCP's IAM and the Cloud Security Command Center
  • Configured GCP services like Dataproc, Cloud Storage, and BigQuery through the Cloud Shell SDK
  • Analyzed Hadoop cluster and utilized Big Data tools including Pig, Hive, HBase, Spark, and Sqoop
  • Developed streaming applications with PySpark to read data from Kafka and persist in NoSQL databases like HBase and Cassandra
  • Utilized Pig and HiveQL for data profiling and aggregation; constructed Spark applications using Scala and Java for diverse data processing needs
  • Designed and implemented the data model in Neptune; loaded and queried data using the Gremlin query language
  • Conducted data analysis and design activities, creating extensive logical and physical data models and metadata repositories with ERWIN
  • Extensively used Informatica client tools for various ETL management tasks
  • Designed and implemented data pipelines using GCP services for efficient data processing
  • Orchestrated CI/CD pipelines in Jenkins, integrating with various tools through Groovy scripts
  • Created and executed API requests and organized responses using Postman and Swagger, ensuring seamless API collection management
  • Actively ensured data security by implementing access controls and utilizing GCP's security tools
  • Utilized GCP's monitoring tools to maintain data pipeline and storage solution integrity
  • Environment: Python, GCP, BigQuery, Cloud Storage, Cloud SQL, ETL, Cloud Dataflow, Cloud Dataproc, Pub/Sub, GCS, Compute Engine, App Engine, Kubernetes Engine, IAM, Cloud Security Command Center, Stackdriver, Cloud Monitoring, Cloud Shell SDK, PySpark, Kafka, HBase, Cassandra, Informatica, Source Analyzer, Repository Manager, Server Manager, Workflow Manager, Workflow Monitor, ERWIN, Neptune, Gremlin, Hadoop, Pig, Hive, Spark, Sqoop, Scala, Java, CI/CD, Jenkins, Groovy, API
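
A minimal sketch of the GCS-to-BigQuery load described above, written as a Cloud Function triggered by new CSV objects; the project, dataset, and table names are hypothetical placeholders:

```python
# Hypothetical sketch: Cloud Function (GCS trigger) loading a new CSV file into BigQuery.
from google.cloud import bigquery

def gcs_csv_to_bq(event, context):
    """Triggered when a CSV object lands in the bucket; appends it to a staging table."""
    client = bigquery.Client()
    uri = f"gs://{event['bucket']}/{event['name']}"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # assume a header row
        autodetect=True,       # infer the schema from the file
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    # Placeholder project.dataset.table destination.
    load_job = client.load_table_from_uri(
        uri, "example-project.staging.transactions", job_config=job_config
    )
    load_job.result()  # block until the load job completes
```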

Data Engineer

Walmart
Rogers, Arkansas
01.2019 - 10.2021
  • Utilized PySpark to process large volumes of structured, semi-structured, and unstructured data across clusters
  • Leveraged Spark's distributed computing framework for parallel processing and ingested data using connectors and APIs from various sources like HDFS, Apache Kafka, and cloud storage
  • Employed command-line utilities such as awk, sed, and grep for data extraction, transformation, filtering, and reporting
  • Leveraged Snowflake's cloud-based platform for efficient storage and management of large data volumes
  • Utilized Snowflake's scalable architecture, SQL-based querying capabilities, and built-in functionalities for data transformations, aggregations, and manipulations
  • Implemented robust security measures, including user roles, permissions, encryption, and access controls
  • Integrated Snowflake with various data sources for consolidation, and utilized data-sharing functionality to enable collaborative decision-making without data duplication
  • Automated ETL processes using AWS Glue for data extraction, transformation, and loading from various sources into target data stores
  • Leveraged Azkaban's capabilities for workflow orchestration, job scheduling, and dependency management
  • Designed and developed ETL processes with AWS Glue to migrate data from sources like S3 into AWS Redshift (see the Glue job sketch after this list)
  • Loaded data into HDFS from sources like Oracle and DB2 using Sqoop and Hive tables
  • Created AWS Lambda functions and API Gateways for data submission
  • Launched EC2 cloud instances using Linux/Ubuntu and configured them for specific applications
  • Environment: Python, PySpark, HDFS, Apache Kafka, ETL, AWS Lambda, API Gateway, EC2, AWS Glue, S3, Parquet, Redshift, Spring Boot, ORM, Hibernate, Snowflake, SQL, AWS, Linux, Ubuntu, Angular, HTTP
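
A sketch of the kind of Glue ETL job referenced above (Data Catalog source over S3, Redshift target); the database, table, and connection names are hypothetical placeholders:

```python
# Hypothetical sketch: AWS Glue job moving cataloged S3 data into Redshift.
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw records registered in the Glue Data Catalog (backed by S3).
orders = glue_context.create_dynamic_frame.from_catalog(
    database="example_raw", table_name="orders"
)

# Light transformation: drop a column not needed downstream.
trimmed = orders.drop_fields(["_corrupt_record"])

# Load into Redshift through a cataloged JDBC connection, staging via S3.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=trimmed,
    catalog_connection="example-redshift-conn",
    connection_options={"dbtable": "public.orders", "database": "analytics"},
    redshift_tmp_dir="s3://example-temp/glue/",
)
job.commit()
```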

Data Engineer

Cipla
Hyderabad, India
08.2015 - 09.2018
  • Utilized Azure's ETL capabilities and Azure Data Factory (ADF) services to ingest data from various legacy data stores, including SAP (Hana), SFTP servers, and Cloudera Hadoop's HDFS, into Azure Data Lake Storage (Gen2)
  • Performed ETL (extract, transform, and load) from various source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics)
  • Constructed and configured a virtual data center in the Azure cloud to accommodate hosting of an Enterprise Data Warehouse, incorporating components such as Virtual Private Cloud (VPC), Public and Private Subnets, Security Groups, and Route Tables
  • Implemented Azure Data Factory and managed Data Factory policies, leveraging Blob Storage for data storage and backup on Azure
  • Developed a framework for creating new snapshots and deleting old ones in Azure Blob Storage, and configured lifecycle policies for backing up data from Delta Lakes
  • Created Notebooks using Azure Databricks, Scala, and Spark by utilizing Delta tables & Delta lakes for data capture
  • Implemented Slowly Changing Dimension Type 2 (SCD2) processes, updating, inserting, or deleting records based on business requirements using Databricks (a simplified sketch follows this list)
  • Designed and developed Business Intelligence solutions that encompass various components such as data modeling, dimension modeling, ETL processes, data integration, OLAP/OLTP and client/server applications
  • Created end-to-end solution for ETL transformation jobs that involve writing Informatica workflows and mappings
  • Involved in the design and development of web applications using Python, HTML, CSS, and JavaScript to create visually appealing and interactive user interfaces
  • Utilized Spark Framework with Scala programming language, specifically leveraging Spark Core, Spark Streaming, and Spark SQL modules for efficient and scalable data processing
  • Developed enterprise-level solutions that involve batch processing using Apache Pig and incorporated streaming frameworks such as Spark Streaming, Apache Kafka, and Apache Flink
  • Built, created, and configured enterprise-level Snowflake environments, including accounts, warehouses, databases, and schema structures
  • Developed data pipelines on Hadoop using SAS and PySpark (Spark 2.2 and Spark SQL)
  • Developed common Flink module for serializing and deserializing AVRO data by applying schema
  • Established remote integration with third-party platforms by leveraging RESTful web services and utilized tools like Postman for API testing
  • Designed and developed SSIS Packages to import and export data from MS Excel, SQL Server 2012, and Flat files
  • Implemented Python forms backed by PostgreSQL to capture and store data from various sources, and performed tasks such as graphics generation, XML processing, data exchange, and business logic implementation
  • Automated the building of Azure infrastructure using Terraform and Azure Cloud Formation
  • Designed and developed Tableau visualizations, including dashboards built with calculations, parameters, calculated fields, groups, sets, and hierarchies
  • Worked with various file formats, including flat-files, Text, and CSV, as well as compressed formats like Parquet
  • Responsible for the installation and maintenance of web servers, including Tomcat and Apache HTTP, in UNIX environments
  • Environment: Azure, ETL, ADF, SAP, Hana, SFTP, Cloudera, HDFS, Data Lake Storage Gen2, T-SQL, Spark SQL, U-SQL, Data Lake Analytics, VPC, Public Subnets, Private Subnets, Security Groups, Route Tables, Blob Storage, Delta Tables, Delta Lakes, SCD2, Business Intelligence, OLAP, OLTP, ETL transformation, Informatica, HTML, CSS, JavaScript, Scala, Spark Core, Spark Streaming, Apache Pig, Apache Kafka, Apache Flink, Snowflake, SAS, PySpark, RESTful, Postman, SSIS, JSON, PostgreSQL, Terraform, Azure Cloud Formation, Tableau, Dashboards, CSV, Parquet, Tomcat, Apache HTTP, UNIX
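
Simplified sketch of the SCD2 pattern referenced above, using the Delta Lake merge API on Databricks; the dimension and staging table names and columns are hypothetical, and the staging table is assumed to hold only new or changed records:

```python
# Hypothetical SCD2 sketch on Databricks (spark is the ambient SparkSession).
from delta.tables import DeltaTable
from pyspark.sql import functions as F

dim = DeltaTable.forName(spark, "dim_customer")       # placeholder dimension table
updates = spark.table("stg_customer_changes")         # placeholder staging table

# Step 1: close out the current row when a tracked attribute has changed.
(dim.alias("t")
 .merge(updates.alias("s"),
        "t.customer_id = s.customer_id AND t.is_current = true")
 .whenMatchedUpdate(
     condition="t.address <> s.address",
     set={"is_current": "false", "end_date": "current_date()"})
 .execute())

# Step 2: append the new version of each changed or brand-new customer.
new_rows = (updates
    .withColumn("is_current", F.lit(True))
    .withColumn("start_date", F.current_date())
    .withColumn("end_date", F.lit(None).cast("date")))
new_rows.write.format("delta").mode("append").saveAsTable("dim_customer")
```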

Education

Bachelor’s - Computer Science

JNTUH
01.2013

Skills

  • Experience in core Java, J2EE, Multithreading, JDBC, Shell Scripting, the Java Collections API, Servlets, and JSP

Database Expertise

Well-versed in RDBMS like Oracle, MS SQL Server, MySQL, Teradata, DB2, Netezza, PostgreSQL, and MS Access; exposure to NoSQL databases such as MongoDB, HBase, DynamoDB, and Cassandra.

Cloud Technologies

Hands-on experience with Azure (including Azure Data Factory, Data Lake Storage, Synapse Analytics, Cosmos DB (NoSQL)), GCP (including BigQuery, GCS, Cloud Functions, Dataflow, Pub/Sub, Dataproc), and AWS (including EC2, Glue, Lambda, SNS, S3, RDS, CloudWatch, VPC, Elastic Beanstalk, Auto Scaling, Redshift).

Web Development

Experience in developing web applications using Python, PySpark, Django, C++, XML, CSS, HTML, JavaScript, and jQuery.

Reporting & Analysis

Proficient in developing business reports with Power BI, Tableau, and SQL Server Reporting Services (SSRS), analysis using SQL Server Analysis Services (SSAS), and ETL processes using SQL Server Integration Services (SSIS). Good handling of complex processes using SAS/Base, SAS/SQL, and SAS/STAT.

Data Modeling & Analytics

Adaptable in using data modeling packages such as NumPy, SciPy, Pandas, Beautiful Soup, Scikit-Learn, Matplotlib, and Seaborn in Python, and dplyr, tidyr, and ggplot2 in R. Knowledge of OLAP/OLTP and Dimensional Data Modeling with the Ralph Kimball methodology.

Version Control & CI/CD Tools

Well-versed with tools like SVN, GIT, SourceTree, Bitbucket, and experience with Unix/Linux commands, scripting, and deployment on servers.

Software Development Lifecycle

Involved in all SDLC phases under Agile, Scrum, and Waterfall management processes, focusing on high availability, fault tolerance, auto-scaling, and query optimization techniques.

Education & Certificates

  • Bachelor’s in Computer Science, JNTUH, 2013
  • AWS Certified Solutions Architect Associate

Technical Stack

  • Programming Languages: Python, R, SQL, Java, .Net, HTML, CSS, Scala
  • Python Libraries: Requests, ReportLab, NumPy, SciPy, PyTables, cv2, imageio, Python-Twitter, Matplotlib, HTTPLib2, Urllib2, Beautiful Soup, PySpark, Pytest, PyMongo, cx_Oracle, PyExcel, Boto3
  • Frameworks and IDEs: Django, Flask, Pyramid, PyCharm, Sublime Text
  • Web and Services: REST, SOAP, Microservices, HTML, CSS, JavaScript, MVW, MVC
  • Databases: Oracle, PostgreSQL, Teradata, IBM DB2, MySQL, PL/SQL, MongoDB, Cassandra, DynamoDB, HBase, WAMP, LAMP
  • Big Data Ecosystem: Cloudera distribution, Hortonworks Ambari, HDFS, MapReduce, YARN, Pig, Sqoop, HBase, Hive, Flume, Cassandra, Apache Spark, Oozie, Zookeeper, Hadoop, Scala, Impala, Kafka, Airflow, DBT, NiFi
  • BI and Reporting: Power BI, SSIS, SSAS, SSRS, Tableau
  • Containers and Orchestration: Kubernetes, Docker, Docker Registry, Docker Hub, Docker Swarm
  • AWS: EC2, S3, RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, Step Functions, Cloud transformations, EMR
  • GCP: BigQuery, GCS Bucket, G-Cloud Function, Cloud Dataflow, Pub/Sub, Cloud Shell, GSUTIL, BQ command-line utilities, Dataproc
  • Azure: Web Applications, App Services, Storage, SQL Database, Virtual Machines, Search, Notification Hub
  • Data Modeling: Relational data modeling, ER/Studio, Erwin, Sybase PowerDesigner, Star Join Schema, Snowflake modeling, FACT and Dimension tables
  • Streaming and Messaging: Kinesis, Kafka, Flume
  • Version Control: Concurrent Versions System (CVS), Subversion (SVN), Git, GitHub, Mercurial, Bitbucket

Data Pipeline & ETL Skills

Proficient in building data pipelines using Python, PySpark, Hive SQL, Presto, BigQuery, and Apache Airflow. Experienced in using Teradata utilities, Informatica client tools, Sqoop for data import/export, and Flume and NiFi for log file loading.
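
For illustration, a minimal Apache Airflow DAG of the shape used for such pipelines; the DAG id, schedule, and task callables are hypothetical:

```python
# Hypothetical sketch: a two-task daily ETL DAG (Airflow 2.x).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from the source system")              # placeholder extract logic

def load():
    print("load the transformed data into the warehouse")  # placeholder load logic

with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task   # run load only after extract succeeds
```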

Big Data Expertise

Extensive hands-on experience in Hadoop architecture and its various components, Spark applications (RDD transformations, Spark Core, MLlib, Streaming, SQL), the Cloudera ecosystem (HDFS, YARN, Hive, Sqoop, Flume, HBase, Oozie, Kafka, Pig), data pipeline development, and data analysis with Hive SQL, Impala, Spark, and Spark SQL.
