
MUDASSIR MOHAMMED ABDUL

MANCHESTER, NH

Summary

Over 9 years of diversified IT experience as a Data Engineer, specializing in requirements gathering, design, development, testing, and maintenance of databases, cloud technologies, data pipelines, and data warehouse applications.

Overview

9 years of professional experience

Work History

Sr Data Engineer

BCBS
Chicago, IL
08.2023 - Current
  • Worked on data ingestion, cleansing, transformation using AWS Lambda, AWS Glue, and Step Functions
  • Implemented serverless architecture with API Gateway, Lambda, and DynamoDB; deployed Lambda code from S3
  • Developed Glue ETL jobs for data processing, including transformations and loads into S3, Redshift, and RDS
  • Automated data storage from streaming sources to AWS data lakes (S3, Redshift, RDS) using AWS Kinesis (Data Firehose)
  • Conducted architecture assessments of AWS services like Amazon EMR, Redshift, and S3
  • Configured CloudWatch for monitoring Lambda functions and Glue Jobs
  • Set up S3 event notifications, SNS topics, SQS queues, and Slack message notifications via Lambda
  • Worked with Spark Streaming and Kafka for real-time processing; combined batch and streaming data (see the streaming sketch after this list)
  • Created Kinesis Data Streams, Firehose, and Analytics for data capture, processing, and storage in DynamoDB and Redshift
  • Built data lakes, data hubs, and data pipelines using Hadoop, Cloudera, HDFS, MapReduce, Spark, YARN, Delta Lake, and Hive
  • Optimized algorithms in Hadoop using Spark Context, Spark-SQL, Spark MLlib, Data Frames, Pair RDDs
  • Utilized Apache Nifi for automated data movement; created Hive Load Queries for data loading from HDFS to Netezza
  • Leveraged Snowflake as a cloud-based data warehousing platform to efficiently store and manage large volumes of structured and semi-structured data
  • Utilized Snowflake's scalable architecture to accommodate growing data volumes and diverse data sources as processing and analytics demands increased
  • Integrated Snowflake with various data sources and systems, including databases, data lakes, and third-party applications, to consolidate and centralize data for analysis and reporting
  • Utilized Snowflake's features such as clustering, partitioning, and materialized views to optimize query performance and reduce processing times, ensuring efficient data retrieval for analytical queries
  • Leveraged Snowflake's data-sharing functionality to securely share data with external partners, clients, or other teams within the organization, enabling collaborative data-driven decision-making without the need for data duplication
  • Implemented Dimensional Data Modeling with Star Schema, Snowflake Schema, Fact and Dimension Tables, and Lambda Architecture concepts
  • Developed PL/SQL packages, database triggers, and user procedures; prepared user manuals for new programs
  • Developed Terraform scripts for automated AWS resource deployment, including EC2, S3, EFS, IAM Roles
  • Implemented CI/CD tooling with Jenkins and Bitbucket for the Python codebase repository, build, and deployment
  • Utilized Zookeeper for Spark job scheduling and backup
  • Collaborated with data analysts and business users to gather requirements and design Power BI reports that effectively address business needs
  • Used Parquet files and ORC format with PySpark and Spark Streaming with Data Frames
  • Automated Oozie workflows to manage jobs, including MapReduce, Hive, and Sqoop
  • Environment: Python, AWS, Lambda, Glue, Step Functions, API Gateway, S3, DynamoDB, CloudWatch, SQS, Redshift, RDS, Glue Catalog, Glue Studio, ETL, EC2, EFS, EBS, IAM Roles, Snapshots, Jenkins Server, PL/SQL, Spark, Kafka, Data Lakes, Data Hubs, Data Pipelines, Apache Hadoop, Cloudera, HDFS, MapReduce, YARN, Delta Lake, Hive, Spark-SQL, Spark MLlib, Data Frames, Pair RDDs, Hive Load Queries, Sqoop, Oozie, Data Modeling, Star Schema, Snowflake, Snowflake Schema, Dimension Tables, Lambda Architecture, CI/CD, Jenkins, Bitbucket
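
Illustrative sketch only (not project code) of the Kafka-to-S3 streaming pattern referenced above, using Spark Structured Streaming with a Parquet sink; the broker address, topic, schema, and bucket paths are hypothetical placeholders:

```python
# Hypothetical sketch: stream JSON events from Kafka and land them in S3 as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("claims-stream").getOrCreate()

# Assumed payload schema for the incoming JSON messages.
schema = StructType([
    StructField("claim_id", StringType()),
    StructField("member_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the raw Kafka stream and parse the JSON value into columns.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "claims-events")                # placeholder topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("claim"))
    .select("claim.*")
)

# Land parsed records in the S3 data lake for downstream Glue/Redshift loads.
(events.writeStream.format("parquet")
    .option("path", "s3a://example-data-lake/claims/")                   # placeholder bucket
    .option("checkpointLocation", "s3a://example-data-lake/_chk/claims/")
    .start()
    .awaitTermination())
```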

Data Engineer

U.S. Bank
Minneapolis, Minnesota
11.2021 - 07.2023
  • Developed and administered data storage solutions, leveraging BigQuery, Cloud Storage, and Cloud SQL
  • Conducted ETL and data engineering using Cloud Dataflow, Dataproc, and Google BigQuery
  • Architected and implemented data pipelines with Dataflow, Dataproc, and Pub/Sub; processed and loaded data from a Pub/Sub topic to BigQuery using Cloud Dataflow and Python
  • Utilized Cloud Functions with Python to load incoming CSV files from GCS into BigQuery (see the load sketch after this list)
  • Hosted applications using Compute Engine, App Engine, Cloud SQL, Kubernetes Engine, and Cloud Storage
  • Monitored and resolved data pipeline and storage issues using Stackdriver and Cloud Monitoring
  • Ensured data security and access controls through GCP's IAM and the Cloud Security Command Center
  • Configured GCP services like Dataproc, Cloud Storage, and BigQuery through the Cloud Shell SDK
  • Analyzed Hadoop cluster and utilized Big Data tools including Pig, Hive, HBase, Spark, and Sqoop
  • Developed streaming applications with PySpark to read data from Kafka and persist in NoSQL databases like HBase and Cassandra
  • Utilized Pig and HiveQL for data profiling and aggregation; constructed Spark applications using Scala and Java for diverse data processing needs
  • Designed and implemented the data model in Neptune; loaded and queried data using the Gremlin query language
  • Conducted data analysis and design activities, creating extensive logical and physical data models and metadata repositories with ERWIN
  • Extensively used Informatica client tools for various ETL management tasks
  • Designed and implemented data pipelines using GCP services for efficient data processing
  • Orchestrated CI/CD pipelines in Jenkins, integrating with various tools through Groovy scripts
  • Created and executed API requests and organized responses using Postman and Swagger, ensuring seamless API collection management
  • Actively ensured data security by implementing access controls and utilizing GCP's security tools
  • Utilized GCP's monitoring tools to maintain data pipeline and storage solution integrity
  • Environment: Python, GCP, BigQuery, Cloud Storage, Cloud SQL, ETL, Cloud Dataflow, Cloud Dataproc, Pub/Sub, GCS, Compute Engine, App Engine, Kubernetes Engine, IAM, Cloud Security Command Center, Stackdriver, Cloud Monitoring, Cloud Shell SDK, PySpark, Kafka, HBase, Cassandra, Informatica, Source Analyzer, Repository Manager, Server Manager, Workflow Manager, Workflow Monitor, ERWIN, Neptune, Gremlin, Hadoop, Pig, Hive, Spark, Sqoop, Scala, Java, CI/CD, Jenkins, Groovy, API
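
A minimal sketch of the GCS-to-BigQuery load described above, written as a Cloud Function triggered by new CSV objects; the project, dataset, and table names are hypothetical placeholders:

```python
# Hypothetical sketch: Cloud Function (GCS trigger) loading a new CSV file into BigQuery.
from google.cloud import bigquery

def gcs_csv_to_bq(event, context):
    """Triggered when a CSV object lands in the bucket; appends it to a staging table."""
    client = bigquery.Client()
    uri = f"gs://{event['bucket']}/{event['name']}"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # assume a header row
        autodetect=True,       # infer the schema from the file
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    # Placeholder project.dataset.table destination.
    load_job = client.load_table_from_uri(
        uri, "example-project.staging.transactions", job_config=job_config
    )
    load_job.result()  # block until the load job completes
```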

Data Engineer

Walmart
Rogers, Arkansas
01.2019 - 10.2021
  • Utilized PySpark to process large volumes of structured, semi-structured, and unstructured data across clusters
  • Leveraged Spark's distributed computing framework for parallel processing and ingested data using connectors and APIs from various sources like HDFS, Apache Kafka, and cloud storage
  • Employed command-line utilities such as awk, sed, and grep for data extraction, transformation, filtering, and reporting
  • Leveraged Snowflake's cloud-based platform for efficient storage and management of large data volumes
  • Utilized Snowflake's scalable architecture, SQL-based querying capabilities, and built-in functionalities for data transformations, aggregations, and manipulations
  • Implemented robust security measures, including user roles, permissions, encryption, and access controls
  • Integrated Snowflake with various data sources for consolidation, and utilized data-sharing functionality to enable collaborative decision-making without data duplication
  • Automated ETL processes using AWS Glue for data extraction, transformation, and loading from various sources into target data stores
  • Leveraged Azkaban's capabilities for workflow orchestration, job scheduling, and dependency management
  • Designed and developed ETL processes with AWS Glue to migrate data from sources like S3 into AWS Redshift (see the Glue job sketch after this list)
  • Loaded data into HDFS from sources like Oracle and DB2 using Sqoop and Hive tables
  • Created AWS Lambda functions and API Gateways for data submission
  • Launched EC2 cloud instances using Linux/Ubuntu and configured them for specific applications
  • Environment: Python, PySpark, HDFS, Apache Kafka, ETL, AWS Lambda, API Gateway, EC2, AWS Glue, S3, Parquet, Redshift, Spring Boot, ORM, Hibernate, Snowflake, SQL, AWS, Linux, Ubuntu, Angular, HTTP
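
A sketch of the kind of Glue ETL job referenced above (Data Catalog source over S3, Redshift target); the database, table, and connection names are hypothetical placeholders:

```python
# Hypothetical sketch: AWS Glue job moving cataloged S3 data into Redshift.
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw records registered in the Glue Data Catalog (backed by S3).
orders = glue_context.create_dynamic_frame.from_catalog(
    database="example_raw", table_name="orders"
)

# Light transformation: drop a column not needed downstream.
trimmed = orders.drop_fields(["_corrupt_record"])

# Load into Redshift through a cataloged JDBC connection, staging via S3.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=trimmed,
    catalog_connection="example-redshift-conn",
    connection_options={"dbtable": "public.orders", "database": "analytics"},
    redshift_tmp_dir="s3://example-temp/glue/",
)
job.commit()
```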

Data Engineer

Cipla
Hyderabad, India
08.2015 - 09.2018
  • Utilized Azure's ETL capabilities and Azure Data Factory (ADF) services to ingest data from various legacy data stores, including SAP (Hana), SFTP servers, and Cloudera Hadoop's HDFS, into Azure Data Lake Storage (Gen2)
  • Performed ETL (extract, transform, and load) from various source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics)
  • Constructed and configured a virtual data center in the Azure cloud to accommodate hosting of an Enterprise Data Warehouse, incorporating components such as Virtual Private Cloud (VPC), Public and Private Subnets, Security Groups, and Route Tables
  • Implemented Azure Data Factory and managed Data Factory policies, leveraging Blob Storage for data storage and backup on Azure
  • Developed a framework for creating new snapshots and deleting old ones in Azure Blob Storage, and configured lifecycle policies for backing up data from Delta Lakes
  • Created Notebooks using Azure Databricks, Scala, and Spark by utilizing Delta tables & Delta lakes for data capture
  • Implemented Slowly Changing Dimension Type 2 (SCD2) processes, updating, inserting, or deleting records based on business requirements using Databricks (a simplified sketch follows this list)
  • Designed and developed Business Intelligence solutions that encompass various components such as data modeling, dimension modeling, ETL processes, data integration, OLAP/OLTP and client/server applications
  • Created end-to-end solution for ETL transformation jobs that involve writing Informatica workflows and mappings
  • Involved in the design and development of web applications using Python, HTML, CSS, and JavaScript to create visually appealing and interactive user interfaces
  • Utilized Spark Framework with Scala programming language, specifically leveraging Spark Core, Spark Streaming, and Spark SQL modules for efficient and scalable data processing
  • Developed enterprise-level solutions that involve batch processing using Apache Pig and incorporated streaming frameworks such as Spark Streaming, Apache Kafka, and Apache Flink
  • Built, created, and configured enterprise-level Snowflake environments, including accounts, warehouses, databases, and schema structures
  • Developed data pipelines on Hadoop using SAS and PySpark (Spark 2.2 and Spark SQL)
  • Developed common Flink module for serializing and deserializing AVRO data by applying schema
  • Established remote integration with third-party platforms by leveraging RESTful web services and utilized tools like Postman for API testing
  • Designed and developed SSIS Packages to import and export data from MS Excel, SQL Server 2012, and Flat files
  • Implemented Python forms backed by PostgreSQL to capture and store data from various sources, and performed tasks such as graphics generation, XML processing, data exchange, and business logic implementation
  • Automated the building of Azure infrastructure using Terraform and Azure Cloud Formation
  • Designed and developed Tableau visualizations, including dashboards built with calculations, parameters, calculated fields, groups, sets, and hierarchies
  • Worked with various file formats, including flat-files, Text, and CSV, as well as compressed formats like Parquet
  • Responsible for the installation and maintenance of web servers, including Tomcat and Apache HTTP, in UNIX environments
  • Environment: Azure, ETL, ADF, SAP, Hana, SFTP, Cloudera, HDFS, Data Lake Storage Gen2, T-SQL, Spark SQL, U-SQL, Data Lake Analytics, VPC, Public Subnets, Private Subnets, Security Groups, Route Tables, Blob Storage, Delta Tables, Delta Lakes, SCD2, Business Intelligence, OLAP, OLTP, ETL transformation, Informatica, HTML, CSS, JavaScript, Scala, Spark Core, Spark Streaming, Apache Pig, Apache Kafka, Apache Flink, Snowflake, SAS, PySpark, RESTful, Postman, SSIS, JSON, PostgreSQL, Terraform, Azure Cloud Formation, Tableau, Dashboards, CSV, Parquet, Tomcat, Apache HTTP, UNIX
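
Simplified sketch of the SCD2 pattern referenced above, using the Delta Lake merge API on Databricks; the dimension and staging table names and columns are hypothetical, and the staging table is assumed to hold only new or changed records:

```python
# Hypothetical SCD2 sketch on Databricks (spark is the ambient SparkSession).
from delta.tables import DeltaTable
from pyspark.sql import functions as F

dim = DeltaTable.forName(spark, "dim_customer")       # placeholder dimension table
updates = spark.table("stg_customer_changes")         # placeholder staging table

# Step 1: close out the current row when a tracked attribute has changed.
(dim.alias("t")
 .merge(updates.alias("s"),
        "t.customer_id = s.customer_id AND t.is_current = true")
 .whenMatchedUpdate(
     condition="t.address <> s.address",
     set={"is_current": "false", "end_date": "current_date()"})
 .execute())

# Step 2: append the new version of each changed or brand-new customer.
new_rows = (updates
    .withColumn("is_current", F.lit(True))
    .withColumn("start_date", F.current_date())
    .withColumn("end_date", F.lit(None).cast("date")))
new_rows.write.format("delta").mode("append").saveAsTable("dim_customer")
```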

Education

Bachelor’s - Computer Science

JNTUH
01.2013

Skills

  • Experience in core Java, J2EE, Multithreading, JDBC, Shell Scripting, the Java Collections API, Servlets, and JSP

Database Expertise

Well-versed in RDBMS like Oracle, MS SQL Server, MySQL, Teradata, DB2, Netezza, PostgreSQL, and MS Access; exposure to NoSQL databases such as MongoDB, HBase, DynamoDB, and Cassandra.

Cloud Technologies

Hands-on experience with Azure (including Azure Data Factory, Data Lake Storage, Synapse Analytics, Cosmos DB (NoSQL)), GCP (including BigQuery, GCS, Cloud Functions, Dataflow, Pub/Sub, Dataproc), and AWS (including EC2, Glue, Lambda, SNS, S3, RDS, CloudWatch, VPC, Elastic Beanstalk, Auto Scaling, Redshift).

Web Development

Experience in developing web applications using Python, PySpark, Django, C++, XML, CSS, HTML, JavaScript, and jQuery.

Reporting & Analysis

Proficient in developing business reports with Power BI, Tableau, and SQL Server Reporting Services (SSRS), analysis using SQL Server Analysis Services (SSAS), and ETL processes using SQL Server Integration Services (SSIS). Good handling of complex processes using SAS/Base, SAS/SQL, and SAS/STAT.

Data Modeling & Analytics

Adaptable in using data modeling packages such as NumPy, SciPy, Pandas, Beautiful Soup, Scikit-Learn, Matplotlib, and Seaborn in Python, and dplyr, tidyr, and ggplot2 in R. Knowledge of OLAP/OLTP and Dimensional Data Modeling with the Ralph Kimball methodology.

Version Control & CI/CD Tools

Well-versed with tools like SVN, GIT, SourceTree, Bitbucket, and experience with Unix/Linux commands, scripting, and deployment on servers.

Software Development Lifecycle

Involved in all SDLC phases under Agile, Scrum, and Waterfall management processes, focusing on high availability, fault tolerance, auto-scaling, and query optimization techniques.

Education & Certificates

  • Bachelor’s in Computer Science, JNTUH, 2013
  • AWS Certified Solutions Architect Associate

Technical Stack

  • Programming Languages: Python, R, SQL, Java, .Net, HTML, CSS, Scala
  • Python Libraries: Requests, ReportLab, NumPy, SciPy, PyTables, cv2, imageio, Python-Twitter, Matplotlib, HTTPLib2, Urllib2, Beautiful Soup, PySpark, Pytest, PyMongo, cx_Oracle, PyExcel, Boto3
  • Frameworks and IDEs: Django, Flask, Pyramid, PyCharm, Sublime Text
  • Web and Services: REST, SOAP, Microservices, HTML, CSS, JavaScript, MVW, MVC
  • Databases: Oracle, PostgreSQL, Teradata, IBM DB2, MySQL, PL/SQL, MongoDB, Cassandra, DynamoDB, HBase, WAMP, LAMP
  • Big Data Ecosystem: Cloudera distribution, Hortonworks Ambari, HDFS, MapReduce, YARN, Pig, Sqoop, HBase, Hive, Flume, Cassandra, Apache Spark, Oozie, Zookeeper, Hadoop, Scala, Impala, Kafka, Airflow, DBT, NiFi
  • BI and Reporting: Power BI, SSIS, SSAS, SSRS, Tableau
  • Containers and Orchestration: Kubernetes, Docker, Docker Registry, Docker Hub, Docker Swarm
  • AWS: EC2, S3, RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, Step Functions, Cloud transformations, EMR
  • GCP: BigQuery, GCS Bucket, G-Cloud Function, Cloud Dataflow, Pub/Sub, Cloud Shell, GSUTIL, BQ command-line utilities, Dataproc
  • Azure: Web Applications, App Services, Storage, SQL Database, Virtual Machines, Search, Notification Hub
  • Data Modeling: Relational data modeling, ER/Studio, Erwin, Sybase PowerDesigner, Star Join Schema, Snowflake modeling, FACT and Dimension tables
  • Streaming and Messaging: Kinesis, Kafka, Flume
  • Version Control: Concurrent Versions System (CVS), Subversion (SVN), Git, GitHub, Mercurial, Bitbucket

Data Pipeline & ETL Skills

Proficient in building data pipelines using Python, PySpark, Hive SQL, Presto, BigQuery, and Apache Airflow. Experienced in using Teradata utilities, Informatica client tools, Sqoop for data import/export, and Flume and NiFi for log file loading.
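
For illustration, a minimal Apache Airflow DAG of the shape used for such pipelines; the DAG id, schedule, and task callables are hypothetical:

```python
# Hypothetical sketch: a two-task daily ETL DAG (Airflow 2.x).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from the source system")              # placeholder extract logic

def load():
    print("load the transformed data into the warehouse")  # placeholder load logic

with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task   # run load only after extract succeeds
```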

Big Data Expertise

Extensive hands-on experience in Hadoop architecture and its various components, Spark applications (RDD transformations, Spark Core, MLlib, Streaming, SQL), the Cloudera ecosystem (HDFS, YARN, Hive, Sqoop, Flume, HBase, Oozie, Kafka, Pig), data pipeline development, and data analysis with Hive SQL, Impala, Spark, and Spark SQL.
