Zimpa Baiji

Irving, TX

Summary

  • 5+ years of experience in data engineering, data pipeline design, development, and implementation as a Data Engineer/Data Developer and Data Modeler.
  • Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
  • Experience developing Spark applications using RDD transformations, Spark Core, Spark SQL, Spark Streaming, and Spark MLlib, and refactoring existing Spark batch processes for different logs written in Scala.
  • Experience with Hadoop ecosystem components: MapReduce, HDFS, YARN/MRv2, Hive, HBase, Spark, Kafka, Sqoop, Flume, Avro, Solr, and ZooKeeper.
  • Experience developing MapReduce applications to analyze big data in different file formats.
  • Experience in data analysis using Hive and Impala.
  • Experience creating and running Docker images with multiple microservices.
  • Experience in data modeling and ETL processes in data warehouse environments, including star and snowflake schemas.
  • Experience performing structural modifications using MapReduce and Hive, and analyzing data with visualization/reporting tools (Tableau).
  • Experience with GitHub/Git source and version control systems.
  • Experience in Azure cloud services: Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight big data technologies (Hadoop and Apache Spark), and Databricks.
  • Experience designing Azure cloud architecture and implementation plans for hosting complex application workloads on Microsoft Azure.
  • Experience with Amazon Web Services (AWS), including EC2, S3, EMR, ElastiCache, DynamoDB, Redshift, and Aurora.
  • Experience developing Python and shell scripts to extract, load, and transform data.
  • Experience developing JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the Cosmos activity.
  • Experience using Sqoop to import and export data between RDBMS and HDFS/Hive.
  • Experience with multiple databases, including MongoDB, Cassandra, MySQL, Oracle, and MS SQL Server.
  • Experience in Agile software development methodology.
  • Effective in cross-functional team environments, with excellent communication, interpersonal, and problem-solving skills; a team player with a can-do attitude, able to communicate with all levels of the organization, from technical staff to management and customers.

Overview

7 years of professional experience

Work History

Data Engineer

Verizon
07.2023 - Current
  • Involved in the analysis, design, and implementation/translation of business user requirements
  • Responsible for building a confidential data cube using the Spark framework, writing Spark SQL queries in Scala to improve data processing efficiency and reporting query response time
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing
  • Developed Scala scripts using DataFrames/Datasets/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop
  • Developed Spark jobs using PySpark and Scala to create a generic framework to process files such as JSON, text, and CSV (see the illustrative sketch after this list)
  • Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations
  • Involved in developing transformations in Python that convert JSON into a structured relational format and apply the desired logic and conditions
  • Designed the ETL process and created the high-level design document including the logical data flows, source data extraction process, database staging, and the extract creation
  • Utilized capabilities of Tableau such as Data extracts, Data blending, Forecasting, Dashboard actions, and Table calculations
  • Responsible for writing a MapReduce job that joins the incoming slices of data and picks only the fields needed for further processing
  • Used Apache Kafka to aggregate web log data from multiple servers and make it available in downstream systems for analysis
  • Configured Spark Streaming to consume Kafka streams and store the data in HDFS
  • Created dashboards for analyzing POS data using Power BI
  • Implemented data ingestion and handling clusters in real-time processing using Kafka
  • Involved in developing DAGs using the Airflow orchestration tool and monitored the weekly processes
  • Designed various Azure Data Factory pipelines to pull data from various data sources and load it into an Azure SQL database
  • Used Stored Procedure, Lookup, Execute Pipeline, Data Flow, Copy Data, and Azure Function activities in ADF
  • Worked with Azure Databricks, PySpark, Spark SQL, Azure SQL Data Warehouse (ADW), and Hive to load and transform data
  • Used Azure Data Lake as a source and pulled data using PolyBase
  • Performed data cleaning and preparation of XML files
  • Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL
  • Created HBase tables to store variable data formats from different portfolios
  • Used SQL queries and other tools to perform data analysis and profiling
  • Implemented Agile Methodology for building the data applications and framework development
  • Actively participated in weekly iteration review meetings, providing constructive and insightful feedback to track the progress of each iteration and surface issues
  • Environment: Spark, Scala, ETL, Kafka, Tableau, Hadoop, Python, Snowflake, HDFS, Hive, MapReduce, PySpark, Docker, Sqoop, Azure, Teradata, JSON, MongoDB, SQL, Agile, and Windows
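
A minimal PySpark sketch of the kind of generic file-processing framework described above; the function name, paths, and output location are illustrative assumptions, not the production code:

    # Minimal sketch of a generic PySpark ingestion framework for JSON, CSV, and text files.
    # File paths and the curated output location are illustrative placeholders.
    from pyspark.sql import SparkSession, DataFrame

    spark = SparkSession.builder.appName("generic-file-ingest").getOrCreate()

    def read_any(path: str, fmt: str) -> DataFrame:
        """Read a source file into a DataFrame based on its declared format."""
        if fmt == "json":
            return spark.read.json(path)
        if fmt == "csv":
            return spark.read.option("header", "true").option("inferSchema", "true").csv(path)
        if fmt == "text":
            return spark.read.text(path)
        raise ValueError(f"Unsupported format: {fmt}")

    # Example usage: normalize each source and write it to a common curated zone.
    for path, fmt in [("/data/raw/events.json", "json"), ("/data/raw/pos.csv", "csv")]:
        read_any(path, fmt).write.mode("overwrite").parquet(f"/data/curated/{fmt}/")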

Data Engineer

Delta Airlines
02.2023 - 07.2023
  • Involved in the requirement-gathering phase, working with business users to accommodate continuously changing requirements
  • Developed Spark programs with Scala, applying principles of functional programming for batch processing
  • Used various Spark transformations and actions to cleanse the input data, and used the Spark application master to monitor Spark jobs and capture their logs
  • Used Spark DataFrames and Spark SQL extensively to build multiple ETL pipelines
  • Used PySpark for data ingestion and to perform complex transformations
  • Developed quality code adhering to Scala coding standards and best practices
  • Analyzed large data sets to determine the optimal way to aggregate and report on them using Map Reduce programs
  • Responsible for data services and data movement infrastructures, worked with ETL concepts, building ETL solutions and Data modeling
  • Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date
  • Designed, developed, and implemented ETL pipelines using Python API (PySpark) of Apache Spark on AWS EMR
  • Developed and implemented several types of sub-reports, drill-down reports, summary reports, parameterized reports, and ad-hoc reports using Tableau
  • Developed interactive dashboards and reports using Power BI for day-to-day business decision-making and strategic planning needs
  • Created Airflow scheduling scripts in Python (a minimal DAG sketch follows this list)
  • Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines
  • Used Apache Kafka to aggregate web log data from multiple servers and make them available for analysis in downstream systems
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics)
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks
  • Implemented Copy activities and custom Azure Data Factory pipeline activities
  • Designed, developed, implemented, and maintained solutions for using Docker, Jenkins, and Git, for microservices and continuous deployment
  • Involved in loading data from REST endpoints to Kafka producers and transferring the data to Kafka brokers
  • Worked on Snowflake schemas and data warehousing, and processed batch and streaming data load pipelines from the data lake's AWS S3 bucket using Snowpipe and Matillion
  • Involved in creating Hive tables, loading and analyzing data using Hive queries, and writing complex Hive queries to transform the data
  • Created HBase tables to load large sets of structured data
  • Used SQL queries and other tools to perform data analysis and profiling
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning
  • Actively participated in weekly iteration review meetings, providing constructive and insightful feedback to track the progress of each iteration and surface issues
  • Environment: Spark, Scala, ETL, Hadoop, Python, Snowflake, HDFS, Hive, Tableau, MapReduce, PySpark, Teradata, Docker, JSON, XML, Azure, Apache Kafka, SQL, PL/SQL, Agile, and Windows
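
A minimal sketch of an Airflow DAG of the kind referenced above, assuming Airflow 2.x-style imports; the DAG id, schedule, and task commands are hypothetical placeholders:

    # Minimal Airflow DAG sketch: a daily ingest step followed by a transform step.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_ingest_pipeline",        # hypothetical name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        ingest = BashOperator(
            task_id="ingest_raw_files",
            bash_command="spark-submit ingest_job.py",      # placeholder command
        )
        transform = BashOperator(
            task_id="transform_to_curated",
            bash_command="spark-submit transform_job.py",   # placeholder command
        )
        ingest >> transform  # run the transform only after ingest succeeds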

Data Engineer

Edmund Optics
09.2020 - 08.2022
  • Participated in requirement-gathering sessions with business users and sponsors to understand and document the business requirements
  • Involved in developing Spark programs with Python, applying principles of functional programming to process complex structured data sets
  • Developed Spark scripts by using Scala as required to read/write JSON files
  • Developed near real-time data pipelines using Spark
  • Wrote Scala functions as required for column validation and data cleansing logic
  • Worked on storing DataFrames as Hive tables using Python (PySpark)
  • Developed ETL jobs using Python for various consumers, creating Python modules to parse JSON files, extract data from relational tables and flat files, and perform transformations
  • Optimized existing pivot-table reports using Tableau and proposed an expanded set of views as interactive dashboards using line graphs, bar charts, heat maps, tree maps, trend analysis, Pareto charts, and bubble charts to enhance data analysis
  • Developed Hive queries to process the data for visualizing and worked on tuning the performance of Hive Queries
  • Wrote, compiled, and executed programs as necessary using Apache Spark in Scala to perform ETL jobs with ingested data
  • Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB tables or S3 buckets, or to HTTP requests via Amazon API Gateway (a minimal handler sketch follows this list)
  • Migrated data from an AWS S3 bucket to Snowflake by writing a custom read/write Snowflake utility function in Scala
  • Created DataStage jobs using stages such as Transformer, Aggregator, Sort, Join, Merge, Lookup, Data Set, Funnel, Remove Duplicates, Copy, Modify, Filter, Change Data Capture, Change Apply, Sample, Surrogate Key, Column Generator, and Row Generator
  • Performed Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export through Python
  • Extracted data from Teradata into HDFS using Sqoop
  • Developed Pig UDFs to manipulate data according to business requirements and developed custom Pig loaders
  • Wrote Pig scripts to process unstructured data and create structured data for use with Hive
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames in Python
  • Performed data cleaning and preparation of XML files
  • Worked with JSON, CSV, Sequential, and Text file formats
  • Involved in creating and modifying SQL queries, prepared statements, and stored procedures used by the application
  • Participated in the status meetings and status updates to the management team
  • Environment: Spark, Scala, Hadoop, Python, PySpark, AWS, MapReduce, Pig, ETL, HDFS, Hive, HBase, SQL, Agile, and Windows
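
An illustrative sketch of an AWS Lambda handler of the kind described above, triggered by S3 object-created events; the DynamoDB table name and bucket are assumptions, not the production setup:

    # Minimal AWS Lambda sketch: record each newly landed S3 object in a DynamoDB
    # table so a downstream job can pick it up. Table and bucket names are placeholders.
    import boto3

    dynamodb = boto3.resource("dynamodb")
    audit_table = dynamodb.Table("ingest_audit")  # hypothetical table name

    def lambda_handler(event, context):
        records = event.get("Records", [])
        for record in records:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            audit_table.put_item(Item={"object_key": key, "bucket": bucket})
        return {"statusCode": 200, "processed": len(records)}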

Data Engineer

Verisk
01.2018 - 08.2020
  • Worked with the business users to gather, define business requirements, and analyze the possible technical solutions
  • Designed and developed Spark workflows using Scala to pull data from AWS S3 buckets and Snowflake and apply transformations to the data
  • Developed MapReduce programs for pre-processing and cleansing the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis
  • Responsible for building the ETL Pipelines (Extract, Transform, and Load) from Data Lake to different databases based on the requirements
  • Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehouse, and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful and valuable information for better decision-making
  • Prepared scripts in Python and Scala to automate the ingestion process from various sources such as APIs, AWS S3, Teradata, and Snowflake (see the illustrative sketch after this list)
  • Developed Pig scripts for the analysis of semi-structured data
  • Used Pig as an ETL tool to do transformations, event joins, filters, and some pre-aggregations before storing the data onto HDFS
  • Extensively involved in writing SQL queries (sub-queries and join conditions) for building and testing ETL processes
  • Actively participated in code reviews and meetings and resolved any technical issues
  • Environment: Spark, Scala, Hive, JSON, AWS, MapReduce, Hadoop, Python, XML, NoSQL, HBase, and Windows
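
A minimal PySpark sketch of the kind of S3 ingestion automation described above; the bucket names, key column, and output location are hypothetical assumptions:

    # Minimal PySpark sketch: pull raw files from an S3 prefix, apply simple
    # cleansing transformations, and write the result to a curated S3 location.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("s3-ingest").getOrCreate()

    raw = spark.read.json("s3a://example-raw-bucket/events/")      # hypothetical source
    cleaned = (
        raw.dropDuplicates(["event_id"])                           # assumed key column
           .withColumn("ingest_date", F.current_date())
           .filter(F.col("event_type").isNotNull())                # assumed column
    )
    cleaned.write.mode("append").parquet("s3a://example-curated-bucket/events/")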

Education

Master of Arts - Information Technology Management

Webster University
St Louis, MO
12.2023

Bachelor’s in Management - Travel and Tourism Studies

Kathmandu Academy of Tourism and Hospitality
01.2020

Skills

  • Python
  • SQL
  • Scala
  • MATLAB
  • Red Hat Linux
  • Unix
  • Windows
  • macOS
  • Snowflake
  • Teradata
  • Oracle
  • MySQL
  • Microsoft SQL Server
  • PostgreSQL
  • Azure
  • AWS
  • Docker

Timeline

Data Engineer

Verizon
07.2023 - Current

Data Engineer

Delta Airlines
02.2023 - 07.2023

Data Engineer

Edmund Optics
09.2020 - 08.2022

Data Engineer

Verisk
01.2018 - 08.2020

Bachelor’s in Management - Travel and Tourism Studies

Kathmandu Academy of Tourism and Hospitality

Master of Arts - Information Technology Management

Webster University