Mohana

Miamisburg, OH

Summary

Dynamic Lead Data Engineer with a 12-year track record of spearheading diverse teams to design and implement innovative data solutions. Leverages expertise in Scala/Python, Apache Spark, and cloud technologies to enhance data integration and analytics. Known for analytical thinking and an ownership mindset, consistently delivering projects that meet rigorous data governance standards.

Overview

13 years of professional experience
1 Certification

Work History

Data Engineer

PayPal
03.2022 - 09.2024
  • Built and architected multiple data pipelines, covering end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinated tasks among the team
  • Designed and architected the various layers of the data lake
  • Designed star schemas in BigQuery
  • Loaded Salesforce data incrementally every 15 minutes into the BigQuery raw and UDM layers using SOQL, Google Dataproc, GCS buckets, Hive, Spark, Scala, Python, gsutil, and shell scripts
  • Used REST APIs with Python to ingest data from external sites into BigQuery
  • Built a Python and Apache Beam program, executed on Cloud Dataflow, to validate data between raw source files and BigQuery tables
  • Built a configurable Scala- and Spark-based framework to connect common data sources (MySQL, Oracle, Postgres, SQL Server, Salesforce, BigQuery) and load them into BigQuery
  • Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver across all environments
  • Opened SSH tunnels to Google Dataproc to reach the YARN resource manager and monitor Spark jobs
  • Submitted Spark jobs with gsutil and spark-submit for execution on the Dataproc cluster
  • Wrote a Python program to maintain raw-file archival in GCS buckets
  • Set up secure file transfers between Linux servers and external systems over SFTP, using LFTP and WinSCP to ensure reliable and efficient data exchange
  • Used Cloud Functions with Python to load CSV files into BigQuery as they arrive in a GCS bucket (see the sketch after this list)
  • Processed and loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python
  • Created firewall rules to allow access to Google Dataproc from other machines
  • Analyzed various raw file types (JSON, CSV, XML) with Python using Pandas, NumPy, etc.
  • Wrote extensive queries for ad hoc analysis of data based on business requirements
  • Performed unit testing and data validation by running basic to complex SQL queries
  • Automated data validations using Python
  • Tuned long-running Spark SQL queries with techniques such as partitioning
  • Performed extensive debugging, data validation, error handling, transformation, and data-cleanup analysis within large datasets
  • Used Git for version control with colleagues
  • Worked with the business to translate requirements into technical specifications and coordinated with offshore teams
  • Built ETL pipelines that fed data analytics and processing.
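
A minimal sketch of the on-arrival CSV load pattern referenced above, in Python. The project, dataset, and table names are hypothetical placeholders; the client calls are from the google-cloud-bigquery library.

# Cloud Function sketch: load a CSV landing in a GCS bucket into BigQuery.
# Triggered by the bucket's object-finalize event; table id is hypothetical.
from google.cloud import bigquery

TABLE_ID = "my-project.raw_layer.inbound_events"  # hypothetical target table

def load_csv_to_bq(event, context):
    if not event["name"].endswith(".csv"):
        return  # ignore non-CSV objects
    uri = f"gs://{event['bucket']}/{event['name']}"
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # skip the header row
        autodetect=True,       # infer the schema from the file
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    client.load_table_from_uri(uri, TABLE_ID, job_config=job_config).result()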

Sr. Data Engineer

FIS
02.2021 - 03.2022
  • Developed and maintained end-to-end ETL and data engineering pipelines to process large-scale data on the Azure cloud platform
  • Developed data engineering pipelines utilizing Python, Spark, Databricks, Airflow, and other technologies
  • Researched and implemented ADF components such as pipelines, activities, mapping data flows, datasets, linked services, triggers, and control flow
  • Built ETL pipelines using Azure Data Factory with Azure SQL as the source and Azure Synapse Analytics / Snowflake as the data warehouse
  • Built ETL pipelines using Databricks with Azure Data Lake Storage (Gen2) as the source and Azure Cosmos DB as the destination
  • Implemented and maintained Neo4j databases to manage and query highly connected data structures efficiently
  • Ingested data into SQL pools from sources such as ADLS, Azure SQL Database, and on-premises systems
  • Performed transformations such as aggregating, filtering, and joining datasets using SQL pools
  • Created parameterized ADF pipelines that run dynamically and used Filter and Aggregate transformations to shape the data as required
  • Performed extensive data analysis using SQL queries in Azure SQL, Azure Databricks, and Azure Synapse Analytics
  • Built ADF data flows using Azure SQL and Azure Synapse Analytics
  • Orchestrated complex data pipelines using ADF
  • Pipelines included activities such as Copy Data, ADF Data Flow (Azure SQL as source, Azure Synapse Analytics as target), and ForEach loops for baselines
  • Tuned ADF data flows and pipelines using custom integration runtimes and by reducing shuffle partitions
  • Designed and implemented end-to-end data engineering pipelines for ERP and retail solutions using ADF Data Flow and Spark on Azure Databricks
  • Handled ingestion of data from various sources into Azure Storage using ADF data flows
  • Created a PySpark-based application to convert data from one format to another, like CSV to Parquet
  • Designed and developed PySpark applications to read CSV files and dynamically create the corresponding tables in Azure data storage
  • Implemented ETL logic in PySpark and Spark SQL notebooks on Azure Databricks
  • Built the orchestrating workflow for these notebooks using ADF pipelines
  • Implemented data ingestion from source RDBMS databases such as Postgres and Azure SQL using Spark over JDBC on Azure Databricks, with the solution built on Databricks Secrets and PySpark (see the sketch after this list)
  • Developed Spark SQL statements to create databases and tables per the medallion architecture, using providers and file formats such as Delta, Parquet, CSV, and JSON
  • Developed and deployed scheduled Databricks workflows/jobs built on PySpark and Spark SQL notebooks
  • Orchestrated Databricks tasks using Databricks Jobs
  • Used Git (add, commit, push) to maintain code versions and collaborate with teammates
  • Performed extensive debugging, data validation, error handling, transformation, and data-cleanup analysis within large datasets
  • Applied CI/CD processes for application software integration and deployment using Git.
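
A minimal sketch of the JDBC ingestion pattern referenced above, as it might look in a Databricks notebook. The secret scope and key, connection details, and table names are hypothetical placeholders; spark and dbutils are provided by the Databricks runtime.

# Databricks notebook sketch: ingest a Postgres table over JDBC into a
# bronze Delta table. Secret scope/key, host, and table names are hypothetical.
password = dbutils.secrets.get(scope="rdbms-creds", key="pg-password")

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://pg-host:5432/sales")
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", password)
    .option("fetchsize", "10000")  # stream rows rather than one large fetch
    .load()
)

# Land the raw copy in the bronze layer of the medallion architecture.
df.write.format("delta").mode("overwrite").saveAsTable("bronze.orders_raw")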

Data Engineer

State Of Florida
01.2020 - 12.2020
  • Developed backend APIs for a bulk order management system to extract and transform order details
  • Created a Python-based application to convert data from one format to another, such as CSV to Parquet (see the sketch after this list)
  • Worked with Python collections and used Pandas to read, process, and analyze JSON files
  • Wrote extensive queries for ad hoc analysis of data based on business requirements
  • Loaded data into non-production environments using database import tools and utilities
  • Built BI reports using Tableau and Power BI
  • Used Git (add, commit, push) to maintain code versions and collaborate with teammates
  • Performed unit testing and data validation by running basic to complex SQL queries
  • Automated data validations using Python
  • Applied CI/CD processes for application software integration and deployment using Git
  • Served as liaison between business users and the development team, creating technical specifications from business requirements
  • Analyzed, developed, and built modern data solutions with Azure PaaS services to enable data visualization
  • Assessed the application's current production state and the impact of new installations on existing business processes
  • Migrated data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB)
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics)
  • Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks
  • Created Azure Data Factory pipelines using linked services, datasets, and pipeline activities to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse
  • Developed Spark applications with Azure Data Factory and Spark SQL for data extraction, transformation, and aggregation from different file formats, uncovering insights into customer usage patterns
  • Managed the Azure SQL relational database service, which handles reliability, scaling, and maintenance.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Fine-tuned query performance and optimized database structures for faster, more accurate data retrieval and reporting.
  • Enhanced data quality by performing thorough cleaning, validation, and transformation tasks.
  • Streamlined complex workflows by breaking them down into manageable components for easier implementation and maintenance.
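
A minimal sketch of the CSV-to-Parquet conversion mentioned above, using pandas with the pyarrow engine. File paths are hypothetical placeholders.

# Sketch: convert a CSV file to Parquet with pandas + pyarrow.
import pandas as pd

def csv_to_parquet(src: str, dest: str) -> None:
    df = pd.read_csv(src)  # parse the CSV into a DataFrame
    df.to_parquet(dest, engine="pyarrow", index=False)  # write columnar output

if __name__ == "__main__":
    csv_to_parquet("orders.csv", "orders.parquet")  # hypothetical paths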

Data Integration Developer

Standard Bank of South Africa
04.2016 - 07.2019
  • Created PySpark programs to process the data required by the model framework
  • Administered all Linux servers, including Linux tuning
  • Built the development environment integrating Amazon S3, EC2, AWS Data Pipeline, Lambda, Redshift, RDS, and DynamoDB; Snowflake was used for data modelling and ETL
  • Developed Power BI dashboards and visualizations to help business users analyze data and provide insight to upper management
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala
  • Performed end-to-end architecture and implementation assessments of AWS services such as Amazon EMR, Redshift, and S3
  • Applied CI (continuous integration) and CD (continuous deployment) methodologies using Jenkins
  • Created MapReduce jobs to automate data transfer from HBase
  • Created and maintained the big data stack (Spark, Hadoop)
  • Translated requirements into SQL-based data modelling for all business processes
  • Worked with big data on AWS cloud services (EC2, S3, EMR); AWS Lambda and AWS CodePipeline enabled continuous integration and deployment
  • Developed complex SQL queries, stored procedures, and SSIS packages
  • Created an ETL pipeline using Spark and Hive to ingest data from various sources
  • Wrote and executed MySQL database queries from Python using the MySQL Connector and MySQLdb packages (see the sketch after this list)
  • Designed and built schema data models
  • Developed and implemented logging, metrics, and monitoring systems on AWS
  • Created reports and dashboards using Tableau
  • Documented business workflows for stakeholder review.
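
A minimal sketch of running MySQL queries from Python with the mysql-connector-python package, as mentioned above. Connection details, table, and column names are hypothetical placeholders.

# Sketch: run a parameterized MySQL query from Python.
import mysql.connector

conn = mysql.connector.connect(
    host="db-host", user="etl_user", password="***", database="reporting"
)
try:
    cur = conn.cursor()
    # Parameterized queries avoid manual quoting and SQL injection.
    cur.execute(
        "SELECT account_id, SUM(amount) FROM transactions "
        "WHERE txn_date >= %s GROUP BY account_id",
        ("2019-01-01",),
    )
    for account_id, total in cur.fetchall():
        print(account_id, total)
finally:
    conn.close()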

Data Specialist

IBM
05.2011 - 04.2016
  • Created detailed designs and source-to-target mappings
  • Communicated with business users and project management to gather business requirements and translate them into ETL specifications
  • Created Extract, Transform, and Load (ETL) interfaces and gateways for the backend database
  • Designed mappings from sources to operational staging targets using a star schema and implemented Slowly Changing Dimension (SCD) logic
  • Created shared containers for reuse across multiple jobs
  • Hands-on experience upgrading DataStage from v11.5 to Information Server 11.7.1
  • Integrated DataStage with the IGC catalog via REST APIs for asset updates and deletes (see the sketch after this list)
  • Imported custom assets into IGC
  • Used MetaDex (Compact BI) to import views and stored procedures from Oracle, MS SQL, and DB2 databases to generate data lineage
  • Generated data lineage, impact analysis, and business lineage reports spanning source database tables, views, DataStage jobs, reports, etc., based on business requirements.
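
A minimal sketch of the kind of DataStage/IGC REST call described above, using Python's requests library. The host, port, credentials, asset id, and payload are hypothetical, and the base path is an assumption based on the IGC REST API's typical layout.

# Sketch: update an IGC catalog asset over REST.
# Base path is an assumption; host, credentials, and asset id are hypothetical.
import requests

BASE = "https://igc-host:9446/ibm/iis/igc-rest/v1"  # assumed base path
AUTH = ("isadmin", "***")                           # hypothetical credentials

resp = requests.put(
    f"{BASE}/assets/a1b2c3d4",  # hypothetical asset id
    json={"short_description": "Curated orders feed"},
    auth=AUTH,
    verify=False,  # internal servers often use self-signed certificates
)
resp.raise_for_status()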

Education

Master of Science - Software Engineering

International Technological University
San Jose, CA
12.2020

Master of Science - Mobile Computing

Staffordshire University
United Kingdom
06.2010

Skills

▪ Azure
▪ Big Data Processing
▪ Hadoop Ecosystem
▪ Data Pipeline Design
▪ SQL and Databases
▪ Python Programming
▪ Scala Programming
▪ Data Governance
▪ Data Security
▪ Data Analytics
▪ Java Spring Boot
▪ Kotlin Spring Boot
▪ REST API
▪ Kafka Streaming
▪ Analytical Thinking
▪ Attention to Detail
▪ Problem Solving
▪ Agile Methodologies
▪ Active Listening
▪ Ownership Mindset
▪ Cloud Computing
▪ Amazon Web Services
▪ Databricks
▪ Snowflake
▪ Relational Databases
▪ Data Integration
▪ ETL Development
▪ API Development
▪ Data Visualization
▪ Team Leadership
▪ Data Analysis

Certification

AWS Certified Solutions Architect

Timeline

Data Engineer

PayPal
03.2022 - 09.2024

Sr. Data Engineer

FIS
02.2021 - 03.2022

Data Engineer

State Of Florida
01.2020 - 12.2020

Data Integration Developer

Standard Bank of South Africa
04.2016 - 07.2019

Data Specialist

IBM
05.2011 - 04.2016

Master of Science - Software Engineering

International Technological University

Master of Science - Mobile Computing

Staffordshire University