Mohana

Miamisburg, OH

Summary

Dynamic Lead Data Engineer with a 12-year track record of spearheading diverse teams to design and implement innovative data solutions. Leverages expertise in Scala/Python, Apache Spark, and cloud technologies to enhance data integration and analytics. Known for analytical thinking and an ownership mindset, consistently delivering projects that meet rigorous data governance standards.

Overview

13 years of professional experience
1 Certification

Work History

Data Engineer

PayPal
03.2022 - 09.2024
  • Built and architected multiple data pipelines, covering end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinated tasks among the team
  • Designed and architected the various layers of the data lake
  • Designed star schemas in BigQuery
  • Loaded Salesforce data incrementally every 15 minutes into the BigQuery raw and UDM layers using SOQL, Google Dataproc, GCS buckets, Hive, Spark, Scala, Python, gsutil, and shell scripts
  • Used REST APIs with Python to ingest data from external sites into BigQuery
  • Built a Python and Apache Beam program, executed on Cloud Dataflow, to validate data between raw source files and BigQuery tables
  • Built a configurable Scala- and Spark-based framework to connect common data sources (MySQL, Oracle, Postgres, SQL Server, Salesforce, BigQuery) and load them into BigQuery
  • Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver across all environments
  • Opened SSH tunnels to Google Dataproc to reach the YARN resource manager and monitor Spark jobs
  • Submitted Spark jobs with gsutil and spark-submit for execution on the Dataproc cluster
  • Wrote a Python program to maintain raw-file archival in GCS buckets
  • Set up secure file transfers between Linux servers and external systems over SFTP, using LFTP and WinSCP to ensure reliable and efficient data exchange
  • Used Cloud Functions with Python to load CSV files into BigQuery as they arrive in a GCS bucket (see the sketch after this list)
  • Processed and loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python
  • Created firewall rules to allow access to Google Dataproc from other machines
  • Analyzed various raw file types (JSON, CSV, XML) with Python using Pandas, NumPy, etc.
  • Wrote extensive queries for ad hoc analysis of data based on business requirements
  • Performed unit testing and data validation by running basic to complex SQL queries
  • Automated data validations using Python
  • Tuned long-running Spark SQL queries with techniques such as partitioning
  • Performed extensive debugging, data validation, error handling, transformation, and data-cleanup analysis within large datasets
  • Used Git for version control with colleagues
  • Worked with the business to translate requirements into technical specifications and coordinated with offshore teams
  • Built ETL pipelines that fed data analytics and processing.
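
A minimal sketch of the on-arrival CSV load pattern referenced above, in Python. The project, dataset, and table names are hypothetical placeholders; the client calls are from the google-cloud-bigquery library.

# Cloud Function sketch: load a CSV landing in a GCS bucket into BigQuery.
# Triggered by the bucket's object-finalize event; table id is hypothetical.
from google.cloud import bigquery

TABLE_ID = "my-project.raw_layer.inbound_events"  # hypothetical target table

def load_csv_to_bq(event, context):
    if not event["name"].endswith(".csv"):
        return  # ignore non-CSV objects
    uri = f"gs://{event['bucket']}/{event['name']}"
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # skip the header row
        autodetect=True,       # infer the schema from the file
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    client.load_table_from_uri(uri, TABLE_ID, job_config=job_config).result()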

Sr. Data Engineer

FIS
02.2021 - 03.2022
  • Developed and maintained end-to-end ETL and data engineering pipelines to process large-scale data on the Azure cloud platform
  • Developed data engineering pipelines utilizing Python, Spark, Databricks, Airflow, and other technologies
  • Researched and implemented ADF components such as pipelines, activities, mapping data flows, datasets, linked services, triggers, and control flow
  • Built ETL pipelines using Azure Data Factory with Azure SQL as the source and Azure Synapse Analytics / Snowflake as the data warehouse
  • Built ETL pipelines using Databricks with Azure Data Lake Storage (Gen2) as the source and Azure Cosmos DB as the destination
  • Implemented and maintained Neo4j databases to manage and query highly connected data structures efficiently
  • Ingested data into SQL pools from sources such as ADLS, Azure SQL Database, and on-premises systems
  • Performed transformations such as aggregating, filtering, and joining datasets using SQL pools
  • Created parameterized ADF pipelines that run dynamically and used Filter and Aggregate transformations to shape the data as required
  • Performed extensive data analysis using SQL queries in Azure SQL, Azure Databricks, and Azure Synapse Analytics
  • Built ADF data flows using Azure SQL and Azure Synapse Analytics
  • Orchestrated complex data pipelines using ADF
  • Pipelines included activities such as Copy Data, ADF Data Flow (Azure SQL as source, Azure Synapse Analytics as target), and ForEach loops for baselines
  • Tuned ADF data flows and pipelines using custom integration runtimes and by reducing shuffle partitions
  • Designed and implemented end-to-end data engineering pipelines for ERP and retail solutions using ADF Data Flow and Spark on Azure Databricks
  • Handled ingestion of data from various sources into Azure Storage using ADF data flows
  • Created a PySpark-based application to convert data from one format to another, like CSV to Parquet
  • Designed and developed PySpark applications to read CSV files and dynamically create the corresponding tables in Azure data storage
  • Implemented ETL logic in PySpark and Spark SQL notebooks on Azure Databricks
  • Built the orchestrating workflow for these notebooks using ADF pipelines
  • Implemented data ingestion from source RDBMS databases such as Postgres and Azure SQL using Spark over JDBC on Azure Databricks, with the solution built on Databricks Secrets and PySpark (see the sketch after this list)
  • Developed Spark SQL statements to create databases and tables per the medallion architecture, using providers and file formats such as Delta, Parquet, CSV, and JSON
  • Developed and deployed scheduled Databricks workflows/jobs built on PySpark and Spark SQL notebooks
  • Orchestrated Databricks tasks using Databricks Jobs
  • Used Git (add, commit, push) to maintain code versions and collaborate with teammates
  • Performed extensive debugging, data validation, error handling, transformation, and data-cleanup analysis within large datasets
  • Applied CI/CD processes for application software integration and deployment using Git.
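
A minimal sketch of the JDBC ingestion pattern referenced above, as it might look in a Databricks notebook. The secret scope and key, connection details, and table names are hypothetical placeholders; spark and dbutils are provided by the Databricks runtime.

# Databricks notebook sketch: ingest a Postgres table over JDBC into a
# bronze Delta table. Secret scope/key, host, and table names are hypothetical.
password = dbutils.secrets.get(scope="rdbms-creds", key="pg-password")

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://pg-host:5432/sales")
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", password)
    .option("fetchsize", "10000")  # stream rows rather than one large fetch
    .load()
)

# Land the raw copy in the bronze layer of the medallion architecture.
df.write.format("delta").mode("overwrite").saveAsTable("bronze.orders_raw")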

Data Engineer

State Of Florida
01.2020 - 12.2020
  • Developed backend APIs for a bulk order management system to extract and transform order details
  • Created a Python-based application to convert data from one format to another, such as CSV to Parquet (see the sketch after this list)
  • Worked with Python collections and used Pandas to read, process, and analyze JSON files
  • Wrote extensive queries for ad hoc analysis of data based on business requirements
  • Loaded data into non-production environments using database import tools and utilities
  • Built BI reports using Tableau and Power BI
  • Used Git (add, commit, push) to maintain code versions and collaborate with teammates
  • Performed unit testing and data validation by running basic to complex SQL queries
  • Automated data validations using Python
  • Applied CI/CD processes for application software integration and deployment using Git
  • Served as liaison between business users and the development team, creating technical specifications from business requirements
  • Analyzed, developed, and built modern data solutions with Azure PaaS services to enable data visualization
  • Assessed the application's current production state and the impact of new installations on existing business processes
  • Migrated data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB)
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics)
  • Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks
  • Created Azure Data Factory pipelines using linked services, datasets, and pipeline activities to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse
  • Developed Spark applications with Azure Data Factory and Spark SQL for data extraction, transformation, and aggregation from different file formats, uncovering insights into customer usage patterns
  • Managed the Azure SQL relational database service, which handles reliability, scaling, and maintenance.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Fine-tuned query performance and optimized database structures for faster, more accurate data retrieval and reporting.
  • Enhanced data quality by performing thorough cleaning, validation, and transformation tasks.
  • Streamlined complex workflows by breaking them down into manageable components for easier implementation and maintenance.
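
A minimal sketch of the CSV-to-Parquet conversion mentioned above, using pandas with the pyarrow engine. File paths are hypothetical placeholders.

# Sketch: convert a CSV file to Parquet with pandas + pyarrow.
import pandas as pd

def csv_to_parquet(src: str, dest: str) -> None:
    df = pd.read_csv(src)  # parse the CSV into a DataFrame
    df.to_parquet(dest, engine="pyarrow", index=False)  # write columnar output

if __name__ == "__main__":
    csv_to_parquet("orders.csv", "orders.parquet")  # hypothetical paths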

Data Integration Developer

Standard Bank of South Africa
04.2016 - 07.2019
  • Created PySpark programs to process the data required by the model framework
  • Administered all Linux servers, including Linux tuning
  • Built the development environment integrating Amazon S3, EC2, AWS Data Pipeline, Lambda, Redshift, RDS, and DynamoDB; Snowflake was used for data modelling and ETL
  • Developed Power BI dashboards and visualizations to help business users analyze data and provide insight to upper management
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala
  • Performed end-to-end architecture and implementation assessments of AWS services such as Amazon EMR, Redshift, and S3
  • Applied CI (continuous integration) and CD (continuous deployment) methodologies using Jenkins
  • Created MapReduce jobs to automate data transfer from HBase
  • Created and maintained the big data stack (Spark, Hadoop)
  • Translated requirements into SQL-based data modelling for all business processes
  • Worked with big data on AWS cloud services (EC2, S3, EMR); AWS Lambda and AWS CodePipeline enabled continuous integration and deployment
  • Developed complex SQL queries, stored procedures, and SSIS packages
  • Created an ETL pipeline using Spark and Hive to ingest data from various sources
  • Wrote and executed MySQL database queries from Python using the MySQL Connector and MySQLdb packages (see the sketch after this list)
  • Designed and built schema data models
  • Developed and implemented logging, metrics, and monitoring systems on AWS
  • Created reports and dashboards using Tableau
  • Documented business workflows for stakeholder review.
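
A minimal sketch of running MySQL queries from Python with the mysql-connector-python package, as mentioned above. Connection details, table, and column names are hypothetical placeholders.

# Sketch: run a parameterized MySQL query from Python.
import mysql.connector

conn = mysql.connector.connect(
    host="db-host", user="etl_user", password="***", database="reporting"
)
try:
    cur = conn.cursor()
    # Parameterized queries avoid manual quoting and SQL injection.
    cur.execute(
        "SELECT account_id, SUM(amount) FROM transactions "
        "WHERE txn_date >= %s GROUP BY account_id",
        ("2019-01-01",),
    )
    for account_id, total in cur.fetchall():
        print(account_id, total)
finally:
    conn.close()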

Data Specialist

IBM
05.2011 - 04.2016
  • Created detailed designs and source-to-target mappings
  • Communicated with business users and project management to gather business requirements and translate them into ETL specifications
  • Created Extract, Transform, and Load (ETL) interfaces and gateways for the backend database
  • Designed mappings from sources to operational staging targets using a star schema and implemented Slowly Changing Dimension (SCD) logic
  • Created shared containers for reuse across multiple jobs
  • Hands-on experience upgrading DataStage from v11.5 to Information Server 11.7.1
  • Integrated DataStage with the IGC catalog via REST APIs for asset updates and deletes (see the sketch after this list)
  • Imported custom assets into IGC
  • Used MetaDex (Compact BI) to import views and stored procedures from Oracle, MS SQL, and DB2 databases to generate data lineage
  • Generated data lineage, impact analysis, and business lineage reports spanning source database tables, views, DataStage jobs, reports, etc., based on business requirements.
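
A minimal sketch of the kind of DataStage/IGC REST call described above, using Python's requests library. The host, port, credentials, asset id, and payload are hypothetical, and the base path is an assumption based on the IGC REST API's typical layout.

# Sketch: update an IGC catalog asset over REST.
# Base path is an assumption; host, credentials, and asset id are hypothetical.
import requests

BASE = "https://igc-host:9446/ibm/iis/igc-rest/v1"  # assumed base path
AUTH = ("isadmin", "***")                           # hypothetical credentials

resp = requests.put(
    f"{BASE}/assets/a1b2c3d4",  # hypothetical asset id
    json={"short_description": "Curated orders feed"},
    auth=AUTH,
    verify=False,  # internal servers often use self-signed certificates
)
resp.raise_for_status()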

Education

Master of Science - Software Engineering

International Technological University
San Jose, CA
12.2020

Master of Science - Mobile Computing

Staffordshire University
United Kingdom
06.2010

Skills

▪ Azure
▪ Big Data Processing
▪ Hadoop Ecosystem
▪ Data Pipeline Design
▪ SQL and Databases
▪ Python Programming
▪ Scala Programming
▪ Data Governance
▪ Data Security
▪ Data Analytics
▪ Java Spring Boot
▪ Kotlin Spring Boot
▪ REST API
▪ Kafka Streaming
▪ Analytical Thinking
▪ Attention to Detail
▪ Problem Solving
▪ Agile Methodologies
▪ Active Listening
▪ Ownership Mindset
▪ Cloud Computing
▪ Amazon Web Services
▪ Databricks
▪ Snowflake
▪ Relational Databases
▪ Data Integration
▪ ETL Development
▪ API Development
▪ Data Visualization
▪ Team Leadership
▪ Data Analysis

Certification

AWS Certified Solutions Architect

Timeline

Data Engineer

PayPal
03.2022 - 09.2024

Sr. Data Engineer

FIS
02.2021 - 03.2022

Data Engineer

State Of Florida
01.2020 - 12.2020

Data Integration Developer

Standard Bank of South Africa
04.2016 - 07.2019

Data Specialist

IBM
05.2011 - 04.2016

Master of Science - Software Engineering

International Technological University

Master of Science - Mobile Computing

Staffordshire University