Sharmila Thatha

Sandy, UT

Summary

  • 10+ years of experience in the IT industry as an ETL Developer and Data Engineer
  • Excellent understanding of the different cloud service models: IaaS, PaaS, and SaaS
  • Experience across the complete software development life cycle (SDLC) with models such as Waterfall and Agile
  • Good experience in IBM DataStage, ADF, Data Lake, Databricks, Functions, Logic Apps, Azure SQL DB, Azure DevOps, Jenkins, Git, GitHub, Oracle, Teradata, Hadoop, and Snowflake
  • Good domain knowledge in banking, insurance, manufacturing, and retail
  • Strong in data warehousing concepts such as fact tables, dimension tables, and star and snowflake schemas
  • Experience in scheduling sequence and parallel jobs using DataStage Director, Unix scripts, and scheduling tools such as Control-M
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using Azure Data Factory and Spark SQL; a sketch of this pattern follows this summary
  • Ingested data into Azure services such as Data Lake, Storage, SQL, and DW, and processed the data in Azure Databricks
  • Worked extensively with version control tools (Git, GitHub, and Azure Repos) to track changes made by different people in source code; involved in branching, merging, and tagging
  • Developed PySpark and Python code for a regular-expression project in a Hadoop/Hive environment on Linux/Windows for big data resources
  • Experienced in monitoring database performance, troubleshooting issues, and optimizing the database environment
  • Strong analytical skills, excellent problem-solving abilities, and a deep understanding of database technologies and systems; equally confident working independently and collaboratively, with excellent communication skills
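A minimal PySpark sketch of this ingestion pattern, reading raw files from Azure Data Lake Storage Gen2, transforming them with Spark SQL in Databricks, and writing Parquet to a curated zone. The storage account, container, paths, and column names are hypothetical placeholders, not taken from any specific project.

```python
from pyspark.sql import SparkSession

# On Databricks a session already exists; this also works standalone.
spark = SparkSession.builder.appName("adls-ingest").getOrCreate()

# Read raw CSV landed in ADLS Gen2 (abfss URI scheme); names are placeholders.
raw = (spark.read
       .option("header", "true")
       .csv("abfss://raw@examplestorage.dfs.core.windows.net/sales/"))

# Transform with Spark SQL, then persist as Parquet to the curated zone.
raw.createOrReplaceTempView("sales_raw")
curated = spark.sql("""
    SELECT customer_id, CAST(amount AS DOUBLE) AS amount, order_date
    FROM sales_raw
    WHERE amount IS NOT NULL
""")
curated.write.mode("overwrite").parquet(
    "abfss://curated@examplestorage.dfs.core.windows.net/sales/")
```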

Overview

10 years of professional experience
1 Certification

Work History

Data Engineer

Zions Bancorporation
09.2023 - Current
  • Extracted data from heterogeneous sources such as Oracle, Teradata, Greenplum, and Postgres and loaded it into different target databases (illustrated in the sketch after this list)
  • Involved in creating Hive tables, loading data, and writing Hive queries
  • Worked on branching, tagging, and release activities with version control tools such as Git and Azure Repos
  • Designed and deployed pipelines through Azure Data Factory and debugged the processes for errors
  • Brought a detail- and problem-solving-oriented approach to DataStage jobs, addressing production issues such as performance tuning and enhancements
  • Passed parameters to Control-M jobs and created job dependencies and alerts per application team and business requirements
  • Developed and implemented software release management strategies for various applications in an agile environment.
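A minimal PySpark sketch of this extract-and-load pattern over JDBC. The hosts, credentials, and table names are hypothetical, and the Oracle and PostgreSQL JDBC driver jars are assumed to be on the Spark classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-extract").getOrCreate()

# Extract a source table from Oracle over JDBC (placeholder connection details).
src = (spark.read.format("jdbc")
       .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1")
       .option("dbtable", "SALES.ORDERS")
       .option("user", "etl_user")
       .option("password", "***")
       .option("driver", "oracle.jdbc.OracleDriver")
       .load())

# Load the extract into a different database (PostgreSQL here).
(src.write.format("jdbc")
 .option("url", "jdbc:postgresql://pg-host:5432/dw")
 .option("dbtable", "staging.orders")
 .option("user", "etl_user")
 .option("password", "***")
 .option("driver", "org.postgresql.Driver")
 .mode("append")
 .save())
```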

Data Engineer

Staples Inc
07.2022 - 02.2023
  • Extensively used Azure services such as ADF and Logic Apps for ETL to move data between databases and Blob Storage, working with data formats including CSV, JSON, and Parquet
  • Provisioned Hadoop and Spark clusters to build the on-demand data warehouse and provided the data to data scientists
  • Processed HDFS data and created external tables using Hive to analyze visitors per day, page views, and most-purchased products
  • Created dependencies between activities in ADF, created stored procedures, and scheduled them in the Azure environment; wrote Python scripts to automate deployments (see the sketch after this list)
  • Used ADF as an orchestration tool for integrating data from upstream to downstream systems
  • Used Azure DevOps services such as Azure Repos, Azure Boards, and Azure Test Plans to plan work, collaborate on code development, and build and deploy applications
  • Used Jenkins to schedule jobs per requirements, with report monitoring and notification functionality to report success or failure.
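A sketch of the kind of Python automation described above, triggering and monitoring an ADF pipeline run with the azure-mgmt-datafactory SDK. The subscription ID, resource group, factory, pipeline name, and parameter are hypothetical placeholders; requires `pip install azure-identity azure-mgmt-datafactory`.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "00000000-0000-0000-0000-000000000000")

# Kick off a pipeline run with a runtime parameter (names are placeholders).
run = adf_client.pipelines.create_run(
    resource_group_name="rg-example",
    factory_name="adf-example",
    pipeline_name="CopyBlobToSql",
    parameters={"loadDate": "2023-01-31"},
)
print("started run:", run.run_id)

# Poll the run status for monitoring/notification logic.
status = adf_client.pipeline_runs.get("rg-example", "adf-example", run.run_id)
print("run status:", status.status)
```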

Data Engineer

Standard Chartered Bank
01.2022 - 06.2022
  • Designed and developed user-defined functions, stored procedures, and triggers for the Hadoop database
  • Developed Azure Function apps as API services to communicate with the database
  • Involved in building and deploying function apps from Visual Studio
  • Created low-level design documents based on high-level design documents and delivered clear, well-communicated, complete design documents
  • Worked on configuring a Git branching strategy to support the software development cycle, including processes, tools, and automation efforts
  • Integrated Active Directory authentication into every database request
  • Performed unit testing, system integration testing, and user acceptance testing
  • Used Python and PySpark SQL to convert native Hive/SQL queries into Spark DataFrame transformations in Apache Spark (see the sketch after this list)
  • Created and provisioned the different Databricks clusters needed for batch and continuous streaming data processing, and installed the required libraries on the clusters.
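A minimal sketch of the Hive-to-DataFrame conversion pattern: the same aggregation expressed once as a Hive-style query and once as DataFrame transformations. The table and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-df")
         .enableHiveSupport()
         .getOrCreate())

# The original Hive-style SQL query...
hive_df = spark.sql("""
    SELECT account_id, SUM(amount) AS total_amount
    FROM transactions
    WHERE txn_date >= '2022-01-01'
    GROUP BY account_id
""")

# ...and the equivalent Spark DataFrame transformation.
df = (spark.table("transactions")
      .filter(F.col("txn_date") >= "2022-01-01")
      .groupBy("account_id")
      .agg(F.sum("amount").alias("total_amount")))
```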

Data Engineer

Caterpillar Inc.
06.2020 - 12.2021
  • Played a key role in migrating Teradata objects into the Snowflake environment (see the sketch after this list)
  • Reverse-engineered existing DataStage jobs to understand them and created solution documents
  • Created low-level design documents based on high-level documents
  • Created DataStage jobs based on the mapping sheet
  • Created ADF pipelines based on solution documents, performed unit testing, and diagnosed and resolved UAT and production issues
  • Built pipelines to copy data from source to destination in ADF, took backups of pipeline code, and scheduled the pipelines
  • Created Logic Apps with different triggers and connectors to integrate data from Workday to different destinations
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce
  • Created dependencies between activities in ADF, created stored procedures, and scheduled them in the Azure environment.
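A sketch of one leg of such a migration: writing a Spark DataFrame into Snowflake via the Snowflake Spark connector. The connection options and table are hypothetical; the connector (net.snowflake.spark.snowflake) and the Snowflake JDBC driver are assumed to be on the classpath, and the DataFrame stands in for data extracted from Teradata (e.g., over the Teradata JDBC driver).

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("td-to-snowflake").getOrCreate()

# Stand-in for a Teradata extract; in practice this would come from a JDBC read.
df = spark.createDataFrame([(1, "open"), (2, "closed")], ["order_id", "status"])

# Hypothetical Snowflake connection options.
sf_options = {
    "sfURL": "example.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "***",
    "sfDatabase": "DW",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
}

# Write the migrated data into a Snowflake table.
(df.write.format("net.snowflake.spark.snowflake")
 .options(**sf_options)
 .option("dbtable", "ORDERS")
 .mode("overwrite")
 .save())
```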

ETL Developer

HSBC
05.2016 - 05.2020
  • Interacted with the client to understand the existing system, requirements, and business logic
  • Worked on Teradata utilities such as FastLoad, MultiLoad, TPump, and BTEQ scripts
  • Successfully implemented pipeline and partitioning techniques and ensured load balancing of data
  • Worked on SCDs to populate Type I and Type II slowly changing dimension tables from several operational source files (a Type II sketch follows this list)
  • Generated surrogate keys using the Surrogate Key Generator and Transformer stages
  • Created before/after routines, subroutines, and transform functions used across the project
  • Involved in creating table definitions, indexes, sequences, views, and materialized views
  • Involved in ongoing production support and process improvements
  • Ran DataStage jobs through the third-party scheduler Control-M
  • Developed Spark applications using Spark SQL, PySpark, and Delta Lake in Databricks for data extraction, transformation, and aggregation from multiple file formats, uncovering insights into customer usage patterns
  • Implemented Azure Logic Apps, Azure Functions, Azure Storage, and Service Bus queues for large enterprise-level ERP integration systems.
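A simplified sketch of a Type II slowly changing dimension using a Delta Lake merge in Databricks. The table names and schema (customer_id, address, is_current, start_date, end_date) are hypothetical, and it assumes the staging table carries only new and changed rows.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided on Databricks

dim = DeltaTable.forName(spark, "dim_customer")
updates = spark.table("stg_customer")  # assumed: only new/changed rows

# Step 1: expire the current dimension row when a tracked attribute changes.
(dim.alias("d")
 .merge(updates.alias("u"),
        "d.customer_id = u.customer_id AND d.is_current = true")
 .whenMatchedUpdate(
     condition="d.address <> u.address",
     set={"is_current": "false", "end_date": "current_date()"})
 .execute())

# Step 2: append the incoming rows as the new current versions.
new_rows = (updates
            .withColumn("is_current", F.lit(True))
            .withColumn("start_date", F.current_date())
            .withColumn("end_date", F.lit(None).cast("date")))
new_rows.write.format("delta").mode("append").saveAsTable("dim_customer")
```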

ETL Developer/Data Engineer

Integra
06.2014 - 04.2016
  • Extracted, transformed, and loaded data directly from heterogeneous source systems such as flat files and databases into different target systems
  • Used most of the DataStage stages, including Sequential File, Data Set, Filter, Change Capture, Copy, Remove Duplicates, Sort, Aggregator, Lookup, Join, Funnel, and Transformer (PySpark analogues are sketched after this list)
  • Extensively worked on job sequences to control the execution of the job flow using various activities and triggers
  • Created local and shared containers to facilitate ease and reuse of jobs
  • Worked on sequence job activities such as Job Activity, Wait For File, Email Notification, Sequencer, Exception Handler, and Execute Command
  • Used DataStage Director and Control-M to run and monitor DataStage jobs
  • Involved in performance tuning to improve DataStage job performance by creating reusable transformations with shared containers.
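DataStage jobs are built in a GUI rather than in code, but the stage logic maps naturally onto DataFrame operations. A purely illustrative PySpark sketch of a few of the stages named above, with hypothetical file paths and columns:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("stage-analogues").getOrCreate()

orders = spark.read.option("header", "true").csv("/data/orders.csv")       # Sequential File
customers = spark.read.option("header", "true").csv("/data/customers.csv")

result = (orders
          .filter(F.col("status") == "SHIPPED")                            # Filter stage
          .dropDuplicates(["order_id"])                                    # Remove Duplicates
          .join(customers, "customer_id", "left")                          # Lookup/Join stage
          .groupBy("customer_id")
          .agg(F.sum("amount").alias("total_amount"))                      # Aggregator stage
          .orderBy("customer_id"))                                         # Sort stage

result.write.mode("overwrite").parquet("/data/out/orders_summary")         # Data Set output
```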

Education

Skills

ETL Tools: IBM Infosphere DataStage, Azure Data Factory

Operating System: Windows, Unix, Linux (Ubuntu)

Databases: Oracle, SQL Server, Teradata, Snowflake, Hadoop and Hive

Scheduling Tools: Control-M

Version Control Tools: Git, GitHub, Azure Repos

DevOps Tools: Azure DevOps

Projects

Credit Lead, Zions Bancorporation, Midvale, UT
Data Engineer | 09/2023 - Present
Environment: DataStage, Hive, Linux, ServiceNow, ADF, Azure DevOps, Git, Control-M
Zions Bancorporation is a national bank headquartered in Salt Lake City, Utah. It operates as a national bank rather than as a bank holding company and does business under seven brands: Zions Bank, Amegy Bank of Texas, California Bank and Trust, National Bank of Arizona, Nevada State Bank, Vectra Bank Colorado, and the Commerce Bank of Washington.

MIDAS Optimization, Staples Inc, Massachusetts, USA
Data Engineer | 07/2022 - 02/2023
Environment: DataStage, Databricks, Python, PySpark, GitHub, Jenkins, Hadoop, Azure DevOps
Staples Inc. is an American retail company. The project migrated on-premises data to the Azure cloud in an Agile methodology using Azure DevOps, recreating the existing application logic and functionality in ADF, ADL, the database, and the data warehouse.

SCB FMETAL, Standard Chartered Bank, New York, USA
Data Engineer | 01/2022 - 06/2022
Environment: DataStage, ADF, Databricks, Python, PySpark, GitHub, Jenkins, Hadoop, JIRA, Azure DevOps
Funds transfer pricing provides financial institutions with an effective decision-making platform for customer, organizational, and product profitability, product pricing, and balance sheet and resource allocation. It is an internal measurement and allocation process that assigns a profit contribution value to funds gathered and lent or invested by the bank.

LEAD Program, Caterpillar Inc., Deerfield, Illinois, U.S.
Data Engineer | 06/2020 - 12/2021
Environment: DataStage, ADF, Teradata, Snowflake, Git, Python, PySpark, Azure DevOps
Caterpillar is the world's largest construction equipment manufacturer. The team used Azure DevOps boards in an Agile Scrum environment and served as the ETL team migrating Teradata objects into the Snowflake (SaaS) environment using DataStage and ADF.

Phoenix HUB Clustering, HSBC
ETL Developer | 05/2016 - 05/2020
Environment: DataStage, ADF, ADLS, Databricks, Oracle, Teradata, Linux, Control-M, ServiceNow, JIRA
HUB Clustering is one of the initiatives launched under a program called Phoenix. The HUB instances running across 37 countries were merged into 7 clusters using the Multi Country HUB (MCH) feature. The project modernized the existing system, migrated on-premises data into the Azure cloud, and replaced DataStage ETL jobs with Azure Data Factory.

Integra EDW Rebuild, Integra, Plainsboro Township, NJ
ETL Developer/Data Engineer | 06/2014 - 04/2016
Environment: DataStage, Oracle, SQL Server, Unix, Control-M, ServiceNow
Integra Life Sciences manages sales through an Oracle ERP system; all orders and invoices are generated through Oracle ERP. Whenever a new lead, order, or invoice is created, the data is pushed into an Oracle database. The ETL team pulled data from the Oracle database and pushed it into a SQL database using DataStage.

Detailed responsibilities for each project are listed under the corresponding Work History entries above.

Contact No.

9802909476

Certification

Azure Administrator Associate

Timeline

Azure Administrator Associate

03.2025

Data Engineer

Zions Bancorporation
09.2023 - Current

Data Engineer

Staples Inc
07.2022 - 02.2023

Data Engineer

Standard Chartered Bank
01.2022 - 06.2022

Data Engineer

Caterpillar Inc.
06.2020 - 12.2021

ETL Developer

HSBC
05.2016 - 05.2020

ETL Developer/Data Engineer

Integra
06.2014 - 04.2016
