
Anurag Tamia

Hyderabad

Summary

Accomplished Data Engineer with extensive experience at State Street, specializing in PySpark and SQL. Proven track record in optimizing data pipelines and enhancing data quality, ensuring compliance and governance. Adept at collaborating with cross-functional teams to deliver actionable insights, driving operational efficiency and client satisfaction.

Overview

10
years of professional experience

Work History

Data Engineer

State Street
Ohio, USA
04.2025 - Current
  • Analyze client data requirements across benchmark data providers using tools like SQL and Talend for efficient data extraction and transformation.
  • Conduct data coverage analysis for new and existing clients to assess suitability of data services, leveraging Snowflake for advanced querying and data warehousing.
  • Support migration of clients from legacy benchmark services to next-generation solutions, utilizing Databricks for scalable data processing and transformation.
  • Implement centralized governance and access control for sensitive datasets using Databricks Unity Catalog, ensuring compliance with organizational and regulatory standards.
  • Automate data quality checks and governance policies within Unity Catalog to ensure consistent and reliable client data delivery.
  • Reconcile and support benchmark indices across multiple systems to ensure data consistency, utilizing Hadoop and PySpark for large-scale data validation and reconciliation tasks.
  • Validate solutions and alerts to monitor vendor raw data, ensuring accuracy and identifying issues through automated SQL checks and Talend data quality tools.
  • Ensure data integrity across multiple environments by effectively QA’ing key data points using Databricks notebooks and Snowflake analytics.
  • Work as QA to locate, troubleshoot, and resolve data gaps/issues when new data vendors integrate with the system, utilizing PySpark for efficient data profiling and validation.
  • Serve as the primary link between data vendors and clients, ensuring clear communication and issue resolution using standardized procedures.
  • Validate SQL automations to improve data coverage and enhance operational efficiency across Snowflake and Hadoop-based ecosystems.
  • Contribute to the enhancement of data service benchmarks and the evolution of data offerings by leveraging insights from big data platforms such as Databricks and Hadoop.
  • Support future enhancements of Data Service Benchmarks by identifying opportunities for improvement through data quality analysis and business rules automation.
  • State Street is a financial services and bank holding company that provides investment management, servicing, and administration.

Data Engineer/Azure Databricks Developer

Deloitte USI
Hyderabad, India
08.2019 - 01.2025
  • Implemented data governance with Unity Catalog in Azure Databricks to manage access and ensure compliance.
  • Developed data processing workflows in Databricks using Apache Spark/PySpark to clean, transform, and aggregate the raw telemetry data.
  • Utilized Unity Catalog to enforce governance policies and automate data quality validations, ensuring accuracy and consistency in client data delivery.
  • Integrated Azure Data Factory with Databricks notebooks for complex data transformations, taking advantage of Databricks' scalability and Apache Spark's parallel processing capabilities to enhance data processing efficiency.
  • Utilized Hadoop ecosystem tools (HDFS, MapReduce) for distributed data storage and processing of large-scale batch workloads, especially in legacy and hybrid environments.
  • Designed and optimized complex SQL queries and data models in Databricks SQL to support various analytics use cases.
  • Utilized Azure Data Factory to orchestrate and automate ETL pipelines, moving data across different environments and integrating with Azure services like Blob Storage, SQL Data Warehouse, and other cloud data sources.
  • Wrote multiple stored procedures in SQL to automate data loads and implement Data Quality framework.
  • Utilized Apache Airflow for orchestrating and scheduling critical ETL workflows, enabling dependency handling, retry mechanisms, and performance monitoring for large-scale data pipelines.
  • Built CI/CD pipelines using Azure DevOps, integrated with Git to automate version control, testing, deployment, and promote DevOps best practices.
  • Integrated Git (GitHub) into development workflows to manage version control, collaborate on codebase changes, handle branching strategies, and automate pull request-based deployments.
  • Deployed and maintained machine learning models in production using Kubernetes environments.
  • Basic working experience with Microsoft Fabric, including using Fabric Data Factory pipelines for data ingestion, working with Lakehouse for centralized data storage, orchestrating data workflows, monitoring pipeline execution, and integrating Fabric with Azure services to support reliable end-to-end data processing and analytics.
  • Ensured best practices in CI/CD by setting up Jenkins pipelines for model training and deployment.
  • Designed and implemented scalable Snowflake data warehouse solutions to support business intelligence and analytics.
  • Designed and implemented end-to-end data pipelines using Informatica PowerCenter and Azure Data Factory, enabling seamless integration of data from on-premises and cloud sources.
  • Conducted performance tuning and troubleshooting of data processing jobs in Databricks and Snowflake, enhancing efficiency and scalability.
  • Implemented Snowflake security best practices including role-based access control (RBAC) and data masking.
  • Integrated data from a variety of sources, including Azure Data Lake, SQL databases, REST APIs, and third-party systems, ensuring data availability for downstream applications.
  • Designed and optimized Spark jobs for batch and streaming data processing, improving execution times and resource consumption.
  • Implemented CI/CD pipeline in Azure Databricks using Azure DevOps, integrating Git for version control, automated testing, and deployment.
  • Leveraged Databricks notebooks for interactive data exploration, visualization, and collaborative analysis.
  • Leveraged Azure services such as Azure Storage, Azure Data Lake, Azure SQL Database, and Azure Blob Storage to store and process data efficiently.
  • Conducted performance tuning of Spark jobs by optimizing code, partitioning data, caching, and configuring cluster parameters.
  • Implemented data partitioning strategies, managed resource allocation, and optimized Spark SQL queries to improve runtime performance and resource utilization.
  • Worked with Azure Cosmos DB to store and manage data. Set up databases, performed basic queries, and ensured data was available and reliable.
  • Developed interactive dashboards and reports using Tableau, integrating data from multiple sources, including Snowflake, to provide actionable business insights.
  • Built and maintained data pipelines for data integration and processing: collected data from various sources, transformed it into a usable format, and loaded it into the appropriate Azure data store using Azure Data Factory and Azure Databricks.
  • Resolved bottlenecks in data pipeline and optimized data processing algorithms to improve overall performance.
  • Deloitte USI is a member firm of Deloitte Touche Tohmatsu Limited, providing audit, consulting, financial advisory, risk management, and tax services.

Data Engineer

Invesco
Hyderabad, India
01.2019 - 07.2019
  • Designed and implemented end-to-end ETL pipelines in Azure Data Factory, automating data ingestion, transformation, and loading processes across cloud and on-premises environments.
  • Used Snowflake and Informatica PowerCenter to automate the ETL (Extract, Transform, Load) process and integrate data from multiple sources (flat files, databases, Excel, web services, etc.).
  • Designed and implemented scalable Snowflake data warehouse solutions to support business intelligence and analytics.
  • Created reusable Snowflake data models, streamlining reporting and analytics capabilities for end users.
  • Optimized complex SQL queries to extract, transform, and load (ETL) data from multiple sources into Data Warehouse environment, improving data accessibility for analysis.
  • Applied indexing and query optimization techniques to improve the performance of SQL queries, reducing run times for large-scale data retrievals.
  • Worked extensively with stored procedures and views to encapsulate complex business logic and improve code maintainability and reuse.
  • Involved in the development of ETL pipelines in Informatica PowerCenter to load data from operational databases and external systems into the Data Warehouse, adhering to dimensional modeling techniques (e.g., Star Schema, Snowflake Schema).
  • Developed Data Warehouse tables and ETL mappings to load data in DW tables as a part of Business Requirements.
  • Designed, developed, and managed Autosys job flows to automate complex ETL processes, ensuring reliable scheduling and execution of data pipelines.
  • Maintained Autosys job dependencies, establishing job hierarchies and defining conditions to ensure reliable execution of data pipelines.
  • Provided technical support to Data Analysts and Business Intelligence (BI) teams, helping them retrieve and analyze data efficiently through optimized SQL queries and reports.
  • Developed interactive dashboards and reports using Tableau and Power BI, integrating data from multiple sources to provide actionable business insights.
  • Invesco is an independent investment management firm dedicated to delivering an investment experience that helps people get more out of life.

Data Engineer

Wipro
Hyderabad, India
01.2018 - 12.2018
  • Used SSIS to automate the ETL (Extract, Transform, Load) process and integrate data from multiple sources (flat files, databases, Excel, web services, etc.).
  • Designed and deployed complex data workflows for large-scale data transformations, error handling, and logging to ensure smooth data migration.
  • Developed and deployed dynamic reports using SSRS/Tableau for business intelligence and data analysis, with a focus on interactive, drill-down, and parameterized reports.
  • Optimized SSRS/Tableau reports for performance by creating efficient queries, reducing resource consumption, and implementing caching and pagination to enhance user experience.
  • Used SQL to query, manipulate, and manage data in relational databases such as SQL Server, MySQL, and Oracle.
  • Tuned SQL performance through query optimization, index analysis, and execution plan interpretation.
  • Created views, indexes, and constraints and managed transactions to ensure data integrity, performance, and security.
  • Wipro is a leading global information technology, consulting, and business process services company.

Data Engineer

Wipro
Hyderabad, India
02.2016 - 12.2017
  • Used Informatica PowerCenter to automate the ETL (Extract, Transform, Load) process and integrate data from multiple sources (flat files, databases, Excel, web services, etc.).
  • Developed PL/SQL procedures to automate data processing tasks and enhance data validation.
  • Created PL/SQL functions to encapsulate reusable logic, improving code maintainability and reusability.
  • Applied JOINs, subqueries, and aggregate functions to handle complex data relationships and requirements.
  • Wrote SQL queries to generate reports, extract data for analysis, and support decision-making.
  • Generated a report on vessel services using SQL and PL/SQL queries and visualized it with Python.
  • Wipro is a leading global information technology, consulting, and business process services company.

Education

Bachelor of Technology - Electronics and Communication Engineering

Indian Institute of Information Technology Design and Manufacturing
Jabalpur, India
01-2015

Skills

  • PySpark and T-SQL
  • PL/SQL and R
  • Azure Databricks and Cosmos DB
  • Hadoop and Snowflake
  • SQL Server and Oracle 11g
  • Data warehousing
  • Apache Airflow and Kubernetes
  • CI/CD pipelines
  • Informatica PowerCenter and SSIS
  • Tableau and SSRS
  • Data preparation

Projects

Benchmark Service Application

State Street
Columbus, Ohio
Industry: Finance
04/01/25 - Present
Role: Data Engineer
Technologies: SQL, Azure Databricks, PySpark, Talend, Snowflake, ETL
  • Analyze client data requirements across benchmark data providers using tools like SQL and Talend for efficient data extraction and transformation.
  • Conduct data coverage analysis for new and existing clients to assess the suitability of data services for their needs, leveraging Snowflake for data warehousing and advanced querying.
  • Support the migration of existing clients from legacy benchmark services to next-generation services using Databricks for scalable data processing and transformation.
  • Implement centralized governance and access control for sensitive datasets using Databricks Unity Catalog, ensuring compliance with organizational and regulatory standards.
  • Automate data quality checks and governance policies within Unity Catalog to ensure consistent and reliable client data delivery.
  • Reconcile and support benchmark indices across multiple systems to ensure data consistency, utilizing Hadoop and PySpark for large-scale data validation and reconciliation tasks.
  • Validate solutions and alerts to monitor raw data from vendors, ensuring accuracy and identifying issues through automated SQL checks and Talend data quality tools.
  • Ensure data integrity across multiple environments by effectively QA’ing key data points using Databricks notebooks and Snowflake analytics.
  • Work as QA to locate, troubleshoot, and resolve data gaps/issues when new data vendors integrate with the system, utilizing PySpark for efficient data profiling and validation.
  • Act as the first line of support for client trouble reporting and escalate issues as needed while maintaining detailed logs using structured SQL-based audit trails.
  • Document client requests, issues, and resolutions to maintain comprehensive records using collaborative platforms and integrated tracking tools.
  • Serve as the primary link between data vendors and clients, ensuring clear communication and issue resolution using standardized procedures.
  • Validate SQL automations to improve data coverage and enhance operational efficiency across Snowflake and Hadoop-based ecosystems.
  • Contribute to the enhancement of data service benchmarks and the evolution of data offerings by leveraging insights from big data platforms such as Databricks and Hadoop.
  • Support future enhancements of Data Service Benchmarks by identifying opportunities for improvement through data quality analysis and business rules automation.
  • Contribute to the continuous evolution of data services, ensuring alignment with industry standards and client needs by incorporating feedback into the ETL pipeline lifecycle.

Audit Application

Deloitte USI
Hyderabad, Telangana
Industry: Audit and Assurance Services
08/01/19 - 01/31/25
Role: Data Engineer/Azure Databricks Developer
Technologies: PySpark, Python, Tableau, SQL, Azure Databricks, Hadoop, Azure Cosmos DB, Azure Data Factory, Airflow, Snowflake, Git, Trifacta, Excel
  • Implemented data governance with Unity Catalog in Azure Databricks to manage access and ensure compliance.
  • Developed data processing workflows in Databricks using Apache Spark/PySpark to clean, transform, and aggregate the raw telemetry data.
  • Utilized Unity Catalog to enforce governance policies and automate data quality validations, ensuring accuracy and consistency in client data delivery.
  • Integrated Azure Data Factory with Databricks notebooks for complex data transformations, taking advantage of Databricks' scalability and Apache Spark's parallel processing capabilities to enhance data processing efficiency.
  • Utilized Hadoop ecosystem tools (HDFS, MapReduce) for distributed data storage and processing of large-scale batch workloads, especially in legacy and hybrid environments.
  • Designed and optimized complex SQL queries and data models in Databricks SQL to support various analytics use cases.
  • Utilized Azure Data Factory to orchestrate and automate ETL pipelines, moving data across different environments and integrating with Azure services like Blob Storage, SQL Data Warehouse, and other cloud data sources.
  • Wrote multiple stored procedures in SQL to automate data loads and implement Data Quality framework.
  • Utilized Apache Airflow for orchestrating and scheduling critical ETL workflows, enabling dependency handling, retry mechanisms, and performance monitoring for large-scale data pipelines.
  • Built CI/CD pipelines using Azure DevOps, integrated with Git to automate version control, testing, deployment, and promote DevOps best practices.
  • Integrated Git (GitHub) into development workflows to manage version control, collaborate on codebase changes, handle branching strategies, and automate pull request-based deployments.
  • Deployed and maintained machine learning models in production using Kubernetes environments.
  • Basic working experience with Microsoft Fabric, including using Fabric Data Factory pipelines for data ingestion, working with Lakehouse for centralized data storage, orchestrating data workflows, monitoring pipeline execution, and integrating Fabric with Azure services to support reliable end-to-end data processing and analytics.
  • Ensured best practices in CI/CD by setting up Jenkins pipelines for model training and deployment.
  • Designed and implemented scalable Snowflake data warehouse solutions to support business intelligence and analytics.
  • Designed and implemented end-to-end data pipelines using Informatica PowerCenter and Azure Data Factory, enabling seamless integration of data from on-premises and cloud sources.
  • Conducted performance tuning and troubleshooting of data processing jobs in Databricks and Snowflake, improving efficiency and scalability.
  • Implemented Snowflake security best practices including role-based access control (RBAC) and data masking.
  • Integrated data from a variety of sources, including Azure Data Lake, SQL databases, REST APIs, and third-party systems, ensuring data availability for downstream applications.
  • Designed and optimized Spark jobs for batch and streaming data processing, improving execution times and resource consumption.
  • Implemented CI/CD pipeline in Azure Databricks using Azure DevOps, integrating Git for version control, automated testing, and deployment.
  • Leveraged Databricks notebooks for interactive data exploration, visualization, and collaborative analysis.
  • Leveraged Azure services such as Azure Storage, Azure Data Lake, Azure SQL Database, and Azure Blob Storage to store and process data efficiently.
  • Conducted performance tuning of Spark jobs by optimizing code, partitioning data, caching, and configuring cluster parameters.
  • Implemented data partitioning strategies, managed resource allocation, and optimized Spark SQL queries to improve runtime performance and resource utilization.
  • Worked with Azure Cosmos DB to store and manage data. Set up databases, performed basic queries, and ensured data was available and reliable.
  • Developed a basic script in Scala to process and analyze text data, using fundamental Scala concepts such as pattern matching and collection operations (e.g., map, filter).
  • Served as a content developer responsible for developing cross-industry and industry-specific analytics, which required knowledge of R/PySpark, SQL, data modelling techniques, and Tableau.
  • Developed interactive dashboards and reports using Tableau, integrating data from multiple sources, including Snowflake, to provide actionable business insights.
  • Built and maintained data pipelines for data integration and processing: collected data from various sources, transformed it into a usable format, and loaded it into the appropriate Azure data store using Azure Data Factory and Azure Databricks.
  • Identified and resolved bottlenecks in the data pipeline and optimized the data processing algorithms.
  • Worked with Trifacta, a data-wrangling tool, to prepare and visualize complex data.

Investment Services Application

Invesco
Hyderabad, Telangana
Industry: Finance
01/01/19 - 07/31/19
Role: Data Engineer
Technologies: PL/SQL, SQL Server, Informatica PowerCenter, Azure Data Factory, Snowflake, Data Warehouse, SSIS, Power BI, Crystal Reports, AutoSys, ServiceNow
  • Developed interactive dashboards and reports using Tableau, integrating data from multiple sources to provide actionable business insights.
  • Used Snowflake and Informatica PowerCenter to automate the ETL (Extract, Transform, Load) process and integrate data from multiple sources (flat files, databases, Excel, web services, etc.).
  • Designed and implemented end-to-end ETL pipelines in Azure Data Factory, automating data ingestion, transformation, and loading processes across cloud and on-premises environments.
  • Designed and implemented scalable Snowflake data warehouse solutions to support business intelligence and analytics.
  • Created reusable Snowflake data models to enhance reporting and analytics capabilities.
  • Designed, developed, and managed AutoSys job flows to automate complex ETL processes, ensuring reliable scheduling and execution of data pipelines.
  • Configured and maintained AutoSys job dependencies, including setting up job hierarchies, defining conditions, and handling failure/success scenarios to maintain optimal job performance and execution.
  • Developed and optimized complex SQL queries to extract, transform, and load (ETL) data from multiple sources into a Data Warehouse environment.
  • Applied indexing and query optimization techniques to improve the performance of SQL queries, reducing run times for large-scale data retrievals.
  • Worked extensively with stored procedures and views to encapsulate complex business logic and improve code maintainability and reuse.
  • Assisted in the design and implementation of data models, ensuring that SQL scripts aligned with the business requirements and the overall data architecture.
  • Involved in the development of ETL pipelines in Informatica PowerCenter to load data from operational databases and external systems into the Data Warehouse, adhering to dimensional modeling techniques (e.g., Star Schema, Snowflake Schema).
  • Provided technical support to Data Analysts and Business Intelligence (BI) teams, helping them retrieve and analyze data efficiently through optimized SQL queries and reports.
  • Created and scheduled Power BI and Crystal Reports reports per business partner needs, working mainly with data sources such as Excel, text files, and MySQL.
  • Developed interactive dashboards and reports using Power BI, integrating data from multiple sources to provide actionable business insights.
  • Worked on investment applications such as CRD (Order Management System), ITR (Investment Trade Router), and CADIS (Data and Security Management System).
  • Developed Data Warehouse tables and ETL mappings to load data in DW tables as a part of Business Requirements.

Hospitality & Food Service Industry

Wipro
Hyderabad, Telangana
Industry: Hospitality
01/01/18 - 12/31/18
Role: Data Engineer
Technologies: SSIS, Azure Data Factory, SSRS, SQL Server, Cherwell
  • Extensive experience using SSIS for automating the ETL (Extract, Transform, Load) process and integrating data from multiple sources (flat files, databases, Excel, web services, etc.).
  • Proficient in designing and deploying complex data workflows for large-scale data transformations, error handling, and logging to ensure smooth data migration.
  • Strong hands-on expertise in building SSIS packages for data extraction, data cleansing, data transformation, and data loading into staging or destination tables.
  • Ability to debug and troubleshoot SSIS packages, identifying and resolving issues like data mismatches, data quality issues, and system failures.
  • Expertise in developing and deploying dynamic reports using SSRS/Tableau for business intelligence and data analysis, with a focus on interactive, drill-down, and parameterized reports.
  • Experience in optimizing SSRS/Tableau reports for performance by creating efficient queries, reducing resource consumption, and implementing caching and pagination.
  • Proficient in SQL for querying, manipulating, and managing data in relational databases like SQL Server, MySQL, and Oracle.
  • Expertise in SQL performance tuning, including query optimization, index analysis, and execution plan interpretation.
  • Expertise in creating views, indexes, constraints, and managing transactions to ensure data integrity, performance, and security.

Maritime & Logistics Industry

Wipro
Hyderabad, Telangana
Industry: Shipping Service (Transportation)
02/01/16 - 12/31/17
Role: Data Engineer
Technologies: Informatica PowerCenter, PL/SQL Developer, UNIX, ServiceNow, FileZilla
  • Used Informatica PowerCenter to automate the ETL (Extract, Transform, Load) process and integrate data from multiple sources (flat files, databases, Excel, web services, etc.).
  • Maintained and enhanced the back-end functionality of the application according to change requests (CRs).
  • Generated customer-requested reports on the services provided by vessels using SQL and PL/SQL queries, and visualized them with Python.
  • Resolved application issues reported as incidents in ServiceNow.
  • Prepared weekly status reports on the total flow of incidents received and resolved, using Python in Jupyter Notebook. The reports show the SLA status of incidents.
  • Wrote SQL queries to generate reports, extract data for analysis, and support decision-making.
  • Applied JOINs, subqueries, and aggregate functions to handle complex data relationships and requirements.
  • Developed PL/SQL procedures to automate data processing tasks and enhance data validation.
  • Created PL/SQL functions to encapsulate reusable logic, improving code maintainability and reusability.

Timeline

Data Engineer

State Street
04.2025 - Current

Data Engineer/Azure Databricks Developer

Deloitte USI
08.2019 - 01.2025

Data Engineer

Invesco
01.2019 - 07.2019

Data Engineer

Wipro
01.2018 - 12.2018

Data Engineer

Wipro
02.2016 - 12.2017

Bachelor of Technology - Electronics and Communication Engineering

Indian Institute of Information Technology Design and Manufacturing