Srinath

Summary

  • Highly motivated, certified professional with 10+ years of experience, specializing in Enterprise Data Warehousing, Data Lakehouse, Cloud Computing, Business Intelligence, and Database Administration.
  • Experienced in working end-to-end on downstream applications, handling the engineering and administrative responsibilities of ETL, BI, and database applications across cloud and on-premise environments.
  • Proficient in data modeling, database design, and performance tuning for high-volume systems on platforms such as Azure SQL Database, AWS Redshift, and Oracle. Skilled at identifying bottlenecks in data pipelines and delivering scalable, cloud-agnostic solutions for real-time analytics and reporting.
  • Hands-on experience implementing data lakes, Delta Lakes, and Azure Data Lake Storage solutions. Developed modern, reusable data pipelines using Python, PySpark, AWS Glue, Databricks, and Azure SQL Database to manage and automate data ingestion and processing across hybrid cloud environments.

Overview

11 years of professional experience
1 Certification

Work History

Lead Data Engineer

Memorial Sloan Kettering Cancer Center
New York
09.2020 - Current

MSK is building MODE, a first-of-its-kind analytics platform aimed at expanding access to sophisticated analytics for MSKCC's clinicians, researchers, and administrators at scale, translating institutional information into actionable insights and improving patient care, hospital operations, and clinical research.

Responsibilities:

  • Responsible for the end-to-end delivery of data pipelines using a variety of technologies, including AWS Glue, Databricks, Palantir, IBM DataStage on CPD, Python, Spark, Kubernetes, Informatica Data Integration, and Azure SQL Database, to process high-volume data for analytics, reporting, and operational applications.
  • Designed and implemented Enterprise Data Lake and Delta Lake solutions on AWS S3 and Azure Data Lake Storage (ADLS), integrating data from APIs, relational databases, cloud platforms, and on-premise systems.
  • Developed reusable PySpark pipelines in Databricks for data ingestion, transformation, and validation from Azure SQL Database, AWS RDS, and other sources into Delta Lake tables, leveraging Databricks Auto Loader and DLT pipelines (see the ingestion sketch following this list).
  • Created reusable, scalable ETL templates using AWS Glue Blueprints, Python, and DataStage, managing and automating the ingestion of structured and semi-structured data.
  • Automated Unity Catalog object permissions through ABAC-style processes, assigning tags and permission sets via metadata-driven frameworks integrated with AWS and Azure cloud services (see the permissions sketch following this role's environment list).
  • Built and maintained data pipelines in Palantir Foundry to synchronize data between Cloud Foundry and Databricks.
  • Collaborated closely with data stewards to define standards, workflows, and processes for data governance activities, including data discovery, profiling, quality management, and classification on IBM Cloud Pak for Data and Informatica MDM.
  • Optimized performance for ETL jobs coded in Python and DataStage through techniques like multiprocessing, partitioning, and dynamic memory management.
  • Developed Informatica MDM pipelines including match/merge rules, survivorship logic, and entity definitions to generate master data records from AWS S3 and on-premise sources.
  • Tuned AWS Redshift query and table performance by implementing sort keys, data distribution strategies, and Workload Management (WLM) configurations for high-priority query routing.
  • Integrated Azure Repos and Azure SQL Managed Instances into CI/CD pipelines, deploying ETL and data engineering workloads through Helm charts and Argo CD for automated, scalable deployments.
  • Developed operational and clinical dashboards in Tableau, sourcing data from Azure SQL Databases, Data Virtualization layers, and Delta Lake tables for interactive, role-based reporting.
  • Participated in containerized, Kubernetes-based deployments for ETL services and Python applications interacting with Azure SQL, AWS Redshift, and Databricks.
  • Worked with DevOps teams to build CI/CD pipelines and automate infrastructure provisioning using Terraform, Docker Images, and Helm charts.
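
A minimal, illustrative sketch of the reusable Auto Loader ingestion pattern described above (not production code). The source path, checkpoint location, and target table name are hypothetical placeholders; the actual pipelines are parameterized per source system.

```python
# Sketch of a reusable Auto Loader ingestion step (Databricks / PySpark).
# Paths, schema location, and the target table name are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def ingest_to_delta(source_path: str, target_table: str, checkpoint_path: str):
    """Incrementally load new files from cloud storage into a Delta Lake table."""
    stream = (
        spark.readStream.format("cloudFiles")              # Databricks Auto Loader
        .option("cloudFiles.format", "json")                # source file format
        .option("cloudFiles.schemaLocation", checkpoint_path + "/schema")
        .load(source_path)
        .withColumn("ingest_ts", F.current_timestamp())     # simple audit column
    )
    return (
        stream.writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)                          # incremental, batch-style run
        .toTable(target_table)
    )

# Example invocation with placeholder locations:
# ingest_to_delta("s3://raw-bucket/events/", "bronze.events", "s3://meta-bucket/chk/events")
```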

Environment:
Databricks, Palantir Foundry, Informatica DI/MDM, IBM Cloud Pak for Data, Azure SQL Database, Azure Data Lake Storage, AWS S3, AWS Glue, Redshift, EMR, Python, Spark, Tableau, Kubernetes, Azure Repos, Docker, Helm, Argo CD, Terraform, REST APIs, Data Virtualization, Cloud Foundry.
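
A simplified sketch of the metadata-driven, tag-based permission automation mentioned in the responsibilities above (not the actual framework). The catalog, schema, and table names, tag values, and group names are hypothetical; in practice the mapping is read from a governance metadata store rather than an inline list.

```python
# Sketch of metadata-driven Unity Catalog tagging and grants (Databricks SQL via PySpark).
# Table names, tag values, and group names below are hypothetical examples.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# In the real framework this mapping comes from a governance metadata table.
table_policies = [
    {"table": "main.clinical.encounters",
     "tags": {"sensitivity": "phi"},
     "grants": [("SELECT", "clinical_analysts")]},
    {"table": "main.ops.bed_census",
     "tags": {"sensitivity": "internal"},
     "grants": [("SELECT", "ops_reporting"), ("MODIFY", "ops_engineers")]},
]

for policy in table_policies:
    table = policy["table"]
    # Apply classification tags that downstream ABAC-style policies key off.
    tag_clause = ", ".join(f"'{k}' = '{v}'" for k, v in policy["tags"].items())
    spark.sql(f"ALTER TABLE {table} SET TAGS ({tag_clause})")
    # Grant the permission set associated with the classification.
    for privilege, principal in policy["grants"]:
        spark.sql(f"GRANT {privilege} ON TABLE {table} TO `{principal}`")
```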

Senior ETL/BI Engineer

Signify Health
New York
04.2017 - 08.2020

Signify Health is an innovative healthcare services and technology company specializing in "bundled payment" programs. Bundled payments are a payment model that builds financial and performance accountability into episodes of care. Episode payment programs represent an important advance in the organization and financing of health care services in both the public and private sectors.

Responsibilities:

  • Involved in multiple initiatives; worked end to end on downstream applications, setting up databases and transforming data to develop Tableau dashboards.
  • Worked with business teams to understand business reporting and analytics requirements.
  • Participated in daily stand-ups and product meetings to analyze new requirements and propose technical solutions.
  • Designed data models and developed complex ETL code in Pentaho (Kettle) and AWS Glue to implement the proposed technical solutions.
  • Migrated legacy JavaScript ETL code that moved data from the application database (MongoDB) to MySQL onto the Pentaho Data Integration server.
  • Identified bottlenecks in the ETL data flow and optimized the ETL code through techniques such as parallel partitioning and multi-threading.
  • Implemented slowly changing dimensions (SCD) Type 1 and Type 2 to maintain current and historical information in the warehouse tables (see the SCD sketch following this role's environment list).
  • Set up the Tableau (10.5) visualization platform from scratch and later upgraded it to version 2019.3.
  • Designed and developed highly interactive reports and dashboards in Pentaho and Tableau.
  • Extensively used Pentaho transformations including Row Normaliser, Row Denormaliser, Database Lookup, Database Join, Calculator, Add Sequence, and Add Constants, along with various input and output steps for data sources including tables, Access, text files, Excel, and CSV files.
  • Proposed and developed a scalable solution to handle row-level and content-level security on Tableau dashboards by extensively leveraging data blending.
  • Administered, supported, and monitored MySQL, MongoDB, and Redshift databases, proactively resolving database issues and maintaining servers.
  • Worked closely with the Release and Automation team to automate deployment of ETL and Tableau code to higher environments.

Environment: AWS Cloud Services - S3, EC2, Glue, Lambda, EMR Cluster, Redshift Cluster, SNS, SQS, RDS, Kinesis, MySQL 5.7, MongoDB 2.6, Pentaho Data Integration 6.1, Pentaho Reports Designer 6.1, Pentaho Schema Workbench 6.1, REST API, Tableau 10.5, Linux, Python 3.7.5, etc.
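
A minimal sketch of the SCD Type 2 pattern referenced in the responsibilities above (expire changed rows, then insert the new current versions). The table and column names (stg_provider, dim_provider, etc.) are hypothetical; the production logic ran inside Pentaho (Kettle) transformations rather than a standalone script.

```python
# Sketch of an SCD Type 2 load against MySQL: close out changed rows, insert new versions.
# Connection details and the stg_provider / dim_provider tables are hypothetical examples.
import mysql.connector

EXPIRE_CHANGED_ROWS = """
UPDATE dim_provider d
JOIN stg_provider s ON s.provider_id = d.provider_id
SET d.effective_to = CURDATE(), d.is_current = 0
WHERE d.is_current = 1
  AND (d.provider_name <> s.provider_name OR d.specialty <> s.specialty)
"""

INSERT_NEW_VERSIONS = """
INSERT INTO dim_provider
    (provider_id, provider_name, specialty, effective_from, effective_to, is_current)
SELECT s.provider_id, s.provider_name, s.specialty, CURDATE(), '9999-12-31', 1
FROM stg_provider s
LEFT JOIN dim_provider d
       ON d.provider_id = s.provider_id AND d.is_current = 1
WHERE d.provider_id IS NULL   -- new keys plus keys whose current row was just expired
"""

conn = mysql.connector.connect(host="localhost", user="etl", password="***", database="dw")
cur = conn.cursor()
cur.execute(EXPIRE_CHANGED_ROWS)   # Type 2: close out rows whose tracked attributes changed
cur.execute(INSERT_NEW_VERSIONS)   # insert the current version for new and changed keys
conn.commit()
cur.close()
conn.close()
```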

ETL/BI Developer

DELOITTE
05.2014 - 04.2017

DHS Interactive is a new visualization solution to report and track a set of strategic performance indicators across the DHS programs. At a high level, the DHS Interactive solution contains a set of visualizations that give stakeholders the ability to view and analyze information across the Secretary's five core strategic priorities. Below are the initiatives I worked on from the list of DHS strategic priorities:
Projects:

1) DHS Interactive, Commonwealth of Pennsylvania [DHS] (FEB 2016-APR 2017)
2) IRS form 1095-B project, Commonwealth of Pennsylvania [DHS] (DEC 2014-JAN 2016)
3) Healthy PA, Commonwealth of Pennsylvania [DHS] (MAY 2014-DEC 2014)
Responsibilities:
  • Involved in system study; analyzed requirements in meetings with the client and designed the complete system.
  • Created ER diagrams of the data model using Erwin Data Modeler to transform business rules into a logical model.
  • Developed mappings, reusable objects, transformations, and mapplets using Mapping Designer, Transformation Developer, and Mapplet Designer in Informatica PowerCenter 10.1/9.1/9.01/8.6.1.
  • Created reusable transformations and mapplets and used them in mappings.
  • Used Informatica PowerCenter 10.1/9.1/9.01/8.6.1 for extraction, transformation, and loading (ETL) of data into the data warehouse.
  • Implemented slowly changing dimensions (SCD) Type 1 and Type 2 to maintain current and historical information in the dimension tables.
  • Created complex mappings in PowerCenter Designer using Aggregator, Expression, Filter, Sequence Generator, Update Strategy, SQL, Union, Lookup, Joiner, XML Source Qualifier, and Unconnected Lookup transformations.
  • Optimized mapping performance through tests on sources, targets, and transformations; identified and removed bottlenecks and applied performance-tuning logic to targets, sources, mappings, and sessions for maximum efficiency.
  • Created e-mail notification tasks using post-session scripts.
  • Designed and developed unit test cases for system integration testing, and worked with users to create test scripts for user acceptance testing.
  • Designed and developed dashboards and reports to provide data insights to business users.
  • Worked on data request tickets and assisted business users in understanding data quality.
  • Tuned the performance of Informatica sessions for large data files by implementing pipeline partitioning and increasing block size, data cache size, sequence buffer length, and the target-based commit interval, resolving bottlenecks.
Environment:
Informatica PowerCenter 10.1/9.1/9.01/8.6.1, Oracle 11g, SQL Server 2005/2008, MS Excel, Windows XP/2003/2008, Tableau, Qlik Sense, QlikView 11.

Education

Masters - Computer Information Systems

Wilmington University
New Castle, DE, USA

Skills

  • ETL Tools: AWS Glue, Informatica 9.1/9.0/8.6.1, Pentaho Data Integration 6.0.1, DataStage 11.3, Databricks, Informatica DI, MDM, Palantir
  • Business Intelligence Reporting Tools: Pentaho Reports Designer 6.1, Pentaho Schema Workbench 6.1, IBM Cognos 10.2.1, Pentaho Dashboard Designer
  • Data Visualization Tools: Tableau 10.5, Tableau 2019.3, Pentaho Dashboard Designer, Pentaho CTools, QlikView 9.0
  • Databases: Oracle (9i/10g/11g), SQL Server 2005/2008, MS Access, MySQL 5.7, MongoDB 2.6, Postgres, Redshift, DB2 11.5, Azure SQL Database
  • OS: Unix, Linux, Windows
  • Data Modeling Tools: Erwin, Visio
  • IDE: Jupyter Notebook, Visual Studio, PyCharm, Sublime
  • Cloud Services: AWS - S3, EC2, Glue, Lambda, EMR Cluster, Redshift Cluster, SNS, SQS, RDS, Kinesis, Athena, IAM, Secrets Manager, Data Pipeline, etc.; IBM Cloud Services - Cloud Pak for Data 4.0.4, Cloud Object Storage, BigSQL, Data Virtualization, Container Registry, Kubernetes Cluster; Azure SQL Database, Azure Data Lake Storage
  • Programming Languages: Python 3.7.5, Unix Shell Scripting, JavaScript
  • Utilities: Bamboo, Bitbucket, Jira, Git, Docker Images, Helm charts, Azure Repos, Argo CD, Rclone, Terraform, etc.

Certification

Pentaho Data Integration Certified Specialist.

Databricks Certified Data Engineer Associate.

Timeline

Lead Data Engineer

Memorial Sloan Kettering Cancer Center
09.2020 - Current

Senior ETL/BI Engineer

Signify Health
04.2017 - 08.2020

ETL/BI Developer

DELOITTE
05.2014 - 04.2017

Masters - Computer Information Systems

Wilmington University