Srinath

Summary

  • Highly motivated, certified professional with 10+ years of experience, specializing in Enterprise Data Warehousing, Data Lakehouse, Cloud Computing, Business Intelligence, and Database Administration.
  • Experienced in working end-to-end on downstream applications, handling the engineering and administrative responsibilities of the ETL, BI, and database applications.
  • Proficient in data modeling, database design, tuning databases, identifying bottlenecks in data pipelines, and providing quick workarounds to achieve real-time analytics.
  • Experienced in developing scalable and agnostic data pipelines to manage and automate the data ingestion processes.
  • Experienced in implementing data lakes/Delta Lakes on S3 and S3-compatible object storage across the AWS, IBM, and Azure cloud providers.

Overview

11 years of professional experience
1 Certification

Work History

Lead Data Engineer

Memorial Sloan Kettering Cancer Center
New York
09.2020 - Current

MSK is building MODE, a first-of-its-kind analytics platform aimed at expanding access to sophisticated analytics for MSKCC's clinicians, researchers, and administrators at scale, translating institutional information into actionable insights to improve patient care, hospital operations, and clinical research.

Responsibilities:

  • Responsible for the end-to-end delivery of data pipelines using a variety of technologies (AWS Glue Blueprints, DataBricks, Palantir, IBM DataStage on CPD, Python, Spark, Kubernetes, Informatica Data Integration, etc.) to process a high volume of data on a daily, or more frequent, basis.
  • Developed an Enterprise Data Lake and Delta Lake to support various use cases, including analytics, processing, storage, and reporting of rapidly changing, high volumes of data.
  • Participated in the full development life cycle (Agile Methodology) of a Hybrid Cloud Project, requiring integration with other systems, including analysis, design, programming, implementation, and support.
  • Designed and developed efficient and reusable data processing systems that drive complex applications to support high-quality backend systems, leveraging several programming languages, with a focus on Python and PySpark.
  • Implemented reusable data pipelines with AWS Glue Blueprints that source data from APIs and various relational databases to build the S3 data lakes and Delta Lakes, leveraging AWS Glue workflows/jobs to load data into S3 and Redshift (a minimal sketch of such a job follows this list).
  • Developed DLT pipelines in Databricks using Autoloader to synchronize the data from the S3 bucket.
  • Automated the permissions for objects in the Unity catalog by developing a process to assign tags and permissions using an ABAC approach.
  • Designed and developed agnostic PySpark data pipelines using AWS Glue to manage and automate the operational processes. Worked with AWS Data Pipeline to configure data loads from S3 to Redshift.
  • Developed reusable ETL templates with Python and DataStage to handle data ingestion in a consistent and scalable manner across the enterprise.
  • Developed data pipelines in Palantir Foundry to synchronize data from Foundry to Databricks, leveraging the data connector.
  • Collaborated with Data Stewards to develop standards and workflows for Data Governance activities (data discovery, data profiling, data quality, classifications, etc.) on the IBM Cloud Pak for Data and Informatica MDM platforms.
  • Optimized the ETL jobs coded in Python and DataStage, leveraging various techniques like multi-processing and partitioning to expedite the data ingestion run time.
  • Developed Informatica Data Integration pipelines, and created match and merge rules, survivorship, and entities in the Informatica MDM cloud to generate the master data at the organization level by sourcing the data from AWS S3.
  • Involved in performance tuning of Redshift by implementing sort key and data distribution strategies for optimal query performance. Implemented Workload Management (WLM) to prioritize reporting queries over long-running queries, providing more reliable and highly available responses from Redshift.
  • Participated in container-based deployments of ETL jobs using Docker, Docker images, Docker Hub, Docker registries, and Kubernetes.
  • Developed dashboards in Tableau by integrating with the Data Virtualization layer, implementing a federated schema architecture to deliver the Data Fabric use case.
  • Worked with the DevOps team to build the CI/CD pipelines for ETL jobs with Azure repositories, Helm charts, and Argo CD tools for the continuous deployment of developed solutions.
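
For illustration, a minimal sketch of a Glue job of the kind referenced above, written in Python/PySpark; database, table, and bucket names are hypothetical, and the production pipelines are parameterized through Glue Blueprints and workflows rather than hard-coded:

    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    # Standard Glue job bootstrap
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read a source table registered in the Glue Data Catalog (hypothetical names)
    source = glue_context.create_dynamic_frame.from_catalog(
        database="raw_db", table_name="encounters"
    )

    # Land curated Parquet in the S3 data lake layer (hypothetical bucket/prefix)
    glue_context.write_dynamic_frame.from_options(
        frame=source,
        connection_type="s3",
        connection_options={"path": "s3://example-lake/curated/encounters/"},
        format="parquet",
    )
    job.commit()

In practice, a separate workflow step loads the curated data into Redshift, and the same template is reused per source system.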

Environment:

Databricks, Palantir Foundry, Informatica DI, MDM, RDM, IBM Cloud Services - Cloud Pak for Data 4.0.4, DataStage 11.3, Cloud Object Storage, BigSQL, Data Virtualization, DB2 database, Kubernetes Cluster, AWS Cloud Services - S3, EC2, AWS Glue 4.0, Lake Formation, Athena, AWS EMR Cluster, Redshift Cluster, REST API, Python 3.7.5, Spark, Tableau 10.5, Cognos, Linux, Git, Azure Repos, Docker Images, Helm charts, Argo CD, Rclone, Terraform, etc.

Senior ETL/BI Engineer

Signify Health
New York
04.2017 - 08.2020

Signify Health is an innovative healthcare services and technology company specializing in "bundled payment" programs. Bundled payments are a payment model that builds financial and performance accountability into episodes of care. Episode payment programs represent an important advance in the organization and financing of health care services in both the public and private sectors.

Responsibilities:

  • Involved in multiple initiatives, working end to end on downstream applications by setting up databases and transforming data to develop Tableau dashboards.
  • Worked with business teams to understand business reporting and analytic requirements.
  • Participated in daily stand-ups and product meetings to analyze new requirements and propose technical solutions.
  • Designed data models and developed complex ETL code in Pentaho (Kettle) and AWS Glue to implement the proposed technical solutions.
  • Migrated legacy JavaScript ETL code, which moved data from the MongoDB application database to MySQL, onto the Pentaho Data Integration server.
  • Identified bottlenecks in the ETL data flow and optimized the ETL code through techniques such as parallel partitioning and multi-threading.
  • Implemented slowly changing dimensions (SCD) Type 1 and Type 2 to maintain current and historical information in the warehouse tables (see the sketch after this list).
  • Involved in setting up the Tableau (10.5) visualization platform from scratch and upgrading it to version 2019.3.
  • Designed and developed highly interactive reports and dashboards through Pentaho and Tableau.
  • Extensively used Pentaho transformations, including Row Normalizer, Row Denormalizer, Database Lookup, Database Join, Calculator, Add Sequence, and Add Constants, with inputs and outputs for data sources including tables, Access, text files, Excel, and CSV files.
  • Proposed and developed a scalable solution to handle row-level and content-level security on Tableau dashboards by extensively leveraging data blending.
  • Administered, supported, and monitored MySQL, MongoDB, and Redshift databases, proactively resolving database issues and maintaining servers.
  • Worked closely with the Release and Automation team to automate deployment of ETL and Tableau code to higher environments.
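
A minimal sketch of the SCD Type 2 expire-then-insert pattern referenced above, expressed as SQL run from Python against MySQL; the production logic was implemented in Pentaho transformations, and the table and column names here are hypothetical:

    import mysql.connector  # assumes mysql-connector-python is installed

    # Step 1: close out current rows whose tracked attributes have changed
    EXPIRE_CHANGED = """
    UPDATE dim_member d
    JOIN stg_member s ON d.member_id = s.member_id
    SET d.is_current = 0, d.effective_end = CURDATE()
    WHERE d.is_current = 1
      AND (d.plan_code <> s.plan_code OR d.address <> s.address);
    """

    # Step 2: insert a fresh current row for new members and for members
    # whose previous version was just expired
    INSERT_VERSION = """
    INSERT INTO dim_member
        (member_id, plan_code, address, effective_start, effective_end, is_current)
    SELECT s.member_id, s.plan_code, s.address, CURDATE(), NULL, 1
    FROM stg_member s
    LEFT JOIN dim_member d
        ON d.member_id = s.member_id AND d.is_current = 1
    WHERE d.member_id IS NULL;
    """

    conn = mysql.connector.connect(
        host="localhost", database="warehouse", user="etl", password="***"
    )
    cur = conn.cursor()
    cur.execute(EXPIRE_CHANGED)
    cur.execute(INSERT_VERSION)
    conn.commit()

Type 1 attributes, by contrast, are simply updated in place on the current row.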

Environment: AWS Cloud Services - S3, EC2, Glue, Lambda, AWS EMR Cluster, Redshift Cluster, SNS, SQS, RDS, Kinesis, MySQL 5.7, MongoDB 2.6, Pentaho Data Integration 6.1, Pentaho Reports Designer 6.1, Pentaho Schema Workbench 6.1, REST API, Tableau 10.5, Linux, Python 3.7.5, etc.

ETL/BI Developer

DELOITTE
05.2014 - 04.2017

DHS Interactive is a new visualization solution to report and track a set of strategic performance indicators across the DHS programs. At a high level, the DHS Interactive solution contains a set of visualizations that give stakeholders the ability to view and analyze information across the Secretary's five core strategic priorities. Below are the initiatives I worked on from the list of DHS strategic priorities:
Projects:

1) DHS Interactive, Commonwealth of Pennsylvania [DHS] (FEB 2016-APR 2017)
2) IRS form 1095-B project, Commonwealth of Pennsylvania [DHS] (DEC 2014-JAN 2016)
3) Healthy PA, Commonwealth of Pennsylvania [DHS] (MAY 2014-DEC 2014)
Responsibilities:
  • Involved in system study, analyzing requirements with the client, and designing the complete system.
  • Created an ER diagram of the data model using Erwin Data Modeler to transform business rules into a logical model.
  • Developed mappings, reusable objects, transformations, and mapplets using the Mapping Designer, Transformation Developer, and Mapplet Designer in Informatica PowerCenter 10.1/9.1/9.0.1/8.6.1.
  • Created reusable transformations and mapplets and used them in mappings.
  • Used Informatica PowerCenter 10.1/9.1/9.0.1/8.6.1 for extraction, transformation, and loading (ETL) of data into the data warehouse.
  • Implemented slowly changing dimensions (SCD) Type 1 and Type 2 to maintain current and historical information in the dimension tables.
  • Created complex mappings in PowerCenter Designer using Aggregator, Expression, Filter, Sequence Generator, Update Strategy, SQL, Union, Lookup, Joiner, XML Source Qualifier, and Unconnected Lookup transformations.
  • Optimized the performance of mappings through tests on sources, targets, and transformations. Identified bottlenecks, removed them, and implemented performance-tuning logic on targets, sources, mappings, and sessions to provide maximum efficiency and performance.
  • Created e-mail notification tasks using post-session scripts.
  • Designed and developed unit test cases for system integration testing, and worked with users to create test scripts for user acceptance testing (see the sketch after this list).
  • Designed and developed dashboards and reports to provide data insights to business users.
  • Worked on data request tickets and assisted business users in understanding the quality of the data.
  • Tuned the performance of Informatica sessions for large data files by implementing pipeline partitioning and increasing block size, data cache size, sequence buffer length, and the target-based commit interval, and resolved bottlenecks.
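
For illustration, a minimal pytest-style sketch of one integration check of the kind described above, written in Python; the real tests exercised Informatica workflows, and the table and column names here are hypothetical:

    import sqlite3  # in-memory stand-in for the warehouse connection

    import pytest

    @pytest.fixture
    def conn():
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE dim_client (client_id, name, is_current)")
        db.executemany(
            "INSERT INTO dim_client VALUES (?, ?, ?)",
            [(1, "A", 1), (1, "A-old", 0), (2, "B", 1)],
        )
        return db

    def test_one_current_row_per_client(conn):
        # SCD Type 2 invariant: exactly one current row per natural key
        dupes = conn.execute(
            "SELECT client_id FROM dim_client WHERE is_current = 1 "
            "GROUP BY client_id HAVING COUNT(*) > 1"
        ).fetchall()
        assert dupes == []
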
Environment:
Informatica PowerCenter 10.1/9.1/9.0.1/8.6.1, Oracle 11g, SQL Server 2005/2008, MS Excel, Windows XP/2003/2008, Tableau, Qlik Sense, QlikView 11.

Education

Masters - Computer Information Systems

Wilmington University
New Castle, DE, USA

Skills

  • ETL Tools: AWS Glue, Informatica 9.1/9.0/8.6.1, Pentaho Data Integration 6.0.1, DataStage 11.3, DataBricks, Informatica DI, MDM, Palantir
  • Business Intelligence Reporting Tools: Pentaho Reports Designer 6.1, Pentaho Schema Workbench 6.1, IBM Cognos 10.2.1, Pentaho Dashboard Designer
  • Data Visualization Tools: Tableau 10.5, Tableau 2019.3, Pentaho Dashboard Designer, Pentaho CTools, QlikView 9.0
  • Databases: Oracle (9i/10g/11g), SQL Server 2005/2008, MS Access, MySQL 5.7, MongoDB 2.6, Postgres, Redshift, DB2 11.5
  • OS: Unix, Linux, Windows
  • Data Modeling Tools: Erwin, Visio
  • IDE: Jupyter Notebook, Visual Studio, PyCharm, Sublime
  • Cloud Services: AWS - S3, EC2, Glue, Lambda, EMR Cluster, Redshift Cluster, SNS, SQS, RDS, Kinesis, Athena, IAM, Secrets Manager, Data Pipeline, etc.; IBM Cloud Services - Cloud Pak for Data 4.0.4, Cloud Object Storage, BigSQL, Data Virtualization, Container Registry, Kubernetes Cluster
  • Programming Languages: Python 3.7.5, Unix Shell Scripting, JavaScript
  • Utilities: Bamboo, Bitbucket, Jira, Git, Docker Images, Helm charts, Azure Repos, Argo CD, Rclone, Terraform, etc.

Certification

Pentaho Data Integration Certified Specialist.

Timeline

Lead Data Engineer

Memorial Sloan Kettering Cancer Center
09.2020 - Current

Senior ETL/BI Engineer

Signify Health
04.2017 - 08.2020

ETL/BI Developer

DELOITTE
05.2014 - 04.2017

Masters - Computer Information Systems

Wilmington University