
Jayaprakash Subramani

Cloud, AI

Summary

Leveraging 17+ years of IT experience spanning diverse technologies across ETL, cloud, ML, and Gen AI, I spearheaded PwC’s Gen AI Factory. Directed and scaled the Gen AI Factory pod teams in developing strategic Retrieval-Augmented Generation (RAG) pipelines, optimizing knowledge ingestion processes. Championed the integration of GraphRAG technology and custom plugins to automate code generation and test case creation, delivering significant improvements in efficiency and accuracy for AI-driven workflows.

Overview

17 years of professional experience
1 Certification

Work History

PwC / Gen AI Factory
10.2023 - Current
  • Leadership in Gen AI: Led the Gen AI Factory, directing pod teams to design and implement strategic RAG pipelines for multiple clients, optimizing knowledge ingestion, and enhancing AI workflow efficiency and accuracy
  • GraphRAG Integration: Spearheaded GraphRAG integration, utilizing a transformer-based architecture for RAG knowledge retrieval
  • Exploration of Gen AI/LLM Frameworks: Championed the exploration of frameworks such as Azure Semantic Kernel, AWS Bedrock, Langchain, and LangGraph, leading to informed decision-making for the technology stack
  • Development of Reusable Plugins: Developed reusable Gen AI plugins for automated code generation and test case creation, leveraging transfer learning and NLP techniques
  • Training and Mentorship: Provided training and mentorship to team members on new technologies and methodologies, fostering a culture of continuous learning
  • Impact: Core plugins now integral to Gen AI Factory’s offerings, ensuring consistent knowledge ingestion and improving AI development lifecycles for diverse clients.

PwC
06.2022 - 09.2023
  • Major Financial Services Client

Data Management

PwC
04.2021 - 05.2022
  • Solution: Led a team to design and implement a robust data management solution using AWS and Azure, significantly enhancing data accessibility and scalability for a major financial services client
  • Cloud Integration: Oversaw the seamless migration of on-premise data warehouses to cloud-based solutions, resulting in improved data storage, retrieval, and analysis capabilities
  • Advanced Microservices Architecture: Developed a microservices architecture with AWS Lambda and Azure Functions, streamlining ETL processes and improving overall data processing efficiency
  • Performance Optimization: Applied advanced performance tuning techniques and cloud resource optimization, achieving a 5X improvement in data processing speed and substantial cost reductions
  • Data Governance Framework: Established a comprehensive data governance framework to ensure data quality, security, and compliance with industry standards
  • Strategic Impact: Delivered a scalable, high-performance data management solution that significantly improved data quality and operational efficiency, empowering the client to make faster, data-driven decisions
  • For a Major Bank:
  • AWS Solution Development: Led a diverse team of onshore and offshore data engineers to develop and implement a comprehensive AWS solution for a major bank's data management challenges
  • Cloud Migration: Migrated the bank's Spark/Big Data applications to the cloud, significantly boosting processing speed and data management efficiency
  • Innovative Accelerators: Spearheaded the development of innovative accelerators, including Generative AI for automated data quality detection and a Metadata Driven ETL Framework
  • ML-Based Performance Tuning: Implemented ML-based performance tuning to optimize ETL pipelines, leading to enhanced performance and reduced operational costs
  • Automation of Data Workflows: Automated key data workflows, reducing manual intervention and minimizing the risk of errors
  • Mainframe Technologies Transformation: Led a transformative initiative utilizing extensive knowledge of mainframe technologies
  • Custom Accelerators Development: Developed custom accelerators for converting EBCDIC data to ASCII and vice versa, ensuring data integrity during the transition
  • Cloud Migration: Migrated data management processes to a robust AWS environment, leveraging cloud computing for faster processing speeds
  • Legacy System Modernization: Successfully modernized legacy systems, enabling the bank to leverage modern technologies and improve overall operational efficiency.

Travelers / Lead Cloud & Spark Data Engineer

LTI
03.2018 - 04.2021
  • BI&A and Data Engineering Strategy: Key contributor to developing and executing BI&A and Data Engineering strategy in AWS Stack
  • Cloud Migration: Architected and strategized migration of on-prem Spark applications to AWS and Databricks platforms, boosting processing speed and efficiency
  • Core Analytic Data Products: Designed, developed, and delivered core analytic data products to support BI R&D, Actuarial, Product Management, and business analytics consumers
  • Resilient Applications: Designed and implemented resilient, cost-effective, highly available applications in AWS Stack
  • ETL Design and RDF XML Processing: Led solution and overall ETL design for processing RDF XML messaging data using Spark
  • Reusable Transformation Models: Built reusable transformation rules and repeatable data conversion models, reducing development effort by 30%
  • Data Lake Pipeline: Implemented a data lake pipeline, orchestrating 20 data sources, applying ETL, and creating a single Hive table with 1200 attributes and over a billion rows in under 2 hours
  • Best Practices in Big Data: Established best practices, standards, principles, guidelines, and knowledge management in the big data space
  • Deep Neural Networks and NLP: Worked on Deep Neural Networks (ANN) and NLP algorithms for text mining
  • Anomaly Detection Models: Implemented Random Cut Forest, Isolation Forest, and Deep Auto Encoder models for anomaly detection in batch and streaming data
  • Spark Programs Development: Developed Spark programs for data ingestion and transformation from DB2, Teradata, and JSON files
  • SAS to Spark/Hive Modules: Converted SAS modules to Spark/Hive, creating a unified data entity for data scientist exploration
  • Performance Tuning: Extensively worked on performance tuning of Spark/Hive components
  • Real-time Data Pipelines: Implemented Kafka-Spark streaming for real-time data pipelines
  • AWS Tools: Utilized AWS EMR and Lambda for specific data processing requirements, managing source data in S3
  • Project Management: Used Kanban, Git, and GitHub for project management and version control as project lead for Workers Compensation data products
  • Automation and Infrastructure Management: Implemented bash scripts for multithreading and automation, deployed, and managed cloud infrastructure using Jenkins and Terraform
  • Technical Leadership: Played a technical leadership and mentoring role for onshore and offshore teams
  • Data Quality Frameworks: Designed and implemented data quality frameworks to ensure accuracy and consistency across data pipelines
  • Advanced Analytics Solutions: Developed advanced analytics solutions integrating machine learning models for predictive analytics and decision support
  • Scalability and Optimization: Enhanced scalability and optimized resource allocation in cloud environments to handle increasing data volumes
  • Collaboration with Stakeholders: Collaborated with cross-functional teams and stakeholders to align data engineering solutions with business objectives
  • Compliance and Security: Ensured compliance with industry standards and implemented robust security measures to protect sensitive data.

Tata Consultancy Services / JPMC
08.2016 - 02.2018
  • Data Lake Analytics: Extensively worked on solution architecture and design for Data Lake analytics implementations using Informatica and Oracle
  • Data Profiling and Analysis: Performed extensive data profiling on data sources and conducted source data analysis to create source-to-target mapping sheets
  • Complex SQL Reporting: Built complex SQL reports to assist data modelers and BI developers in solving use cases
  • Stakeholder Coordination: Coordinated with business users and stakeholders to gather business requirements and manage customer relationships
  • Product Roadmap and Workshops: Managed customer relationships, conducted product workshops, and facilitated solutions architecture reviews
  • POC Implementations: Conducted POCs of Informatica products on Azure and AWS platforms using Snowflake, Redshift, and Google Cloud Dataflow
  • Management Meetings: Facilitated weekly deep dive and status meetings with C-level management to report project progress
  • Data Loading and Streaming: Extensively worked on loading data into Hive tables using Spark and implemented Kafka-Spark streaming for unbounded API data
  • Streaming and Batch Pipelines: Leveraged NiFi to configure streaming and batch sources for pipelining into Kafka/HDFS sinks
  • Hive to Spark Transformation: Converted Hive/HQL queries into Spark transformations using Spark RDD, Scala, and Python
  • Data Ingestion Utilities: Developed SQOOP import utility to load data from various RDBMS sources and developed data pipelines using Flume and Spark
  • Cloud and Big Data Tools: Proficient in Cloudera/Hortonworks distributions, AWS (S3, EC2, EMR), Microsoft Azure, and Google Cloud Dataflow
  • Web Log Analytics: Implemented web log analytics using SPLUNK, Elasticsearch/Kibana, and Grafana
  • Data Analytics: Performed data analytics using SPARK with Scala and Python APIs.

Tata Consultancy Services / Silicon Valley Bank
05.2014 - 07.2016
  • SPARK Data Analysis: Worked extensively on SPARK using both Python and Scala for data analysis
  • Web Log Analytics: Gained expertise in web log analytics using SPLUNK
  • PIG Script Development: Developed and tested optimized PIG Latin scripts for data processing
  • Data Export and Visualization: Exported analyzed data to relational databases using Sqoop for visualization and report generation
  • Data Workflow Automation: Automated data extraction from warehouses and weblogs into HIVE tables using Oozie workflows and coordinator jobs
  • Data Collection with Flume: Used Flume to collect web logs from online ad-servers and push data into HDFS
  • Data Transformation and Analysis: Loaded and transformed large datasets using Hive to compute metrics for reporting
  • Oozie Workflow Development: Developed workflows in Oozie to automate data loading and processing tasks
  • Hive Table Management: Created and managed Hive tables for data analysis to meet business requirements
  • Scrum Master Role: Facilitated Sprint Planning, Daily Scrums, Sprint Reviews, and Retrospective Meetings
  • Sprint Management: Created Task Boards and Sprint Burn Down Charts, and managed team commitments and impediments
  • Team Mentorship: Served as a coach and mentor, assisting with story selection, sizing, task definition, and adherence to best practices.

Scrum Master, Business Analyst, and Technical Lead

Tata Consultancy Services / Nielsen
04.2011 - 04.2014
  • Performed roles as Scrum Master, Business Analyst, and Technical Lead

Developer

Computer Sciences Corporation, GLIC
03.2010 - 04.2011
  • Requirement Gathering and Design: Formulated requirements into design specs, prepared system specifications, and tracked project progress
  • Mainframe Tools Expertise: Proficient in TSO, ISPF/SDSF, VAGEN, Panvalet, Endeavor, Xpeditor, Abend-Aid
  • JCL Development: Created JCL and JCL PROCs using utilities such as DFSORT, FILEAID, IEBCOPY, IEBGENER, IEBCOMPR, and ICETOOL
  • High-Level Design Documentation: Created High-Level Design, Detailed Design, and Functional Requirement documents
  • DB2 Tools Proficiency: Experienced in SPUFI, File Manager for Db2, and QMF
  • Debugging Tools: Skilled in using XPEDITOR (CICS/Batch), Debugger, CEDF, and Trace Master for troubleshooting
  • CICS Transaction Processing: Strong experience with CICS transaction processing and DB2 application integration
  • Manual and Automated Testing: Advanced knowledge of manual, automated, and performance testing
  • IBM Mainframes: In-depth knowledge of IBM mainframes (MVS, COBOL, JCL, VSAM, CICS, and DB2) and extensive experience with IBM Mainframe tools and techniques
  • Test Strategy and Traceability: Involved in preparing Test Strategy and Traceability Matrix documents
  • Test Case Preparation: Prepared detailed test cases for batch jobs and CICS screens based on code and database analysis
  • Functional and Regression Testing: Performed functional, regression, integration, end-to-end, and system testing
  • DB2 Application Development: Developed DB2 applications using cursors (Declare, Open, Fetch), SQL query optimization, and cursor pointer functionality
  • Cobol-VSAM Development: Developed Cobol-VSAM applications with KSDS and ESDS clusters
  • CICS Web Services: Developed new inbound/outbound programs in a CICS Web Services environment using CICS Transaction Server 3.1.

Test

Infosys / BNSF
10.2008 - 03.2010
  • Test Case Preparation: Prepared detailed test cases for batch jobs and CICS screens based on code and database analysis
  • Functional and Regression Testing: Performed functional, regression, integration, end-to-end, and system testing
  • CICS Transaction Processing: Strong experience with CICS transaction processing and DB2 application integration
  • DB2 Application Development: Developed DB2 applications using cursors (Declare, Open, Fetch), SQL query optimization, and cursor pointer functionality
  • Cobol-VSAM Development: Developed Cobol-VSAM applications with KSDS and ESDS clusters
  • Manual and Automated Testing: Advanced knowledge of manual, automated, and performance testing
  • IBM Mainframes: In-depth knowledge of IBM mainframes (MVS, COBOL, JCL, VSAM, CICS, and DB2) and extensive experience with IBM Mainframe tools and techniques
  • Test Strategy and Traceability: Involved in preparing Test Strategy and Traceability Matrix documents
  • CICS Web Services: Developed new inbound/outbound programs in a CICS Web Services environment using CICS Transaction Server 3.1.

Infosys / DHL
01.2007 - 10.2008
  • Manual and Automated Testing: Advanced knowledge of manual, automated, and performance testing
  • IBM Mainframes: In-depth knowledge of IBM mainframes (MVS, COBOL, JCL, VSAM, CICS, and DB2) and extensive experience with IBM Mainframe tools and techniques
  • Test Strategy and Traceability: Involved in preparing Test Strategy and Traceability Matrix documents
  • Test Case Preparation: Prepared detailed test cases for batch jobs and CICS screens based on code and database analysis
  • DB2 Application Development: Developed DB2 applications using cursors (Declare, Open, Fetch), SQL query optimization, and cursor pointer functionality
  • Cobol-VSAM Development: Developed Cobol-VSAM applications with KSDS and ESDS clusters
  • CICS Web Services: Developed new inbound/outbound programs in a CICS Web Services environment using CICS Transaction Server 3.1
  • Tableau Reports: Collaborated with business users to gather requirements for building Tableau reports.

Education

Bachelor of Engineering (B.E) - Electronics and Instrumentation

Anna University
2006

Skills

  • Multiple ML Certifications by Coursera
  • Technology Expertise
  • Cloud Platforms:
  • Databricks - Spark/ML/LLM/DLT
  • AWS - EMR, Glue, Kinesis, Lambda, Redshift, Sagemaker
  • Azure Machine Learning Studio, Azure AI Studio, Azure Data Fabric, Azure Functions, Azure Service Bus, Azure Event Hub
  • Gen AI / LLM Frameworks:
  • Azure Semantic Kernel
  • AWS Bedrock
  • Langchain, LangGraph, GraphRag
  • Haystack, CrewAI
  • Data Engineering / Data Science:
  • Spark (Python and Scala)
  • Various Cloud SDKs
  • Python - Multiple Python Packages
  • ML packages in Spark / Python
  • Hive / HQL, SQL, Sqoop, Shell Scripting
  • Databases:
  • Various RDBMS Databases
  • Various NoSQL Databases
  • AWS RDS / Aurora
  • Azure SQL, Redshift, Snowflake
  • Neo4J

Accomplishments

  • Gen AI + ML Innovations:
  • Advanced Accelerators: Developed sophisticated accelerators to enhance efficiency and innovation in Generative AI, improving data processing speed and accuracy
  • Automated Code Generation: Created reusable Gen AI plugins for automated code generation and test case creation, using transfer learning and NLP techniques
  • Data Lineage and Quality: Implemented data lineage detection systems and integrated Databricks Delta Live Tables for reliable data quality and governance
  • GraphRAG Integration: Led GraphRAG integration for optimized RAG knowledge retrieval, boosting AI workflow efficiency
  • Gen AI Frameworks: Evaluated frameworks such as Azure Semantic Kernel, AWS Bedrock, Langchain, and LangGraph to develop tailored Gen AI solutions for clients
  • Migration Success: Led the successful migration of large-scale on-premise Spark/Big Data applications to AWS and Azure, transferring terabytes of data from diverse on-premise sources and managing extensive ETL pipelines
  • Microservices Architecture: Implemented a robust microservices architecture using Azure Service Bus, ensuring reliable message delivery, facilitating communication between distributed systems, and significantly enhancing overall application efficiency
  • Spark Pipelines Expertise: Implemented and orchestrated Spark pipelines in AWS EMR, Databricks, and Azure HDInsight with a strong focus on cost efficiency, achieving a 5X performance improvement and substantial cost savings for large ETL pipelines through performance tuning that leveraged advanced Spark statistical techniques and Azure’s scalable resources.

Certification

Azure Data Scientist Associate
AWS Solutions Architect Associate
Neo4J Certified Professional
