About Myself
Leveraging 17+ years of IT experience spanning diverse technologies, from ETL and cloud to ML and Gen AI, I spearheaded PwC's Gen AI Factory. I directed and scaled its pod teams in developing strategic Retrieval-Augmented Generation (RAG) pipelines, optimizing knowledge ingestion processes, and championed the integration of GraphRAG technology and custom plugins to automate code generation and test case creation, delivering significant improvements in efficiency and accuracy for AI-driven workflows.
Overview
17 years of professional experience
1 Certification
Work History
PwC / Gen AI Factory
10.2023 - Current
Leadership in Gen AI: Led the Gen AI Factory, directing pod teams to design and implement strategic RAG pipelines
for multiple clients, optimizing knowledge ingestion, and enhancing AI workflow efficiency and accuracy
GraphRAG Integration: Spearheaded GraphRAG integration, utilizing a transformer-based architecture for RAG
knowledge retrieval
Exploration of Gen AI/LLM Frameworks: Championed the exploration of frameworks such as Azure Semantic Kernel,
AWS Bedrock, LangChain, and LangGraph, leading to informed technology stack decisions
Development of Reusable Plugins: Developed reusable Gen AI plugins for automated code generation and test case
creation, leveraging transfer learning and NLP techniques
Training and Mentorship: Provided training and mentorship to team members on new technologies and
methodologies, fostering a culture of continuous learning
Impact: Core plugins now integral to Gen AI Factory’s offerings, ensuring consistent knowledge ingestion and
improving AI development lifecycles for diverse clients.
PwC
06.2022 - 09.2023
Major Financial Services Client
Data Management
PwC
04.2021 - 05.2022
Solution: Led a team to design and implement a robust data management solution
using AWS and Azure, significantly enhancing data accessibility and scalability for a major financial services client
Cloud Integration: Oversaw the seamless migration of on-premise data warehouses to cloud-based solutions, resulting
in improved data storage, retrieval, and analysis capabilities
Advanced Microservices Architecture: Developed a microservices architecture with AWS Lambda and Azure
Functions, streamlining ETL processes and improving overall data processing efficiency
Performance Optimization: Applied advanced performance tuning techniques and cloud resource optimization,
achieving a 5X improvement in data processing speed and substantial cost reductions
Data Governance Framework: Established a comprehensive data governance framework to ensure data quality,
security, and compliance with industry standards
Strategic Impact: Delivered a scalable, high-performance data management solution that significantly improved data
quality and operational efficiency, empowering the client to make faster, data-driven decisions
For a Major Bank:
AWS Solution Development: Led a diverse team of onshore and offshore data engineers to develop and implement a
comprehensive AWS solution for a major bank's data management challenges
Cloud Migration: Migrated the bank's Spark/Big Data applications to the cloud, significantly boosting processing
speed and data management efficiency
Innovative Accelerators: Spearheaded the development of innovative accelerators, including Generative AI for
automated data quality detection and a Metadata Driven ETL Framework
ML-Based Performance Tuning: Implemented ML-based performance tuning to optimize ETL pipelines, leading to
enhanced performance and reduced operational costs
Automation of Data Workflows: Automated key data workflows, reducing manual intervention and minimizing the
risk of errors
Mainframe Technologies Transformation: Led a transformative initiative utilizing extensive knowledge of mainframe
technologies
Custom Accelerators Development: Developed custom accelerators for converting EBCDIC data to ASCII and vice
versa, ensuring data integrity during the transition
Cloud Migration: Migrated data management processes to a robust AWS environment, leveraging cloud computing for
faster processing speeds
Legacy System Modernization: Successfully modernized legacy systems, enabling the bank to leverage modern
technologies and improve overall operational efficiency.
Travelers / Lead Cloud & Spark Data Engineer
LTI
03.2018 - 04.2021
BI&A and Data Engineering Strategy: Key contributor to developing and executing BI&A and Data Engineering
strategy in AWS Stack
Cloud Migration: Architected and strategized migration of on-prem Spark applications to AWS and Databricks
platforms, boosting processing speed and efficiency
Core Analytic Data Products: Designed, developed, and delivered core analytic data products to support BI R&D,
Actuarial, Product Management, and business analytics consumers
Resilient Applications: Designed and implemented resilient, cost-effective, highly available applications in AWS Stack
ETL Design and RDF XML Processing: Led solution and overall ETL design for processing RDF XML messaging data
using Spark
Reusable Transformation Models: Built reusable transformation rules and repeatable data conversion models,
reducing development effort by 30%
Data Lake Pipeline: Implemented a data lake pipeline, orchestrating 20 data sources, applying ETL, and creating a
single Hive table with 1200 attributes and over a billion rows in under 2 hours
Best Practices in Big Data: Established best practices, standards, principles, guidelines, and knowledge management
in the big data space
Deep Neural Networks and NLP: Worked on deep artificial neural networks (ANNs) and NLP algorithms for text mining
Anomaly Detection Models: Implemented Random Cut Forest, Isolation Forest, and Deep Auto Encoder models for
anomaly detection in batch and streaming data
Spark Programs Development: Developed Spark programs for data ingestion and transformation from DB2, Teradata,
and JSON files
SAS to Spark/Hive Modules: Converted SAS modules to Spark/Hive, creating a unified data entity for data scientist
exploration
Performance Tuning: Extensively worked on performance tuning of Spark/Hive components
Real-time Data Pipelines: Implemented Kafka-Spark streaming for real-time data pipelines
AWS Tools: Utilized AWS EMR and Lambda for specific data processing requirements, managing source data in S3
Project Management: Used Kanban, Git, and GitHub for project management and version control as project lead for
Workers Compensation data products
Automation and Infrastructure Management: Implemented bash scripts for multithreading and automation, deployed,
and managed cloud infrastructure using Jenkins and Terraform
Technical Leadership: Played a technical leadership and mentoring role for onshore and offshore teams
Data Quality Frameworks: Designed and implemented data quality frameworks to ensure accuracy and consistency
across data pipelines
Advanced Analytics Solutions: Developed advanced analytics solutions integrating machine learning models for
predictive analytics and decision support
Scalability and Optimization: Enhanced scalability and optimized resource allocation in cloud environments to handle
increasing data volumes
Collaboration with Stakeholders: Collaborated with cross-functional teams and stakeholders to align data engineering
solutions with business objectives
Compliance and Security: Ensured compliance with industry standards and implemented robust security measures to
protect sensitive data.
Tata Consultancy Services / JPMC
08.2016 - 02.2018
Data Lake Analytics: Extensively worked on solution architecture and design for Data Lake analytics implementations
using Informatica and Oracle
Data Profiling and Analysis: Performed extensive data profiling on data sources and conducted source data analysis
to create source-to-target mapping sheets
Complex SQL Reporting: Built complex SQL reports to assist data modelers and BI developers in solving use cases
Stakeholder Coordination: Coordinated with business users and stakeholders to gather business requirements and
manage customer relationships
Product Roadmap and Workshops: Managed customer relationships, conducted product workshops, and facilitated
solutions architecture reviews
POC Implementations: Conducted POCs of Informatica products on Azure and AWS platforms using Snowflake,
Redshift, and Google Cloud Dataflow
Management Meetings: Facilitated weekly deep dive and status meetings with C-level management to report project
progress
Data Loading and Streaming: Extensively worked on loading data into Hive tables using Spark and implemented
Kafka-Spark streaming for unbounded API data
Streaming and Batch Pipelines: Leveraged NIFI to configure streaming and batch sources for pipelining into
Kafka/HDFS sinks
Hive to Spark Transformation: Converted Hive/HQL queries into Spark transformations using Spark RDD, Scala, and
Python
Data Ingestion Utilities: Developed SQOOP import utility to load data from various RDBMS sources and developed
data pipelines using Flume and Spark
Cloud and Big Data Tools: Proficient in Cloudera/Hortonworks distributions, AWS (S3, EC2, EMR), Microsoft Azure,
and Google Cloud Dataflow
Web Log Analytics: Implemented web log analytics using Splunk, Elasticsearch/Kibana, and Grafana
Data Analytics: Performed data analytics using Spark with Scala and Python APIs.
Tata Consultancy Services / Silicon Valley Bank
05.2014 - 07.2016
Spark Data Analysis: Worked extensively on Spark using both Python and Scala for data analysis
Web Log Analytics: Gained expertise in web log analytics using Splunk
Pig Script Development: Developed and tested optimized Pig Latin scripts for data processing
Data Export and Visualization: Exported analyzed data to relational databases using Sqoop for visualization and
report generation
Data Workflow Automation: Automated data extraction from warehouses and weblogs into Hive tables using Oozie
workflows and coordinator jobs
Data Collection with Flume: Used Flume to collect web logs from online ad-servers and push data into HDFS
Data Transformation and Analysis: Loaded and transformed large datasets using Hive to compute metrics for
reporting
Oozie Workflow Development: Developed workflows in Oozie to automate data loading and processing tasks
Hive Table Management: Created and managed Hive tables for data analysis to meet business requirements
JCL Development: Created JCL and JCL PROCs using various utilities such as DFSORT, FILEAID, IEBCOPY,
IEBGENER, IEBCOMPR, and ICETOOL
High-Level Design Documentation: Created High-Level Design, Detailed Design, and Functional Requirement
documents
DB2 Tools Proficiency: Experienced in SPUFI, File Manager for Db2, and QMF
Debugging Tools: Skilled in using XPEDITOR (CICS/Batch), Debugger, CEDF, and Trace Master for troubleshooting
CICS Transaction Processing: Strong experience with CICS transaction processing and DB2 application integration
Manual and Automated Testing: Advanced knowledge of manual, automated, and performance testing
IBM Mainframes: In-depth knowledge of IBM mainframes (MVS, COBOL, JCL, VSAM, CICS, and DB2) and extensive
experience with IBM Mainframe tools and techniques
Test Strategy and Traceability: Involved in preparing Test Strategy and Traceability Matrix documents
Test Case Preparation: Prepared detailed test cases for batch jobs and CICS screens based on code and database
analysis
Functional and Regression Testing: Performed functional, regression, integration, end-to-end, and system testing
DB2 Application Development: Developed DB2 applications using cursors (Declare, Open, Fetch), SQL query
optimization, and cursor pointer functionality
Cobol-VSAM Development: Developed Cobol-VSAM applications with KSDS and ESDS clusters
CICS Web Services: Developed new inbound/outbound programs in CICS Web Services environment using CICS
Transaction Server 3.1.
Infosys / BNSF
10.2008 - 03.2010
Test Case Preparation: Prepared detailed test cases for batch jobs and CICS screens based on code and database
analysis
Functional and Regression Testing: Performed functional, regression, integration, end-to-end, and system testing
CICS Transaction Processing: Strong experience with CICS transaction processing and DB2 application integration
DB2 Application Development: Developed DB2 applications using cursors (Declare, Open, Fetch), SQL query
optimization, and cursor pointer functionality
Cobol-VSAM Development: Developed Cobol-VSAM applications with KSDS and ESDS clusters
Manual and Automated Testing: Advanced knowledge of manual, automated, and performance testing
IBM Mainframes: In-depth knowledge of IBM mainframes (MVS, COBOL, JCL, VSAM, CICS, and DB2) and extensive
experience with IBM Mainframe tools and techniques
Test Strategy and Traceability: Involved in preparing Test Strategy and Traceability Matrix documents
CICS Web Services: Developed new inbound/outbound programs in CICS Web Services environment using CICS
Transaction Server 3.1.
Infosys / DHL
01.2007 - 10.2008
Manual and Automated Testing: Advanced knowledge of manual, automated, and performance testing
IBM Mainframes: In-depth knowledge of IBM mainframes (MVS, COBOL, JCL, VSAM, CICS, and DB2) and extensive
experience with IBM Mainframe tools and techniques
Test Strategy and Traceability: Involved in preparing Test Strategy and Traceability Matrix documents
Test Case Preparation: Prepared detailed test cases for batch jobs and CICS screens based on code and database
analysis
DB2 Application Development: Developed DB2 applications using cursors (Declare, Open, Fetch), SQL query
optimization, and cursor pointer functionality
Cobol-VSAM Development: Developed Cobol-VSAM applications with KSDS and ESDS clusters
CICS Web Services: Developed new inbound/outbound programs in CICS Web Services environment using CICS
Transaction Server 3.1
Tableau Reports: Collaborated with business users to gather requirements for building Tableau reports.
Education
Bachelor of Engineering (B.E) - Electronics and Instrumentation
Anna University
2006
Skills
Multiple ML certifications from Coursera
Technology Expertise
Cloud Platforms:
Databricks - Spark/ML/LLM/DLT
AWS - EMR, Glue, Kinesis, Lambda, Redshift, SageMaker
Azure - Machine Learning Studio, AI Studio, Data Fabric, Functions, Service Bus, Event Hub
Gen AI / LLM Frameworks:
Azure Semantic Kernel
AWS Bedrock
LangChain, LangGraph, GraphRAG
Haystack, CrewAI
Data Engineering / Data Science:
Spark (Python and Scala)
Various cloud SDKs
Python - multiple Python packages
ML packages in Spark/Python
Hive/HQL, SQL, Sqoop, shell scripting
Databases:
Various RDBMS databases
Various NoSQL databases
AWS RDS/Aurora
Azure SQL, Redshift, Snowflake
Neo4j
Accomplishments
Gen AI + ML Innovations:
Advanced Accelerators: Developed sophisticated accelerators to enhance
efficiency and innovation in Generative AI, improving data processing speed
and accuracy
Automated Code Generation: Created reusable Gen AI plugins for automated
code generation and test case creation, using transfer learning and NLP
techniques
Data Lineage and Quality: Implemented data lineage detection systems and
integrated Databricks Delta Live Tables for reliable data quality and
governance
GraphRAG Integration: Led GraphRAG integration for optimized RAG
knowledge retrieval, boosting AI workflow efficiency
Gen AI Frameworks: Evaluated frameworks like Azure Semantic Kernel, AWS
Bedrock, Langchain, and LangGraph to develop tailored Gen AI solutions for
clients
Migration Success: Led the successful migration of large-scale on-premise Spark/Big Data applications to AWS and
Azure. This involved transferring terabytes of data from diverse on-premise sources and managing extensive ETL
pipelines
Microservices Architecture: Implemented a robust microservices architecture using Azure Service Bus. This ensured
reliable message delivery and facilitated communication between distributed systems, significantly enhancing overall
application efficiency
Spark Pipelines Expertise: Demonstrated expertise in implementing and orchestrating Spark pipelines in AWS EMR,
Databricks, and Azure HDInsight, all while maintaining a strong focus on cost efficiency. Achieved a 5X performance
improvement and substantial cost savings for large ETL pipelines through efficient performance tuning, leveraging
advanced Spark statistical techniques and Azure's scalable resources.
Certification
Azure Data Scientist Associate
AWS Solutions Architect Associate
Neo4j Certified Professional
Timeline
PwC / Gen AI Factory
10.2023 - Current
PwC
06.2022 - 09.2023
Data Management
PwC
04.2021 - 05.2022
Travelers / Lead Cloud & Spark Data Engineer
LTI
03.2018 - 04.2021
Tata Consultancy Services / JPMC
08.2016 - 02.2018
Tata Consultancy Services / Silicon Valley Bank
05.2014 - 07.2016
Scrum Master and Technical Lead, Business Analyst
Tata Consultancy Services / Nielsen
04.2011 - 04.2014
Developer
Computer Sciences Corporation, GLIC
03.2010 - 04.2011
Test
Infosys / BNSF
10.2008 - 03.2010
Infosys / DHL
01.2007 - 10.2008
Bachelor of Engineering (B.E) - Electronics and Instrumentation
Lead for Gen AI, Collections/Recovery NA at Citigroup – Gen AI Back Office Operations, Collections/Recovery, Risk & Conversational AI (TATA Consultancy Services)