
SARAT KUMAR SETHY

Data Science Manager, Hyderabad

Summary

Highly competent Data Engineering and AI Manager with 17+ years of cross-functional technical experience designing and implementing business intelligence and data solutions, including 10+ years of end-to-end Agile data engineering and product management for advertisement, banking, and healthcare platforms. Works in the data and analytics technology space using Python, Scala, Spark, Databricks, natural language processing, data processing, data mesh, and generative AI to solve challenging business problems. Strengthened overall pipeline performance, and committed to championing a data-driven decision-making culture that meets business demand for timely, focused analytics and information delivery. Hands-on technologist with the ability to research new technologies, aspiring to work in an enthusiastic, creative, and competitive environment that offers opportunities to enhance and apply skills and to contribute to organizational growth.
Collaborative leader who partners with coworkers to promote an engaged, empowering work culture. Documented strengths in building and maintaining relationships with a diverse range of stakeholders in dynamic, fast-paced settings.

Overview

18 years of professional experience

Work History

Senior Manager – Data Science and Analytics

Cognizant
08.2018 - Current
  • Leading the platform design and data migration, moving Spark and Flink jobs from HDFS to Apple cloud storage (ACOS) and building the Lakehouse with Unity Catalog.
  • Led the development, testing, and migration of legacy on-premises data to the Azure environment using ADF and Databricks.
  • Led the development, testing, and migration of Teradata to Snowflake using S3 and Databricks.
  • Responsible for designing, implementing, and maintaining security controls to protect data—especially sensitive and personal data—across storage, processing, and transmission layers.
  • Use Unity Catalog to manage fine-grained permissions across data assets (catalogs, schemas, tables, rows, columns), adhering to least-privilege and compliance requirements.
  • Implemented data masking, pseudonymization, and dynamic views in Databricks, and employed Terraform-based security templates for standardized deployments (see the sketch after this list).
  • Build and validate secure ETL/ELT pipelines in Databricks (using Delta Live Tables, Spark jobs, or SQL workflows), and ensure input validation and output sanitization in transformations to avoid data leakage.
  • Proficient in technology leadership, architecture, design, building teams from scratch, strategy planning, project management, and client/customer management, along with budget planning and CP handling.
  • Managing multiple managers, leads, architects, engineers, and testers.
  • Crafting technology strategy, vision, and roadmap, and leading the data team to deliver solutions leveraging on-premises, cloud, and open-source technologies.
  • Manage and drive direction from strategy, planning, and prototype to execution of applied data analytics, including data scoping, analysis, preprocessing, applying relations, deriving ML models, optimizing model performance and accuracy, visualization, and delivering actionable insights for banking and insurance products.
  • Worked on and managed multiple parallel RFPs/proposals for data-related projects for banking, communication, and tech clients.
  • Working with the product team, and served as Scrum Master for multiple projects.
  • Serving as an individual contributor on some of the ML and AI projects.
  • Managed large-scale projects and introduced new systems, tools, and processes to achieve challenging objectives.
  • Held monthly business-planning meetings and workshops to drive business success.
  • Reviewed and analyzed reports, records and directives to obtain data required for planning department activities.
  • Executed appropriate staffing and budgetary plans to align with business forecasts.
  • Evaluated employee performance and conveyed constructive feedback to improve skills.
  • Planned, created, tested and deployed system life cycle methodology to produce high quality systems to meet and exceed customer expectations.
  • Implemented Databricks Unity Catalog for centralized data governance and data lineage, enhancing data security and compliance across different LOBs and reducing audit preparation time by 50%.
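Below is a minimal, illustrative sketch of the Unity Catalog permissioning and dynamic-view masking described above. It assumes a Databricks environment; the catalog, schema, table, and group names are hypothetical, not taken from the projects here.

    # Hedged sketch: Unity Catalog least-privilege grants plus a masking view.
    # All object and group names below are illustrative assumptions.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # on Databricks, `spark` is provided

    # Grant only what the consumer group needs (least privilege).
    spark.sql("GRANT USE CATALOG ON CATALOG finance TO `analysts`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA finance.core TO `analysts`")
    spark.sql("GRANT SELECT ON TABLE finance.core.transactions TO `analysts`")

    # Dynamic view: only members of `pii_readers` see raw email addresses;
    # everyone else gets a pseudonymized hash.
    spark.sql("""
    CREATE OR REPLACE VIEW finance.core.transactions_masked AS
    SELECT
      txn_id,
      amount,
      CASE WHEN is_member('pii_readers') THEN customer_email
           ELSE sha2(customer_email, 256)
      END AS customer_email
    FROM finance.core.transactions
    """)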

Senior lead – Data and Analytics

Wipro Technology
Hyderabad
01.2017 - 08.2018

  • Designed and developed data pipelines for a large banking warehouse to run fraud-recommendation analytics based on customer sentiment for HSBC, a UK customer.
  • Designed and developed data pipelines for a large-scale healthcare platform to run medical-claims analytics for Change Healthcare, a US customer.
  • Developed modern data warehouse solutions using Spark, Scala, and the Azure stack for Change Healthcare.
  • Worked to ensure data quality, integrity, and security by implementing appropriate data validation, storage, and access controls.
  • Collaborated with cross-functional teams to understand their data needs and deliver relevant, actionable insights.
  • Prepared ETL design documents covering the database structure, change data capture, error handling, and restart and refresh strategies.
  • Designed data processing and transformation pipelines using Spark, Scala, and Azure Data Factory.
  • Implemented Spark optimization techniques such as caching, multithreading, and broadcast joins, resulting in a 20% decrease in processing time for a daily load of around 2 million records (a minimal sketch follows).
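The optimizations above follow a standard PySpark pattern; here is a minimal, self-contained sketch with hypothetical paths and column names (none of which come from the project itself):

    # Hedged sketch: cache a reused DataFrame and broadcast a small dimension
    # table into a join to avoid shuffling the large side. Paths and columns
    # are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("txn-pipeline").getOrCreate()

    txns = spark.read.parquet("s3a://bucket/transactions/")    # large fact data
    customers = spark.read.parquet("s3a://bucket/customers/")  # small dimension

    txns.cache()  # reused by several aggregations, so keep it in memory

    # Broadcasting the small side turns the join into a map-side operation.
    enriched = txns.join(broadcast(customers), on="customer_id", how="left")
    daily = enriched.groupBy("txn_date").count()
    daily.write.mode("overwrite").parquet("s3a://bucket/daily_counts/")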

Tech lead – Data and Analytics

Honeywell R&D
Hyderabad
11.2015 - 12.2016

  • Worked as data engineering lead on Honeywell analytical products such as scanners (image processing) and thermal temperature sensing, as well as Honeywell Homes & Buildings connected applications.
  • Built complex data pipelines and deployed data into a multi-cluster Hadoop environment on AWS to run analytics for Honeywell IoT devices.
  • Designed, implemented, and maintained data processing pipelines to aggregate, clean, and process large volumes of IoT streaming data using Spark and Scala.
  • Performed unit testing and reconciliations and worked on ETL/ELT script processes; led code reviews, shared knowledge, and mentored junior resources.
  • Teamed on development of a technology roadmap spanning 2 years.

Tech lead – Data

IBM
11.2009 - 11.2015

  • Worked as a lead specialist and ETL developer for banking, insurance, and healthcare customers.
  • Built ETL pipelines using DataStage and automated the reconciliation process using Python.
  • Validated data using automated SQL scripts.
  • Prepared functional and technical specification documents for building a Member Data Mart according to the IBM BDW banking model.
  • Prepared and successfully implemented automated UNIX scripts to execute the end-to-end history load process.
  • Designed integration tools to combine data from multiple, varied data sources such as RDBMS, SQL, and big data installations.

Senior Software Engineer

Javi System India Pvt ltd
Bangalore
04.2008 - 11.2009

  • Worked as an ETL developer.
  • Built ETL pipelines using DataStage and automated the reconciliation process using Python.
  • Validated data using automated SQL scripts.

Education

B.Tech - Computer Science

Utkal University

Skills

  • Machine learning expertise
  • AI technology expertise
  • Proficient in Pandas library
  • Proficient in NumPy
  • Data visualization with Matplotlib
  • Keras framework proficiency
  • Experience with Scikit-learn library
  • Experienced with TensorFlow frameworks
  • Natural language processing (NLP)
  • Supervised learning
  • Unsupervised learning
  • Deep learning
  • Neural Networks
  • Text Analytics
  • Deep Reinforcement learning
  • PyTorch
  • Boosting Algorithm
  • Markov Chain Model
  • LLM
  • ChatGPT
  • Apache Spark development
  • Hadoop
  • Azure
  • Proficient in Databricks
  • Snowflake
  • Python
  • SQL
  • PySpark
  • Unix
  • Probability
  • Statistics
  • Hypothesis Testing
  • Integration & Derivatives
  • Linear Algebra
  • Exploratory Analysis
  • Gaussian Mixture & Hidden Markov Models
  • Agile/Waterfall
  • JIRA
  • Confluence
  • ALM
  • Cognos
  • Teradata/SQL Server
  • Voldemort
  • Spark

Current Project Engagement

Projects Worked:

CDH to Lakehouse Platform (Hadoop to ACOS migration), July 2025 - Current

Project Description –

The intent of the platform design is to migrate off Cloudera Distributed Hadoop (CDH). This requires the entire business unit to transition from HDFS and YARN to Spark data processing technologies, as well as migrating the Hive metastore to the Lakehouse Unity Catalog. As part of this migration journey, the plan is to move all jobs, data, and the Hive metastore to the Lakehouse (backed by ACOS, which is S3-compatible). The Lakehouse uses Unity Catalog with Iceberg-format datasets.

Role – Platform design lead.

Tech Stack – Databricks, Spark, PySpark, Flink, Python, Hadoop, AWS S3

Responsibilities:

1. Leading the migration effort across development, testing, and end-to-end production deployment and support.

2. Designing tools and solutions for data ingestion, data lineage, data copy, and data replication strategy (a minimal sketch of one table's copy path follows this list).

3. Building tools for the various phases of the migration and hyperscaling them based on the volume of data ingested and loaded to the target platform.

4. Designing the solution and prototype for migrating all YARN jobs to Kubernetes or serverless Spark on Kubernetes, checking throughput and other performance parameters.

5. Moving the old Hive metastore to Unity Catalog with Iceberg format.

6. Validating jobs and ETL transformations to ensure Spark joins, lookups, and security features work properly across the multi-platform, cross-LOB migration effort.

7. Leading a cross-functional team to establish data management policies and best practices, resulting in improved data security and compliance.

8. Applied GDPR rules to sensitive datasets while building the pipelines.

9. Working on forward- and backward-replication strategies for the data migration effort.

10. Ensuring the data pipelines comply with data protection regulations (such as GDPR, HIPAA, CCPA) and industry standards.

11. Implemented and monitored data loss prevention and data reconciliation tools and strategies to detect and prevent unauthorized data movement across the organizations/LoBs.

12. Worked with LoB leads to ensure that Databricks workflows align with GDPR, CCPA, HIPAA, and other applicable data protection regulations.
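As a hedged illustration of item 2, one table's copy path might look like the sketch below; the HDFS path, bucket, and catalog names are hypothetical, not details of the actual platform:

    # Hedged sketch: read one table from the legacy CDH cluster, land it in
    # the S3-compatible object store, and register it in Unity Catalog. All
    # paths and names are illustrative assumptions.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cdh-migration").getOrCreate()

    src = spark.read.parquet("hdfs://cdh-namenode:8020/warehouse/events/")

    (src.write
        .format("delta")  # assumption: Delta tables, with Iceberg reads via UniForm
        .mode("overwrite")
        .option("path", "s3a://lakehouse-bucket/events/")
        .saveAsTable("main.analytics.events"))

    # Simple row-count reconciliation between source and target.
    assert src.count() == spark.table("main.analytics.events").count()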

GBI Terminus – Teradata to Snowflake data migration (June 2022 - June 2024)

Project – Apple strategically wanted to move off Teradata due to operational cost and to modernize Hydra applications with a faster, more flexible way to visualize the data.

As part of this migration journey, the plan is to move all data for the critical LOBs from Teradata to Snowflake, sunset the Teradata licences, and then repoint all Power BI and Tableau reports and dashboards to Snowflake.

Role – Migration Lead.

Tech Stack – Databricks, Python, AWS S3, Kafka, Snowflake, Teradata.

Responsibilities –

1. Designed and built a history data copy tool for the Teradata to Snowflake migration.

2. Designed the Keystone framework to handle data ingestion from different sources using Kafka.

3. Owned the platform assessment and ran multiple POVs to make sure the tools work.

4. Developed automated pipelines for data extraction, transformation, and loading.

5. Loaded data into Snowflake using Snowpipe and bulk copy (a minimal sketch follows this list).

6. Rewrote BTEQ scripts and procedures in Snowflake SQL.

7. Converted Teradata DDL schemas to Snowflake-compatible DDL and optimized the data model.

8. Used Databricks for data governance and ETL code development.

9. Led a cross-functional team to establish data management policies and best practices, resulting in improved data security and compliance.

10. Worked on forward- and backward-replication strategies for the data migration effort.
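As a hedged illustration of item 5, a bulk COPY from an external stage might look like this; the account, credentials, stage, and table names are hypothetical stand-ins:

    # Hedged sketch: bulk-load staged Parquet files into Snowflake with COPY
    # INTO via the Python connector. Connection details, stage, and table
    # names are illustrative assumptions.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="etl_user", password="***",
        warehouse="LOAD_WH", database="GBI", schema="CORE",
    )
    cur = conn.cursor()
    try:
        # @GBI_STAGE is assumed to be an external stage over the S3 bucket.
        cur.execute("""
            COPY INTO GBI.CORE.ORDERS
            FROM @GBI_STAGE/orders/
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
        """)
        print(cur.fetchall())  # per-file load results
    finally:
        cur.close()
        conn.close()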

ChatDnB, Confidential, USA, D&B

ChatDnB enhances the search mechanism, letting users ask questions and get meaningful insights relevant to those questions.

Tech Stack – Python, LLM, ChatGPT, BigQuery, GCP

Role – Individual Contributor. Worked on integrating the test suite into GitHub Actions for automated execution on git commits and checkouts. Implemented caching and reduced memory usage. Performed performance tuning at the application level. Developed an integration test suite using pytest in Python for the ChatDnB framework.

  • Financial Analytics enablement of Next-Gen User Experience, Confidential, USA

This is an end-to-end data and analytics platform with new advanced analytics tools and a suite of modular front-end solutions that help clients derive key insights, build new strategies, and take customer-level actions that can be easily integrated into their workflows.

Tech Stack – Azure DW (Redshift), AWS S3, AWS SageMaker, Matrix Profiler, Bayesian Networks, AWS DynamoDB, T5, NLP, ThoughtSpot, Power BI

Role – Data Engineering Manager. Led 60+ engineers to implement, develop, design, and test Azure cloud-based data modernization applications, and worked with the data science team to formalize AI-based decision-making products on financial data. Re-engineered and redeveloped the fraud detection algorithm, introducing new attributes and decreasing false positives by 90%. Used Amazon SageMaker for data pre-processing such as data merging, cleaning, and missing-value treatment. Leveraged the Matrix Profile algorithm to scan multi-variable time-series data and identify anomalies in the accelerated dataset (a minimal sketch follows). Extracted industry events using NLP and classified text based on window-based outliers. Analyzed the KPI data store to discover inconsistent data characteristics and patterns caused by unavailability of key driving field values. Managed and worked closely with the user experience team on causal graphs in D3.js and Python. Owned project planning, delivery, work breakdowns, resource planning, and day-to-day project management using Jira, Confluence, and Scrum. Partnered with engineering and product teams to identify product and technical requirements and led the team through dependencies and delivery milestones. Worked as an individual contributor developing ML models in Python and building data pipelines using Spark and Scala.
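The Matrix Profile step might look like the hedged sketch below, using the open-source stumpy library on synthetic data; the library choice, window length, and data are illustrative assumptions, not details of the project:

    # Hedged sketch: Matrix Profile anomaly detection with stumpy on a
    # synthetic univariate series.
    import numpy as np
    import stumpy

    rng = np.random.default_rng(0)
    ts = np.sin(np.linspace(0, 40, 2000)) + rng.normal(0, 0.1, 2000)
    ts[1200:1230] += 3.0  # inject an anomalous excursion

    m = 50                    # subsequence window length
    mp = stumpy.stump(ts, m)  # column 0 holds the matrix profile distances

    # The subsequence with the largest profile distance is the most
    # "discordant" window, i.e. the strongest anomaly candidate.
    anomaly_start = int(np.argmax(mp[:, 0].astype(float)))
    print(f"anomaly window starts near index {anomaly_start}")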

  • AI-enabled Order Management of MDM Product, Confidential, USA

The main aim of this AI-enabled solution is to minimize order fallout and predict the root cause of fallout incidents based on the Address product. An efficient order fallout management system ensures that order failures are detected and corrected early for prompt provisioning of customer service. This can be achieved by developing robust AI-driven process control, a process that evaluates and monitors performance using data collected over time. Fallout results in customer churn, degradation of service offerings, and a diminished customer experience. Here we considered two AI/ML-based use cases to predict and resolve fallouts with minimum turnaround time.

Root Cause Analysis Based on Historical Data:

Goal: When order fallout happens, AI can help by looking at the symptoms and predicting the root causes. The time taken to narrow down the root cause is a significant factor in determining resolution times, and it usually requires expert help from the developers of the software or the product vendor. AI helps ITOps get to fixing the root cause quickly. Predicting the root cause is a classification problem: identifying which of a set of categories a new observation belongs to. Candidate classifiers include simple decision trees, naive Bayes, random forests, support vector machines, and deep learning.

Predicting Root Causes with Keras:

When a new incident happens, we first identify its symptoms and populate the related feature variables, such as error codes. We then pass these as an array to the model's predict function, which returns a numeric value for the root cause. We translate the numeric value into a label using the inverse_transform function on the encoder. The same model can predict root causes for a batch of incidents (a minimal sketch follows).
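The flow just described might look like the hedged sketch below; the model file, feature layout, and label set are hypothetical stand-ins, not details from the project:

    # Hedged sketch: encode symptom features, call the trained model's predict
    # function, and map the numeric class back to a root-cause label. The
    # model, features, and labels are illustrative assumptions.
    import numpy as np
    from sklearn.preprocessing import LabelEncoder
    from tensorflow import keras

    # Assume a classifier trained elsewhere and the label encoder fitted
    # during training.
    model = keras.models.load_model("root_cause_classifier.keras")
    encoder = LabelEncoder()
    encoder.classes_ = np.array(["CODE_DEFECT", "DATABASE", "NETWORK"])

    # Feature vector for one new incident, e.g. one-hot error codes and
    # symptom flags.
    incident = np.array([[1, 0, 0, 1, 0, 0, 1]])

    probs = model.predict(incident)           # class probabilities
    root_cause_id = np.argmax(probs, axis=1)  # numeric class per incident
    print(encoder.inverse_transform(root_cause_id))  # readable label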

Tech Stack – GCP, BigQuery, Informatica, CoreLogic Data APIs for Address products, Google Address Verifier APIs, classification models (Random Forest, Survival Analysis, Multinomial Logistic Regression), Keras, Power BI

Role – Data Science Manager.

Led 30+ engineers to implement, develop, design, and test GCP cloud-based MDM applications, and worked with the data science team to reduce address fallout risk using an ML-based solution. Built an AI-driven anomaly detection capability to predict errors in service order management, in which the ML model gauges service order trends from time-series data of service order requests to determine whether an item is an error to alert on. Used Python packages like scikit-learn for data pre-processing activities such as transforming values into an appropriate format. Analyzed the KPI data store to discover inconsistent data characteristics and patterns caused by unavailability of key driving field values. Managed and worked closely with the user experience team on causal graphs in D3.js and Python. Owned project planning, delivery, work breakdowns, resource planning, and day-to-day project management using Jira, Confluence, and Scrum. Partnered with engineering and product teams to identify product and technical requirements and led the team through dependencies and delivery milestones. Architected the entire solution and designed a single 360-degree polygon view of the MDM Address products. Worked closely with the testing team on the automation strategy to test the data, APIs, reports, and AI/ML models and their accuracy.

  • Ad Platform (App Recommendation), Confidential, USA

A privacy-friendly implementation of an item-item recommender system (an algorithm that learns "similarity" scores for pairs of apps). The production model is trained on the Apple Media Product server and transmits only the app-to-app similarity scores, which are user-agnostic, to Ad Platform servers. These app-to-app similarity scores are calculated and transmitted for ALL ad-enabled storefronts on a daily/weekly basis. The main goal is to have a user-agnostic (unpersonalised) similarity score that tells us how common it is for a user to interact with a specific app and then another specific app, in the case where those two interactions happen.

Tech Stack – AWS, Spark, Scala, Statistical Models, Tableau

Role – Data Science Manager.

Responsibilities:

Worked on and built the app recommendation model based on app-to-app similarity using mathematical models.

Worked on EDA for user click data using Python data analysis and pre-processing packages.

Built a Markov chain model as well as mathematical models to find the probabilistic ratio (p-value); a minimal sketch follows.

Worked with the customer to understand the requirements and convert them into a model that yields the similarity score between two apps.
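A first-order Markov chain over app-interaction sequences, as described above, might look like this hedged sketch; the apps and event sequences are made-up examples, not project data:

    # Hedged sketch: count app -> app transitions from interaction sequences
    # and row-normalize into probabilities, giving a user-agnostic
    # "what follows what" score. The data here is an illustrative assumption.
    import numpy as np

    apps = ["news", "games", "music"]
    idx = {a: i for i, a in enumerate(apps)}

    # User-agnostic interaction sequences (app A followed by app B).
    sequences = [["news", "games"], ["news", "music", "games"], ["games", "music"]]

    counts = np.zeros((len(apps), len(apps)))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[idx[a], idx[b]] += 1

    # Row-normalize: P(next app = b | current app = a).
    transition = counts / counts.sum(axis=1, keepdims=True)
    print(transition[idx["news"], idx["games"]])  # similarity-style score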

Technical Summary

LLMs and NLP, Python, PyTorch, TensorFlow, LangChain, LlamaIndex, Text Analytics, Ad Analytics, Anomaly Detection, Recommendation Engines, Statistical Modelling, Gen AI and agnostic AI

Additional Information

  • 14 years of experience, including 6+ years providing business solutions across data analysis, data science, and machine learning platforms, and 5+ years of experience in AI/ML engineering with a strong focus on LLMs, NLP, and data testing for banking and insurance products.
  • Proficiency in Python and AI frameworks such as PyTorch, TensorFlow, LangChain, and LlamaIndex.
  • Experience implementing text analytics, ad analytics, anomaly detection, and recommendation engines using NLP, logistic regression, classification, K-means clustering, and isolation forests.
  • Involved in all phases of the project life cycle, including data acquisition, A/B testing, hypothesis testing, EDA, data cleaning, data imputation (outlier detection, residual analysis, PCA, etc.), data transformation, feature scaling, feature engineering, and statistical modelling for both linear and non-linear datasets; factor analysis, testing, and validation using ROC plots, F1-score, and k-fold cross-validation, plus data pattern visualization.
  • Exposure to, and 1 year of experience with, open-source models (Llama, Mistral, Falcon, etc.) and proprietary models (GPT-4, Claude, Gemini, etc.) to build state-of-the-art AI applications.
  • Excellent problem-solving and data analysis skills, with expertise in developing and applying predictive analytics, statistical modeling, A/B experiments, data mining, and machine learning algorithms.
  • Good experience in Spark, Python, and Scala programming.
