
Sadanand Katukuri

Houston, TX

Summary

Data Engineer with 2+ years of experience developing scalable ETL pipelines and analyzing large datasets using AWS Glue, PySpark, and SQL. Proficient in Python programming and cloud technologies, with certifications in AWS and Azure. Skilled in applying machine learning techniques for predictive analytics and feature engineering, with hands-on experience in scikit-learn. Passionate about leveraging data to drive meaningful outcomes in the healthcare and analytics domains.

Overview

  • 3 years of professional experience
  • 3 certifications

Work History

Data Engineer

Cognizant (Healthfirst Client - NY, US)
07.2022 - 07.2024
  • Led the migration of SAS-based processes to AWS Cloud as part of the Extract Vendor Integration and Configuration (EVIC) Project at Healthfirst, New York
  • Designed and implemented scalable ETL workflows using AWS Glue and PySpark, ensuring accurate and efficient data extraction, transformation, and loading (ETL) into S3 buckets
  • Established rigorous data quality assurance protocols to maintain integrity across large-scale datasets
  • Streamlined deployment processes through CI/CD pipelines integrated with GitHub, enhancing operational efficiency
  • Leveraged SQL, Power BI, and Pandas to perform advanced data analysis and generate actionable insights, contributing to optimized data workflows in the healthcare domain
  • Demonstrated expertise in cloud-based data engineering, data analytics, and cross-functional collaboration to achieve impactful outcomes
  • Gathered requirements from the client and interacted with stakeholders to gain deeper insight into the project
  • Analyzed source data to understand data requirements and design data mappings
  • Communicated deliverable status to the onsite team and participated in periodic review meetings
  • Developed AWS Glue jobs using PySpark to load PostgreSQL, Aurora, and DynamoDB data (nested JSON) into Spark DataFrames and validated the data
  • Developed AWS Glue jobs using PySpark to generate fixed-width and CSV reports from multiple sources, including PostgreSQL data and Parquet files in S3, using a metadata-driven approach
  • Wrote Spark jobs for data cleansing and transformation
  • Implemented batch and real-time data pipelines using AWS services, including S3, Lambda, DynamoDB, Redshift, and SNS
  • Developed code for data quality checks covering accuracy, consistency, and validity, enabling effective data analysis per business requirements
  • Wrote SQL queries in AWS Athena to extract data from S3 buckets for analysis
  • Validated data from source to target and target to source to ensure comprehensive coverage across scenarios
  • Processed large-scale datasets and extracted insights using SQL through DBeaver in accordance with business requirements
  • Utilized PySpark, AWS Glue, and DBeaver to develop ETL jobs for migrating eligibility processes to AWS Cloud
  • Collaborated with clients to gather requirements and ensure seamless migration of existing reports

ETL Developer

Cognizant
01.2022 - 04.2022
  • Designed and implemented robust ETL processes to integrate and transform healthcare data from diverse sources using Informatica PowerCenter
  • Focused on data cleaning, transformation, and integration to create high-quality datasets for downstream analytics
  • Developed efficient data workflows to support strategic decision-making and ensure compliance with business requirements
  • Demonstrated expertise in data engineering, data preprocessing, and maintaining data quality to enable actionable insights within the healthcare domain
  • Gathered requirements from the client and interacted with stakeholders to gain deeper insight into the project
  • Created and maintained technical documentation, including mapping specifications, design documents, and testing plans
  • Communicated deliverable status and participated in periodic review meetings
  • Designed and developed ETL workflows to extract data from various source systems and transform it according to business rules
  • Designed and developed ETL mappings using Informatica PowerCenter to extract, transform, and load data from source systems to target systems
  • Created mappings, transformations, and sessions in Informatica PowerCenter
  • Wrote SQL scripts to query and manipulate data in Oracle Database
  • Implemented data quality rules and validations to ensure data accuracy and completeness
  • Tested and debugged ETL workflows to identify and resolve defects

Education

Master’s - Data Science

University of Houston-Clear Lake
05.2026

Bachelor of Engineering -

CBIT
01.2022

Skills

  • Cloud Services: AWS Redshift, S3, Athena, EC2, Azure
  • Machine Learning: Supervised, Unsupervised
  • ML Frameworks: Scikit-learn
  • Database: PostgreSQL, Redshift
  • Programming Languages: Python, SQL, PySpark
  • Version Control: GitHub, CI/CD (Jenkins)
  • IDE: PyCharm, VSCode, Jupyter
  • Operating System: Windows
  • Technical knowledge: Data Modeling, Machine Learning, Data Warehousing, ETL (AWS Glue, Informatica PowerCenter)
  • Data tools: Tableau, DBeaver, Power BI
  • Big Data: Hadoop, Spark, Pandas
  • Automation and Job Scheduling: BMC Control-M
  • Methodologies: Agile

Certification

  • AWS Certified Cloud Practitioner (2023)
  • AWS Certified Solutions Architect Associate (2024)
  • Azure Data Engineer Associate (2024)
  • Machine Learning
  • Apache Spark 3 – Programming in Python
  • Hackathon – Data for Social Good

Project Summary

Designed and implemented scalable ETL processes for healthcare data integration across two major projects at Cognizant and Healthfirst. In the Data Integration for Healthcare Industry project, I used Informatica PowerCenter, SQL, and Oracle to integrate, transform, and clean large-scale healthcare data, ensuring high-quality datasets for downstream analytics and strategic decision-making. In the EVIC Project at Healthfirst, I led the migration of SAS-based processes to AWS Cloud, utilizing AWS Glue and PySpark to optimize ETL workflows, improve data quality, and streamline deployments through CI/CD pipelines integrated with GitHub. My work involved data preprocessing, maintaining data integrity, and leveraging tools like Power BI and Pandas to generate actionable insights, driving data-driven strategies in the healthcare sector.

Strengths

  • Strong proficiency in PySpark, SQL, and data processing
  • Developed a local AWS Glue setup, resulting in significant cost savings for the client
  • Experience with AWS Cloud services, including Glue and S3
  • Skilled in ETL development and data quality assurance
  • Excellent problem-solving and communication skills
  • Demonstrated ability to work in cross-functional teams
  • Statistical tests and probability theory
  • Feature engineering
  • Deploying models on the AWS cloud platform
