
Sadanand Katukuri

Houston, TX

Summary

Data Engineer with 2+ years of experience developing scalable ETL pipelines and analyzing large datasets using AWS Glue, PySpark, and SQL. Proficient in Python programming and cloud technologies, with certifications in AWS and Azure. Skilled in applying machine learning techniques for predictive analytics and feature engineering, with hands-on experience in scikit-learn. Passionate about leveraging data to drive meaningful outcomes in the healthcare and analytics domains.

Overview

  • 3 years of professional experience
  • 3 certifications

Work History

Data Engineer

Cognizant (Healthfirst Client - NY, US)
07.2022 - 07.2024
  • Led the migration of SAS-based processes to AWS Cloud as part of the Extract Vendor Integration and Configuration (EVIC) Project at Healthfirst, New York
  • Designed and implemented scalable ETL workflows using AWS Glue and PySpark, ensuring accurate and efficient data extraction, transformation, and loading (ETL) into S3 buckets
  • Established rigorous data quality assurance protocols to maintain integrity across large-scale datasets
  • Streamlined deployment processes through CI/CD pipelines integrated with GitHub, enhancing operational efficiency
  • Leveraged SQL, Power BI, and Pandas to perform advanced data analysis and generate actionable insights, contributing to optimized data workflows in the healthcare domain
  • Demonstrated expertise in cloud-based data engineering, data analytics, and cross-functional collaboration to achieve impactful outcomes
  • Gathered requirements from the client and interacted with stakeholders to gain deeper insight into the project
  • Analyzed source data to understand data requirements and design data mappings
  • Communicated deliverable status to the onsite team and participated in periodic review meetings
  • Developed AWS Glue jobs using PySpark to load PostgreSQL, Aurora, and DynamoDB data (nested JSON) into Spark DataFrames and validated the data
  • Developed AWS Glue jobs using PySpark to generate fixed-width and CSV reports from multiple sources, including PostgreSQL data and Parquet files in S3, using a metadata-driven approach
  • Wrote Spark jobs for data cleansing and transformation
  • Implemented batch and real-time data pipelines using AWS services, including S3, Lambda, DynamoDB, Redshift, and SNS
  • Developed code for data quality checks covering accuracy, consistency, and validity, enabling effective data analysis per business requirements
  • Wrote SQL queries in AWS Athena to extract data from S3 buckets for analysis
  • Validated data from source to target and target to source to ensure comprehensive coverage across scenarios
  • Processed large-scale datasets and extracted insights using SQL through DBeaver in accordance with business requirements
  • Utilized PySpark, AWS Glue, and DBeaver to develop ETL jobs for migrating eligibility processes to AWS Cloud
  • Collaborated with clients to gather requirements and ensure seamless migration of existing reports

ETL Developer

Cognizant
01.2022 - 04.2022
  • Designed and implemented robust ETL processes to integrate and transform healthcare data from diverse sources using Informatica PowerCenter
  • Focused on data cleaning, transformation, and integration to create high-quality datasets for downstream analytics
  • Developed efficient data workflows to support strategic decision-making and ensure compliance with business requirements
  • Demonstrated expertise in data engineering, data preprocessing, and maintaining data quality to enable actionable insights within the healthcare domain
  • Gathered requirements from the client and interacted with stakeholders to gain deeper insight into the project
  • Created and maintained technical documentation, including mapping specifications, design documents, and testing plans
  • Communicated deliverable status and participated in periodic review meetings
  • Designed and developed ETL workflows to extract data from various source systems and transform it according to business rules
  • Designed and developed ETL mappings using Informatica PowerCenter to extract, transform, and load data from source systems to target systems
  • Created mappings, transformations, and sessions in Informatica PowerCenter
  • Wrote SQL scripts to query and manipulate data in Oracle Database
  • Implemented data quality rules and validations to ensure data accuracy and completeness
  • Tested and debugged ETL workflows to identify and resolve defects

Education

Master’s - Data Science

University of Houston-Clear Lake
05.2026

Bachelor of Engineering -

CBIT
01.2022

Skills

  • Cloud Services: AWS Redshift, S3, Athena, EC2, Azure
  • Machine Learning: Supervised, Unsupervised
  • ML Frameworks: Scikit-learn
  • Database: PostgreSQL, Redshift
  • Programming Languages: Python, SQL, PySpark
  • Version Control: GitHub, CI/CD (Jenkins)
  • IDE: PyCharm, VSCode, Jupyter
  • Operating System: Windows
  • Technical knowledge: Data Modeling, Machine Learning, Data Warehousing, ETL (AWS Glue, Informatica PowerCenter)
  • Data tools: Tableau, DBeaver, Power BI
  • Big Data: Hadoop, Spark, Pandas
  • Automation and Job Scheduling: BMC Control-M
  • Methodologies: Agile

Certification

  • AWS Certified Cloud Practitioner (2023)
  • AWS Certified Solutions Architect Associate (2024)
  • Azure Data Engineer Associate (2024)
  • Machine Learning
  • Apache Spark 3 – Programming in Python
  • Hackathon – Data for Social Good

Project Summary

Designed and implemented scalable ETL processes for healthcare data integration across two major projects at Cognizant and Healthfirst. In the Data Integration for Healthcare Industry project, I used Informatica PowerCenter, SQL, and Oracle to integrate, transform, and clean large-scale healthcare data, ensuring high-quality datasets for downstream analytics and strategic decision-making. In the EVIC Project at Healthfirst, I led the migration of SAS-based processes to AWS Cloud, utilizing AWS Glue and PySpark to optimize ETL workflows, improve data quality, and streamline deployments through CI/CD pipelines integrated with GitHub. My work involved data preprocessing, maintaining data integrity, and leveraging tools like Power BI and Pandas to generate actionable insights, driving data-driven strategies in the healthcare sector.

Strengths

  • Strong proficiency in PySpark, SQL, and data processing
  • Developed a local AWS Glue setup, resulting in significant cost savings for the client
  • Experience with AWS Cloud services, including Glue and S3
  • Skilled in ETL development and data quality assurance
  • Excellent problem-solving and communication skills
  • Demonstrated ability to work in cross-functional teams
  • Statistical tests and probability theory
  • Feature engineering
  • Deploying models on the AWS cloud platform
