Hi, I’m

Sai Sujith Rathamsetty

Plainfield, IL

Summary

Data Engineer with 7+ years of experience designing and delivering enterprise-scale big data solutions across cloud and distributed systems. Expert in Apache Spark, PySpark, Scala, and ETL/ELT pipelines, with strong skills in AWS Glue, Lambda, S3, SNS/SQS, CloudWatch, and analytics platforms like Redshift and Snowflake. Experienced in end-to-end data architecture, CI/CD automation, and production deployment in multi-cloud environments (AWS, Azure, GCP). Skilled in SQL, data modeling, pipeline debugging, and performance optimization. Recognized for collaborating with cross-functional teams to deliver scalable, secure, and high-performance data solutions in fast-paced, regulated, and mission-critical environments.

Work History

NetOrbit INC

Data Engineer Intern

06.2025 - Current

Job overview

Assisted in designing and developing scalable big data pipelines using Spark, PySpark, and Azure Data Factory, supporting ingestion and transformation of multi-GB telemetry datasets.
Supported the development of automated ETL workflows integrating game and telemetry data with analytics platforms, contributing to 3+ end-to-end pipelines handling thousands of events per minute.
Helped implement monitoring and alerting dashboards using Azure Monitor and Log Analytics, improving visibility into pipeline health and reducing issue detection time.
Participated in root-cause analysis during testing and early production releases, helping resolve 10+ defects and improving pipeline reliability.
Collaborated with analysts and engineers to optimize SQL queries, contributing to 15–20% performance improvements in selected analytical queries.

Cognizant Technology Solutions

Data Engineer

10.2022 - 03.2023

Job overview

Developed distributed data processing applications using Spark and PySpark on Databricks and GCP Dataproc, optimizing compute-heavy workloads and reducing job runtimes by 20–30%.
Built and maintained ELT pipelines with BigQuery and Matillion, integrating 5+ heterogeneous data sources into Snowflake and BigQuery using SQL and orchestration frameworks.
Implemented event-driven ingestion pipelines using Google Pub/Sub, enabling near real-time analytics for datasets processing thousands of events per minute.
Led QA/QC activities for data pipelines, defining data validation rules, reconciliation checks, and test cases to ensure accuracy, completeness, and consistency across batch and streaming workloads.
Owned data quality assurance processes, identifying and resolving 30+ data defects, improving pipeline reliability and downstream analytics confidence.
Coordinated on-prem to cloud migrations, validating data integrity post-migration while supporting IAM security, network configuration, and high-availability designs.
Automated ETL testing and execution workflows using Python and UNIX shell scripts, reducing manual QA effort by 40% and improving release consistency.
Integrated GitHub-based version control and CI/CD pipelines, enforcing code reviews, quality gates, and controlled deployments across dev and test environments.
Managed data security and compliance checks using Cloud KMS and IAM, ensuring adherence to internal security and access policies.
Monitored batch and streaming workloads using GCP Operations Suite, performing root-cause analysis and reducing recurring production issues.

Infosys Pvt LTD

Data Engineer (AWS)

11.2018 - 06.2022

Job overview

Designed and implemented scalable data storage solutions using Amazon RDS, DynamoDB, and S3 Data Lakes, improving data accessibility and overall system performance.
Built and maintained automated data pipelines and workflows using AWS Glue, Step Functions, and AWS Data Pipeline, processing 20M+ records daily with high reliability.
Performed advanced data analytics using Apache Spark on Amazon EMR and Databricks, reducing data processing time by 45% and enabling data-driven business insights.
Integrated real-time data processing systems using Amazon SQS and Amazon Kinesis, supporting ingestion of 5K+ events per second.
Automated infrastructure deployment and configuration using Jenkins, Ansible, AWS Systems Manager, and OpsWorks, reducing deployment time by 60% and minimizing configuration drift.
Orchestrated containerized applications using Amazon EKS to improve deployment speed and environment consistency.
Implemented security best practices through AWS IAM, KMS, and Secrets Manager, ensuring secure access and data protection across cloud services.
Engineered backup and disaster recovery solutions using Amazon S3, S3 Glacier, and AWS Backup, meeting enterprise durability and compliance requirements.
Monitored cloud resources and applications using CloudWatch, CloudTrail, and AWS X-Ray, reducing mean time to resolution (MTTR) by 30%.
Developed custom PL/SQL and SQL scripts to automate data extraction and transformation, significantly reducing manual effort.
Led data migrations to AWS, ensuring minimal downtime and maintaining data integrity throughout the migration process.
Created interactive Tableau dashboards, reducing ad-hoc reporting requests by 40% and improving stakeholder visibility.
Collaborated with business stakeholders, developers, and QA teams to translate requirements into scalable AWS solutions.
Ensured adherence to data governance and security best practices, maintaining data quality and audit readiness.
Conducted root-cause analysis on data issues, reducing recurring incidents by 35%.

Accenture Solutions Pvt LTD

Software Engineer

11.2016 - 05.2018

Job overview

Collaborated with GIS analysts and software engineers to develop and enhance location-based services and mapping functionalities for real-time ride-sharing applications.
Assisted in creating interactive maps using geospatial libraries and APIs (e.g., Leaflet.js, Mapbox, or Google Maps API) to visualize routes, drivers, and demand zones.
Processed and transformed large geospatial datasets (GeoJSON, shapefiles, CSVs) for accurate map rendering and spatial analysis.
Developed ETL pipelines to ingest and clean GPS and location data using Python and SQL for use in spatial visualizations and analytics.
Supported implementation of geofencing logic and heatmaps to visualize high-demand areas and optimize driver allocation.
Wrote efficient spatial queries using PostGIS (PostgreSQL with GIS extension) to perform distance calculations, reverse geocoding, and proximity analysis.
Worked closely with the data science team to integrate real-time location data into dashboards using Power BI or Tableau, enabling business insights on rider behavior and trip patterns.
Participated in Agile development, including sprint planning and daily stand-ups, and contributed to code reviews and documentation of geospatial modules and APIs.

Education

Cleveland State University
Cleveland, OH

Computer Science

12.2024

University Overview

Master’s Coursework

Data Mining
Deep Learning
Natural Language Processing (NLP)
Computer Vision
Distributed Systems
Database Management Systems
Computer Architecture

Graduate Teaching Assistant — Cleveland, OH

Intro to Operating Systems | Fall 2024
Intro to Database Systems | Summer 2024
Software Engineering | Spring 2024
Supported instruction for 100+ undergraduate students, assisting with labs, assignments, grading, and office hours
Guided students on SQL, normalization, indexing, concurrency, and OS fundamentals
Collaborated with faculty to maintain course materials and evaluate student performance

Academic Honors

Monte Ahuja Scholarship Recipient
GPA: 3.66 / 4.0

Skills

Data Engineering & Processing

Apache Spark, Spark SQL
Apache Kafka
Data Warehousing (Fact/Dimension Modeling, Star/Snowflake Schema)

Cloud & Big Data (AWS / Azure)

AWS Glue, Lambda, S3, SNS, SQS, CloudWatch
Amazon Redshift
Azure Data Factory (ADF), Azure Log Analytics

Databases & Storage

Snowflake
Oracle, SQL Server

Workflow Orchestration & ETL

Apache Airflow
Informatica

Programming & Scripting

Python
SQL
Shell Scripting

DevOps & CI/CD

Git, GitHub
Jenkins, Azure DevOps

Containers & Infrastructure

Docker
Kubernetes

Data Engineering Practices

Pipeline Debugging & Monitoring
Performance Tuning & Optimization

Certification

Microsoft Azure Data Fundamentals (DP-900)

Languages

English: Professional Working
Hindi: Native or Bilingual
Spanish: Elementary
