Hi, I’m

Sai Sujith Rathamsetty

Plainfield, IL

Summary

Data Engineer with 7+ years of experience designing and delivering enterprise-scale big data solutions across cloud and distributed systems. Expert in Apache Spark, PySpark, Scala, and ETL/ELT pipelines, with strong skills in AWS Glue, Lambda, S3, SNS/SQS, CloudWatch, and analytics platforms like Redshift and Snowflake. Experienced in end-to-end data architecture, CI/CD automation, and production deployment in multi-cloud environments (AWS, Azure, GCP). Skilled in SQL, data modeling, pipeline debugging, and performance optimization. Recognized for collaborating with cross-functional teams to deliver scalable, secure, and high-performance data solutions in fast-paced, regulated, and mission-critical environments.

Overview

9 years of professional experience
1 certification

Work History

NetOrbit INC

Data Engineer Intern
06.2025 - Current

Job overview

  • Assisted in designing and developing scalable big data pipelines using Spark, PySpark, and Azure Data Factory, supporting ingestion and transformation of multi-GB telemetry datasets.
  • Supported the development of automated ETL workflows integrating game and telemetry data with analytics platforms, contributing to 3+ end-to-end pipelines handling thousands of events per minute.
  • Helped implement monitoring and alerting dashboards using Azure Monitor and Log Analytics, improving visibility into pipeline health and reducing issue detection time.
  • Participated in root-cause analysis during testing and early production releases, helping resolve 10+ defects and improving pipeline reliability.
  • Collaborated with analysts and engineers to optimize SQL queries, contributing to 15–20% performance improvements in selected analytical queries.

Cognizant Technology & Solutions

Data Engineer
10.2022 - 03.2023

Job overview

  • Developed distributed data processing applications using Spark and PySpark on Databricks and GCP Dataproc, optimizing compute-heavy workloads and reducing job runtimes by 20–30%.
  • Built and maintained ELT pipelines with BigQuery and Matillion, integrating 5+ heterogeneous data sources into Snowflake and BigQuery using SQL and orchestration frameworks.
  • Implemented event-driven ingestion pipelines using Google Pub/Sub, enabling near real-time analytics for datasets processing thousands of events per minute.
  • Led QA/QC activities for data pipelines, defining data validation rules, reconciliation checks, and test cases to ensure accuracy, completeness, and consistency across batch and streaming workloads.
  • Owned data quality assurance processes, identifying and resolving 30+ data defects, improving pipeline reliability and downstream analytics confidence.
  • Coordinated on-prem to cloud migrations, validating data integrity post-migration while supporting IAM security, network configuration, and high-availability designs.
  • Automated ETL testing and execution workflows using Python and UNIX shell scripts, reducing manual QA effort by 40% and improving release consistency.
  • Integrated GitHub-based version control and CI/CD pipelines, enforcing code reviews, quality gates, and controlled deployments across dev and test environments.
  • Managed data security and compliance checks using Cloud KMS and IAM, ensuring adherence to internal security and access policies.
  • Monitored batch and streaming workloads using GCP Operations Suite, performing root-cause analysis and reducing recurring production issues.

Infosys Pvt LTD

Data Engineer (AWS)
11.2018 - 06.2022

Job overview

  • Designed and implemented scalable data storage solutions using Amazon RDS, DynamoDB, and S3 Data Lakes, improving data accessibility and overall system performance.
  • Built and maintained automated data pipelines and workflows using AWS Glue, Step Functions, and AWS Data Pipeline, processing 20M+ records daily with high reliability.
  • Performed advanced data analytics using Apache Spark on Amazon EMR and Databricks, reducing data processing time by 45% and enabling data-driven business insights.
  • Integrated real-time data processing systems using Amazon SQS and Amazon Kinesis, supporting ingestion of 5K+ events per second.
  • Automated infrastructure deployment and configuration using Jenkins, Ansible, AWS Systems Manager, and OpsWorks, reducing deployment time by 60% and minimizing configuration drift.
  • Orchestrated containerized applications using Amazon EKS to improve deployment speed and environment consistency.
  • Implemented security best practices through AWS IAM, KMS, and Secrets Manager, ensuring secure access and data protection across cloud services.
  • Engineered backup and disaster recovery solutions using Amazon S3, S3 Glacier, and AWS Backup, meeting enterprise durability and compliance requirements.
  • Monitored cloud resources and applications using CloudWatch, CloudTrail, and AWS X-Ray, reducing mean time to resolution (MTTR) by 30%.
  • Developed custom PL/SQL and SQL scripts to automate data extraction and transformation, significantly reducing manual effort.
  • Led data migrations to AWS, ensuring minimal downtime and maintaining data integrity throughout the migration process.
  • Created interactive Tableau dashboards, reducing ad-hoc reporting requests by 40% and improving stakeholder visibility.
  • Collaborated with business stakeholders, developers, and QA teams to translate requirements into scalable AWS solutions.
  • Ensured adherence to data governance and security best practices, maintaining data quality and audit readiness.
  • Conducted root-cause analysis on data issues, reducing recurring incidents by 35%.

Accenture Solutions Pvt LTD

Software Engineer
11.2016 - 05.2018

Job overview

  • Collaborated with GIS analysts and software engineers to develop and enhance location-based services and mapping functionalities for real-time ride-sharing applications.
  • Assisted in creating interactive maps using geospatial libraries and APIs (e.g., Leaflet.js, Mapbox, or Google Maps API) to visualize routes, drivers, and demand zones.
  • Processed and transformed large geospatial datasets (GeoJSON, shapefiles, CSVs) for accurate map rendering and spatial analysis.
  • Developed ETL pipelines to ingest and clean GPS and location data using Python and SQL for use in spatial visualizations and analytics.
  • Supported implementation of geofencing logic and heatmaps to visualize high-demand areas and optimize driver allocation.
  • Wrote efficient spatial queries using PostGIS (PostgreSQL with GIS extension) to perform distance calculations, reverse geocoding, and proximity analysis.
  • Worked closely with the data science team to integrate real-time location data into dashboards using Power BI or Tableau, enabling business insights on rider behavior and trip patterns.
  • Participated in Agile development, including sprint planning, daily stand-ups, and code reviews, and documented geospatial modules and APIs.

Education

Cleveland State University
Cleveland, OH

Computer Science
12.2024

University Overview

Master’s Coursework

  • Data Mining
  • Deep Learning
  • Natural Language Processing (NLP)
  • Computer Vision
  • Distributed Systems
  • Database Management Systems
  • Computer Architecture

Graduate Teaching Assistant, Cleveland, OH

  • Intro to Operating Systems | Fall 2024
  • Intro to Database Systems | Summer 2024
  • Software Engineering | Spring 2024
  • Supported instruction for 100+ undergraduate students, assisting with labs, assignments, grading, and office hours
  • Guided students on SQL, normalization, indexing, concurrency, and OS fundamentals
  • Collaborated with faculty to maintain course materials and evaluate student performance

Academic Honors

  • Monte Ahuja Scholarship Recipient
  • GPA: 3.66 / 4.0

Skills

    Data Engineering & Processing

  • Apache Spark, Spark SQL
  • Apache Kafka
  • Data Warehousing (Fact/Dimension Modeling, Star/Snowflake Schema)

    Cloud & Big Data (AWS / Azure)

  • AWS Glue, Lambda, S3, SNS, SQS, CloudWatch
  • Amazon Redshift
  • Azure Data Factory (ADF), Azure Log Analytics

    Databases & Storage

  • Snowflake
  • Oracle, SQL Server

    Workflow Orchestration & ETL

  • Apache Airflow
  • Informatica

    Programming & Scripting

  • Python
  • SQL
  • Shell Scripting

    DevOps & CI/CD

  • Git, GitHub
  • Jenkins, Azure DevOps

    Containers & Infrastructure

  • Docker
  • Kubernetes

    Data Engineering Practices

  • Pipeline Debugging & Monitoring
  • Performance Tuning & Optimization

Certification

Microsoft Azure Data Fundamentals (DP-900)

Languages

English: Professional Working
Hindi: Native or Bilingual
Spanish: Elementary
