Saishashank Saraff

Dallas

Summary

Data Engineer with 2+ years of experience in the telecom sector, focused on building scalable data pipelines with AWS services including S3, Glue, and Redshift. Skilled in integrating structured and unstructured data from SQL Server and MongoDB and in optimizing ETL processes with Python and Spark. Proven track record of migrating legacy systems to cloud environments, improving data accessibility and reporting. Most recently completed a Data Engineer internship in the AWS healthcare domain, specializing in ETL pipeline development for healthcare data.

Overview

5 years of professional experience

Work History

Data Engineer Intern

StaffBees Solution Inc.
Dallas
03.2025 - 06.2025
  • Developed ETL pipelines using AWS Glue and Lambda to efficiently process healthcare data from SQL and NoSQL sources.
  • Utilized Python scripts for cleaning and transforming healthcare datasets for analysis and reporting.
  • Managed and optimized AWS RDS and DynamoDB databases to ensure data integrity and performance.
  • Collaborated on healthcare data analysis and built insightful reports using AWS QuickSight.
  • Maintained documentation of data processes and collaborated with cross-functional teams on project alignment.

Data Engineer

Ryna Solutions
Hyderabad
02.2020 - 07.2022
  • Built ETL pipelines using Python, AWS Glue, EMR, and Lambda to ingest telecom call detail and usage data—processing approximately 5 GB per day from SQL Server and MongoDB into S3, enabling downstream analytics.
  • Developed Spark (PySpark/Spark-SQL) jobs on AWS EMR to transform raw telecom logs, reducing data processing latency by approximately 30%; supported both batch (hourly) and near real-time ingestion.
  • Orchestrated workflows with Apache Oozie and AWS Step Functions to manage dependencies across Hadoop-to-Cloud pipelines, ensuring reliable and scalable data processing.
  • Migrated data from on-prem Hadoop systems (HDFS, Hive, Sqoop) to AWS S3/EMR, incorporating Parquet and ORC formats to reduce storage costs and increase query performance by approximately 25%.
  • Designed dimensional models (star/snowflake schemas) in Snowflake and Redshift for telecom metrics reporting, optimizing queries to support dashboards in Athena and Tableau.
  • Ensured data quality and governance through automated validation via Glue jobs, Lambda checks, AWS IAM roles, and KMS encryption, improving reliability and compliance.
  • Integrated databases like DynamoDB and PostgreSQL with AWS Glue ETL for unified telecom data access, reducing manual integration effort by approximately 40%.
  • Supported CI/CD and infrastructure automation with Git, Terraform, and AWS CodePipeline; automated deployments across AWS environments (EKS, EC2, Fargate), lowering deployment time by approximately 50%.

Education

Master of Science - Computational Science

University of Dayton
Dayton, OH
12.2024

Bachelor of Science - Computer Science

Visvesvaraya College of Engineering And Technology
Hyderabad
05.2018

Skills

  • AWS services: S3, Glue, Lambda
  • Big data frameworks: Hadoop, Spark
  • Relational databases: SQL Server, PostgreSQL
  • NoSQL databases: MongoDB, DynamoDB
  • Programming languages: Python, Java
  • Data libraries: Pandas, NumPy
  • Scripting language: Bash
  • ETL tools: AWS Glue, Sqoop
  • Data orchestration: Oozie, Step Functions
  • Version control: Git
  • CI/CD tools: Jenkins, AWS CodePipeline
  • Data visualization tools: QuickSight, Tableau
  • Project management tools: JIRA, Confluence
