
SAINATH MANDADI

Irving

Summary

Highly skilled Data Engineer with 5+ years of experience in designing, implementing, and optimizing data pipelines and workflows using advanced Big Data technologies. Proficient in Hadoop ecosystem tools, cloud-based data solutions (AWS, Azure), and real-time data processing. Expertise in programming with Python, Scala, Java, and SQL, complemented by a strong background in data warehousing, ETL, and analytics. Adept at leveraging Agile practices to deliver scalable and efficient data solutions, ensuring compliance and integrity across diverse technological environments.

Overview

5 years of professional experience

Work History

Data Engineer

Elevance
Indianapolis
12.2023 - Current
  • Migrated on-premises data solutions to AWS, implementing comprehensive ETL processes using AWS Glue, optimizing workflows with Apache Airflow, and managing large-scale datasets in S3.
  • Designed and developed robust and scalable data pipelines using Apache Spark, PySpark, and Scala, processing diverse data formats such as Avro, Parquet, and JSON for real-time and batch workloads.
  • Architected real-time data streaming applications with Kafka and integrated them with machine learning models to improve enterprise-wide analytics and operational decision-making.
  • Implemented advanced analytics solutions including AI-driven frameworks to enhance data accuracy and streamline complex workflows for high-volume processing environments.
  • Enhanced metadata management using Collibra, enabling detailed data lineage tracking, compliance auditing, and operational monitoring for enterprise data governance.
  • Developed data warehousing and visualization solutions with Snowflake, AWS Redshift, and Tableau, ensuring high performance, scalability, and actionable insights across multiple business units.
  • Built automated data validation scripts, integrated into the pipelines, to monitor data integrity and enforce compliance with quality standards.
  • Tools & Environment: Hadoop, Spark, PySpark, Scala, AWS Glue, S3, EMR, Redshift, Airflow, Kafka, Collibra, Snowflake, Tableau, Avro, Parquet, JSON, custom validation scripts.

Data Engineer

US Bank
Irving
11.2022 - 12.2023
  • Designed and developed CI/CD pipelines leveraging AWS CodePipeline, Glue, and Databricks, automating deployment processes for large-scale data engineering workflows across cloud environments.
  • Created and optimized real-time data processing systems using Apache Spark Streaming and Scala, integrating with Kafka and JMS to handle high-throughput, low-latency data streams efficiently.
  • Migrated analytics platforms and data warehouses from Azure to AWS, optimizing cost, scalability, and performance for enterprise data solutions.
  • Built scalable infrastructure using AWS services such as EC2, S3, and CloudFormation, implementing Infrastructure as Code (IaC) for consistent, automated deployments and environment configuration.
  • Orchestrated end-to-end cloud data solutions using Azure Databricks, Spark, and Python to process large datasets seamlessly and enable efficient data engineering pipelines.
  • Designed robust testing frameworks for validating data integrity, application performance, and compliance with security protocols across pipeline and infrastructure layers.
  • Automated complex ETL workflows using AWS Glue and PySpark, optimizing transformations and accelerating the time to insights for business-critical operations.
  • Tools & Environment: Spark Streaming, PySpark, Scala, AWS (Glue, EC2, S3, CloudFormation, CodePipeline), Azure (Databricks, Data Factory), Kafka, JMS, Python, advanced test frameworks.

Data Engineer

Allstate Solutions Private Limited
India
05.2020 - 06.2022
  • Migrated legacy SQL and Hive workflows into Spark-based transformations using Spark RDDs and Scala, significantly improving processing speed and query efficiency across distributed environments.
  • Designed and deployed proof-of-concept (PoC) projects on YARN clusters to compare Spark's performance against Hive and traditional SQL/Teradata operations.
  • Automated data ingestion workflows from FTP servers to Hive tables using Oozie, ensuring smooth and reliable integration with downstream ETL processes.
  • Developed Azure Data Factory pipelines to orchestrate data movement and transformation tasks, integrating with Azure Data Lake Analytics and Storage for high-performance ETL processing.
  • Created advanced user-defined functions (UDFs) in Pig and Hive to analyze customer behavior, supporting enhanced decision-making and personalized marketing strategies.
  • Built MapReduce jobs to support distributed processing of large datasets, integrating with Hive external tables and implementing partitioning and bucketing strategies for query optimization.
  • Utilized ETL tools including Talend and Informatica to standardize data migration, cleansing, and integration workflows, ensuring consistency and accuracy across complex data environments.
  • Tools & Environment: Hadoop, Spark, Hive, Pig, MapReduce, Azure Data Factory, Data Lake, SQL, Oozie, FTP integration, Talend, Pentaho, Informatica, Python, Scala, Java, Teradata.

Education

Master of Science - Computer Science

Campbellsville University
Louisville, KY
12.2023

Skills

  • Hadoop
  • Spark
  • PySpark
  • Hive
  • HDFS
  • Sqoop
  • Kafka
  • Flume
  • Pig
  • YARN
  • Informatica
  • Python
  • Scala
  • Java
  • SQL
  • Shell scripting
  • Snowflake
  • MySQL
  • PostgreSQL
  • MongoDB
  • Azure SQL
  • Cassandra
  • Teradata
  • Oracle DB
  • AWS
  • Azure
  • Talend
  • Pentaho
  • Collibra
  • Apache NiFi
  • Tableau
  • Power BI
  • SSRS
  • Crystal Reports
  • Agile
  • Scrum
  • Waterfall
  • ETL processes
  • Data cleansing
  • Performance Tuning
