Saiteja Ravipati

Frisco, TX

Summary

Experienced Data Engineer with over 4 years of expertise in designing and implementing scalable data pipelines and cloud-based data solutions. Demonstrated success across leading organizations including Caterpillar, Autodesk, and APSPDCL. Specialized in Python, AWS, and Azure, and in modern tools such as Airflow, Snowflake, and Databricks. Strong in orchestration, automation, data modeling, and cross-functional collaboration to deliver high-quality, enterprise-grade data platforms.

Work History

Python/AWS Data Engineer

Caterpillar Inc.

Irving, TX

05.2024 - Current

At Caterpillar, I am part of an enterprise-level data engineering team responsible for building and maintaining dealer and customer data pipelines that support analytics and operational systems. My role involves:

Designing and implementing end-to-end ETL workflows using AWS services such as S3, Lambda, Glue, and Step Functions to manage structured and semi-structured data.
Orchestrating and monitoring data pipelines using PLM (an internal Airflow-like orchestration tool), with retries, notifications, and dead-letter queue (DLQ) handling to ensure reliability and traceability.
Developing modular Python-based ETL frameworks to standardize the ingestion, transformation, and loading of customer and system data into Snowflake.
Enabling near-real-time communication between services through AWS SQS/SNS integration, improving data availability for downstream applications.
Collaborating with cross-functional teams, including analytics, infrastructure, and DevOps, to deliver scalable, reusable, and secure data solutions.
Automating log retrieval and diagnostics using GitHub Copilot and scripting, improving operational efficiency and reducing manual effort.
Implementing observability and alerting with PLM and CloudWatch to track performance, failures, and key pipeline metrics.

Data Engineer

Autodesk

San Rafael, CA

02.2023 - 03.2024

At Autodesk, I contributed to the development of scalable, cloud-native data engineering solutions to support enterprise analytics and machine learning initiatives. My responsibilities included:

Designing and building robust ETL pipelines using AWS Glue, Lambda, and Redshift to handle large-scale structured data ingestion and transformation.
Automating data workflows using Python and PySpark to improve pipeline efficiency, reduce manual intervention, and ensure consistent delivery of clean, validated data.
Creating and managing data lakes and dimensional data models to support advanced analytics and machine learning applications across business units.
Orchestrating end-to-end data workflows using Apache Airflow, ensuring timely and reliable execution of complex interdependent jobs.
Optimizing SQL queries and Redshift schema design to improve performance in data retrieval and reporting processes.
Implementing CI/CD pipelines using GitHub and Jenkins to enable version-controlled deployments, seamless integration, and release automation.
Collaborating with analytics teams and product stakeholders to define data requirements, support ad hoc analysis, and streamline reporting needs.

Software Engineer

APSPDCL

Andhra Pradesh, India

08.2019 - 06.2021

At APSPDCL, I was part of the data engineering team responsible for managing and modernizing the utility's data processing and reporting infrastructure. My role focused on delivering reliable and scalable data solutions to support operational, billing, and analytics systems. Key responsibilities included:

Developed complex validation scripts using SAP SQLScript and ABAP to ensure accurate and consistent energy consumption and billing data across SAP systems.
Built and maintained real-time data pipelines using SAP HANA Streaming Analytics to process and analyze metering and distribution data from IoT and grid systems.
Implemented ETL pipelines using SAP HANA Smart Data Integration (SDI) to integrate data from various operational sources into centralized HANA data models.
Designed and delivered interactive dashboards using SAP Analytics Cloud (SAC), enabling operational teams to monitor grid performance, outages, and energy consumption.
Collaborated with SAP and data analytics teams to optimize query performance and streamline data flows across legacy and modern systems.
Supported the digitization initiative to improve reporting accuracy, operational visibility, and decision-making for power distribution planning.

Education

Master of Science - Information Technology

University of Memphis

Memphis, TN

12.2022

Skills

Languages: Python, Java, SQL, ABAP, SAP SQLScript
Cloud Platforms: AWS (S3, Lambda, Glue, EC2, RDS, Redshift), Azure (ADF, Functions, Blob Storage)
ETL & Orchestration: Apache Airflow, AWS Glue, Step Functions, SAP HANA SDI
Data Warehousing: Snowflake, Redshift, Azure Synapse
Big Data & Streaming: Databricks, PySpark, Apache Spark, Kinesis, HDFS, Spark Streaming
Databases: PostgreSQL, Oracle, Snowflake, SAP HANA, RDS
DevOps & CI/CD: GitHub, Azure DevOps, Jenkins, GitHub Copilot
Visualization Tools: Power BI, Tableau, SAP Analytics Cloud
Other Tools: JIRA, Terraform, REST APIs, JSON, XML, Linux
