Saiteja Rapelli

St. Louis, MO

Summary

Results-oriented Data Engineer with 3+ years of experience optimizing data-intensive applications. Demonstrated expertise in Apache Spark and Hadoop, consistently delivering scalable solutions and enhancing data processing efficiency. Proficient in ETL development using DataStage, experienced in data warehousing with Redshift and Snowflake, and skilled in Python for data cleansing. Extensive knowledge of SQL and NoSQL databases, UNIX Shell scripting, and workflow automation, with a strong focus on performance optimization and innovative problem-solving.

Overview

6 years of professional experience

Work History

Data Specialist

C4 Technical Services
07.2024 - Current
  • Designed and deployed an Enterprise Data Lake to support complex analytics and large-scale processing of dynamic datasets, ensuring seamless integration for diverse use cases, including real-time analytics and historical data reporting.
  • Implemented database connectivity using JDBC with a MySQL backend, ensuring seamless data integration.
  • Collaborated on the design of an intelligent routine to extract terabytes of data across all hospital sites, optimizing data extraction processes.
  • Engineered ETL solutions using Spark SQL in Databricks for data extraction, transformation, and aggregation, processing data from over 20 different sources in various file formats.
  • Created comprehensive Star and Snowflake data models for an Enterprise Data Warehouse using ERWIN, leading to a 32% improvement in query performance and faster reporting capabilities.
  • Leveraged Athena to execute queries on Glue ETL-processed data, producing over 100 interactive reports and dashboards with QuickSight and enabling data-driven decision-making across the organization.
  • Conducted gap analysis between as-is and to-be processes, performed risk analysis of the existing system, and evaluated benefits of the new system, ensuring alignment with business objectives.
  • Created Excel and Power BI reporting solutions that sourced data from SQL database queries, supporting data-driven decision-making.

Data Science Developer

Caterpillar Inc
01.2022 - 11.2022
  • Developed and implemented machine learning models to analyze large datasets, enhancing predictive accuracy and driving data-driven decision-making.
  • Utilized Python libraries such as TensorFlow and PyTorch to build and deploy deep learning models for complex data analysis tasks, improving model performance by 25%.
  • Implemented Jenkins for continuous integration, significantly reducing manual intervention in the development lifecycle.
  • Executed the migration of on-premises infrastructure to AWS, improving scalability and reducing costs.
  • Utilized AWS services such as EC2, S3, and Lambda for efficient and scalable cloud infrastructure.
  • Implemented continuous improvement strategies, conducting regular retrospectives and establishing feedback loops.
  • Contributed to the development and execution of disaster recovery plans and ensured reliable data backup and restoration processes.
  • Fostered a culture of collaboration between development, operations, and QA teams, improving communication and accelerating issue resolution.
  • Conducted knowledge-sharing sessions to disseminate best practices in DevOps and promote cross-functional skills development.
  • Collaborated with mobile app developers to integrate automated testing into the CI/CD pipeline, resulting in improved code quality and faster release cycles.

Data Engineer

NXP Net Solutions
07.2020 - 12.2021
  • Conducted exploratory data analysis (EDA) to identify trends and patterns, utilizing visualization tools like Matplotlib and Seaborn to communicate findings effectively to stakeholders.
  • Implemented automated model training and evaluation frameworks, improving the efficiency of the machine learning lifecycle and reducing time-to-deployment by 30%.
  • Streamlined DataStage component management, improving job execution times by 24% through rigorous testing and debugging, which also increased data processing accuracy via advanced SQL and PL/SQL techniques.
  • Built scalable data pipelines using Azure Data Factory and Apache Spark, improving processing efficiency by 35%.
  • Optimized Azure SQL Database and Cosmos DB, enhancing data retrieval speed by 30% for high-volume transactions.
  • Developed ETL processes that integrated on-premises systems with Azure, reducing data processing time by 25%.
  • Implemented validation checks in Azure Data Lake, reducing data discrepancies by 40% and ensuring regulatory compliance.
  • Partnered with data scientists and analysts to deliver data solutions that supported key business initiatives, resulting in a 20% increase in actionable insights.
  • Utilized Azure Monitor to optimize workflows, cutting operational costs by 15%.

Associate Data Engineer

Settle Metal
01.2019 - 06.2020
  • Developed and managed ETL solutions, creating automated operational processes that reduced manual intervention by 45% and crafting reusable mapplets and Oracle PL/SQL stored procedures to streamline workflows.
  • Scheduled, tested, and debugged DataStage components using its run-time engine, improving job execution time by 24% and enhancing data processing accuracy through advanced SQL and PL/SQL techniques.
  • Monitored and managed DataStage jobs through daily UNIX shell scripts, resolving job failures and initiating force starts, resulting in a 12% reduction in job downtime.
  • Implemented robust monitoring and alerting systems for Kafka clusters, achieving 99.9% system uptime and enabling real-time health and performance tracking. Automated backup and disaster recovery strategies reduced data recovery time by 20%.
  • Optimized Databricks Spark jobs with PySpark, resulting in a 32% increase in data processing throughput for table-to-table operations.
  • Integrated cloud services for enhanced data management and workflow automation, enabling seamless access to cloud resources.
  • Collaborated with cross-functional teams to ensure data quality and compliance with industry standards.

Education

Master's - Information Technology Management

Webster University
St. Louis, MO
01.2024

Bachelor of Technology

TKR College of Engineering and Technology
India
2021

Skills

  • Programming Languages: Scala, Python, SQL, PL/SQL, UNIX Shell Scripting
  • Big Data Technologies: Apache Spark, PySpark, Hadoop, HBase, Apache Kafka, Cassandra, AWS EMR
  • Data Warehousing: Amazon Redshift, Snowflake, AWS S3, AWS Athena, AWS RDS, AWS Glue, Azure Data Lake, Azure Data Factory
  • Data Analysis and Reporting: SSRS, Tableau
  • ETL Tools: DataStage, Talend, Apache Airflow, Data Factory
  • Data Integration: Oracle PL/SQL, ETL Automation, Sqoop, MapReduce
  • Data Processing and Querying: Hive, Apache Pig, Delta Lake
  • Database Management: SQL, PL/SQL, Database Design, Stored Procedures, Functions, Packages, Triggers
  • Data Monitoring and Alerting: Kafka Monitoring, Performance Tracking
  • Backup and Disaster Recovery: Automated Backup Strategies, Failover Capabilities
  • Web Services: REST, SOAP
