Saiteja Rapelli

St. Louis, MO

Summary

Results-oriented Data Engineer with 3+ years of experience optimizing data-intensive applications. Demonstrated expertise in Apache Spark and Hadoop, consistently delivering scalable solutions and enhancing data processing efficiency. Proficient in ETL development using DataStage, experienced in data warehousing with Redshift and Snowflake, and skilled in Python for data cleansing. Extensive knowledge of SQL and NoSQL databases, UNIX Shell scripting, and workflow automation, with a strong focus on performance optimization and innovative problem-solving.

Overview

6 years of professional experience

Work History

Data Specialist

C4 Technical Services
07.2024 - Current
  • Designed and deployed an Enterprise Data Lake to support complex analytics and large-scale processing of dynamic datasets, ensuring seamless integration for diverse use cases, including real-time analytics and historical data reporting.
  • Implemented database connectivity using JDBC with a MySQL backend, ensuring seamless data integration.
  • Collaborated on the design of an intelligent routine to extract terabytes of data across all hospital sites, optimizing data extraction processes.
  • Engineered ETL solutions using Spark SQL in Databricks for data extraction, transformation, and aggregation, processing data from over 20 different sources in various file formats.
  • Created comprehensive Star and Snowflake data models for an Enterprise Data Warehouse using ERWIN, leading to a 32% improvement in query performance and faster reporting capabilities.
  • Leveraged Athena to execute queries on Glue ETL-processed data, producing over 100 interactive reports and dashboards with QuickSight and enabling data-driven decision-making across the organization.
  • Conducted gap analysis between as-is and to-be processes, performed risk analysis of the existing system, and evaluated benefits of the new system, ensuring alignment with business objectives.
  • Created Excel and Power BI reporting solutions that sourced data from SQL database queries, supporting data-driven decision-making.

Data Science Developer

Caterpillar Inc
01.2022 - 11.2022
  • Developed and implemented machine learning models to analyze large datasets, enhancing predictive accuracy and driving data-driven decision-making.
  • Utilized Python libraries such as TensorFlow and PyTorch to build and deploy deep learning models for complex data analysis tasks, improving model performance by 25%.
  • Implemented Jenkins for continuous integration, significantly reducing manual intervention in the development lifecycle.
  • Executed the migration of on-premises infrastructure to AWS, improving scalability and reducing costs.
  • Utilized AWS services such as EC2, S3, and Lambda for efficient and scalable cloud infrastructure.
  • Implemented continuous improvement strategies, conducting regular retrospectives and establishing feedback loops.
  • Contributed to the development and execution of disaster recovery plans and ensured reliable data backup and restoration processes.
  • Fostered a culture of collaboration between development, operations, and QA teams, improving communication and accelerating issue resolution.
  • Conducted knowledge-sharing sessions to disseminate best practices in DevOps and promote cross-functional skills development.
  • Collaborated with mobile app developers to integrate automated testing into the CI/CD pipeline, resulting in improved code quality and faster release cycles.

Data Engineer

NXP Net Solutions
07.2020 - 12.2021
  • Conducted exploratory data analysis (EDA) to identify trends and patterns, utilizing visualization tools like Matplotlib and Seaborn to communicate findings effectively to stakeholders.
  • Implemented automated model training and evaluation frameworks, improving the efficiency of the machine learning lifecycle and reducing time-to-deployment by 30%.
  • Streamlined DataStage component management, improving job execution times by 24% through rigorous testing and debugging, which also increased data processing accuracy via advanced SQL and PL/SQL techniques.
  • Built scalable data pipelines using Azure Data Factory and Apache Spark, improving processing efficiency by 35%.
  • Optimized Azure SQL Database and Cosmos DB, enhancing data retrieval speed by 30% for high-volume transactions.
  • Developed ETL processes that integrated on-premises systems with Azure, reducing data processing time by 25%.
  • Implemented validation checks in Azure Data Lake, reducing data discrepancies by 40% and ensuring regulatory compliance.
  • Partnered with data scientists and analysts to deliver data solutions that supported key business initiatives, resulting in a 20% increase in actionable insights.
  • Utilized Azure Monitor to optimize workflows, cutting operational costs by 15%.

Associate Data Engineer

Settle Metal
01.2019 - 06.2020
  • Developed and managed ETL solutions, creating automated operational processes that reduced manual intervention by 45% and crafting reusable mapplets and Oracle PL/SQL stored procedures to streamline workflows.
  • Scheduled, tested, and debugged DataStage components using its run-time engine, improving job execution time by 24% and enhancing data processing accuracy through advanced SQL and PL/SQL techniques.
  • Monitored and managed DataStage jobs through daily UNIX shell scripts, resolving job failures and initiating force starts, resulting in a 12% reduction in job downtime.
  • Implemented robust monitoring and alerting systems for Kafka clusters, achieving 99.9% system uptime and enabling real-time health and performance tracking. Automated backup and disaster recovery strategies reduced data recovery time by 20%.
  • Optimized Databricks Spark jobs with PySpark, resulting in a 32% increase in data processing throughput for table-to-table operations.
  • Integrated cloud services for enhanced data management and workflow automation, enabling seamless access to cloud resources.
  • Collaborated with cross-functional teams to ensure data quality and compliance with industry standards.

Education

Master's - Information Technology Management

Webster University
St. Louis, MO
01.2024

Bachelor of Technology

TKR College of Engineering and Technology
India
2021

Skills

  • Programming Languages: Scala, Python, SQL, PL/SQL, UNIX Shell Scripting
  • Big Data Technologies: Apache Spark, PySpark, Hadoop, HBase, Apache Kafka, Cassandra, AWS EMR
  • Data Warehousing: Amazon Redshift, Snowflake, AWS S3, AWS Athena, AWS RDS, AWS Glue, Azure Data Lake, Azure Data Factory
  • Data Analysis and Reporting: SSRS, Tableau
  • ETL Tools: DataStage, Talend, Apache Airflow, Data Factory
  • Data Integration: Oracle PL/SQL, ETL Automation, Sqoop, MapReduce
  • Data Processing and Querying: Hive, Apache Pig, Delta Lake
  • Database Management: SQL, PL/SQL, Database Design, Stored Procedures, Functions, Packages, Triggers
  • Data Monitoring and Alerting: Kafka Monitoring, Performance Tracking
  • Backup and Disaster Recovery: Automated Backup Strategies, Failover Capabilities
  • Web Services: REST, SOAP
