
Rithvik Reddy Pinninti

McLean, VA

Summary

• Over 5 years of professional IT experience as a Data Engineer, possessing strong technical expertise, business experience, and communication skills to drive high-impact business outcomes.
• Experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing in both Waterfall and Agile methodologies.
• Skilled in managing Data analytics, Data processing, Machine learning, Artificial intelligence, and data-driven projects.
• Experienced in developing scripts using Python for Extract, Transform, and Load (ETL) operations, with a working knowledge of AWS Redshift.
• Proficient in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
• Experience working with Snowflake multi-cluster and virtual warehouses.
• Proficient in ingesting and processing terabytes of streaming data (Kafka, Spark Streaming, Storm) and batch data, with a focus on automation.
• Skilled in predictive analytics and creating impactful dashboards and reports with Power BI and Tableau.
• Experienced in automating data engineering pipelines that adhere to standards and best practices such as proper partitioning, appropriate file formats, and incremental loads that maintain previous state.
• Experience fine-tuning Spark applications using concepts like broadcasting, increasing shuffle parallelism, and caching/persisting DataFrames to utilize cluster resources effectively (see the sketch after this list).
• Skilled in data ingestion, extraction, and transformation using ETL processes with AWS Glue, Lambda, AWS EMR, and Databricks.
• Proficiency in designing scalable and efficient data architectures on Azure, leveraging services like Azure Data Lake, Azure Data Factory, Azure Databricks, Azure Synapse Analytics, and Power BI.
• Experience in designing and developing production-ready data processing applications in Spark using Scala/Python.
• Experience in support activities including troubleshooting, performance monitoring, and resolving production incidents.
• Experienced in using agile approaches, including Extreme Programming, Test-Driven Development, and Agile Scrum.
• Ability to collaborate closely with teams to ensure high quality and timely delivery of builds and releases.
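
Illustrative PySpark sketch of the Spark tuning techniques referenced above (broadcast joins, higher shuffle parallelism, and DataFrame caching). This is a minimal, hypothetical example rather than project code; the bucket paths, table names, and join key are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

# Raise shuffle parallelism so wide operations use more tasks
spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    .config("spark.sql.shuffle.partitions", "400")
    .getOrCreate()
)

events = spark.read.parquet("s3://example-bucket/events/")      # large fact data (placeholder path)
lookup = spark.read.parquet("s3://example-bucket/dim_lookup/")  # small dimension (placeholder path)

# Broadcast the small dimension so the large side is not shuffled
joined = events.join(broadcast(lookup), on="customer_id", how="left")

# Cache the joined frame because it feeds multiple downstream aggregations
joined.cache()
daily_counts = joined.groupBy("event_date").count()
daily_counts.write.mode("overwrite").parquet("s3://example-bucket/output/daily_counts/")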

Overview

6 years of professional experience

Work History

Software Engineer - Data

Capital One
07.2024 - Current
  • Engineered and implemented IAM roles and policies for AWS services such as S3, Glue, Step Functions, and EventBridge using infrastructure as code (Bogie file), improving security and compliance across data migration project
  • Developed and optimized 10+ AWS Glue jobs and Lambda functions using Python, automating ETL processes and reducing manual intervention by 80%
  • Designed and built scalable data pipelines using AWS Redshift, S3, and Kinesis, processing terabytes of data daily, and improving data throughput by 40%
  • Integrated CloudWatch logs with Splunk using Python scripts, enabling real-time monitoring and reducing incident response times by 25%
  • Leveraged Python for data transformation and processing tasks within AWS Glue and EMR, enhancing data quality and consistency across various datasets
  • Developed Gherkin-based test scenarios using Python Behave to implement unit, component, and end-to-end test cases, ensuring comprehensive test coverage and enhancing test automation (see the sketch below this role)
  • Designed and implemented data quality rules to ensure accuracy, consistency, and completeness of data during large-scale migration, significantly reducing data discrepancies and enhancing data integrity throughout the migration process
  • Developed a scenario-based mock data generator in Python to support testing, enhancing data pipeline design and reliability for robust data processing workflows
  • Used Git for version control, managing codebase versions and ensuring collaboration across multiple development teams
  • Automated code review and CI/CD processes using Git hooks and integration with platforms like GitHub Actions and Jenkins for enhanced code quality
  • Collaborated with cross-functional teams to ensure seamless data migration and transformation, resulting in a 15% reduction in project delivery time and a 20% improvement in data accessibility
  • Environment: SQL, Python, PySpark, AWS (Glue, Step Functions, EventBridge, Athena, Redshift, S3, Lambda, IAM Roles), Git, Jira
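
Illustrative sketch of the Gherkin/Behave testing approach mentioned in this role. It is a minimal, hypothetical example (the scenario text, step wording, and row count are placeholders), not the actual test suite.

# Hypothetical Gherkin scenario (would live in a .feature file):
#   Scenario: Glue job loads the expected number of rows
#     Given the ETL job has finished
#     When I count the rows in the target table
#     Then the row count should be 100
from behave import given, when, then

@given("the ETL job has finished")
def step_job_finished(context):
    # Stubbed: a real step would poll the Glue job status; here we fake the output rows
    context.target_rows = [{"id": i} for i in range(100)]

@when("I count the rows in the target table")
def step_count_rows(context):
    context.row_count = len(context.target_rows)

@then("the row count should be {expected:d}")
def step_check_count(context, expected):
    assert context.row_count == expected, f"expected {expected} rows, found {context.row_count}"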

Data Engineer

Verizon
07.2023 - 06.2024
  • Involved in Analysis, Design, and Implementation/Translation of Business User requirements
  • Implemented ETL processes to transform and cleanse data as it moves between MySQL and NoSQL databases
  • Leveraged PySpark's capabilities for data manipulation, aggregation, and filtering to prepare data for further processing
  • Joined, manipulated, and drew actionable insights from large data sources using Python and SQL
  • Ingested large data streams from company REST APIs into EMR cluster through AWS Kinesis
  • Automated cluster provisioning and job execution using AWS EMR, enabling cost-effective and scalable data workflows
  • Integrated Amazon Kinesis and Apache Kafka for real-time event streaming and used Databricks for real-time analytics, reducing latency by 50%
  • Streamed data from fully managed AWS Kafka brokers using Spark Streaming and processed the data with explode transformations (see the sketch below this role)
  • Created data models and schema designs for Snowflake data warehouse to support complex analytical queries and reporting
  • Automated scripts and workflow using Apache Airflow and shell scripting to ensure daily execution in production
  • Used Spark SQL and DataFrames API to load structured and semi-structured data from MySQL tables into Spark Clusters
  • Developed ETL workflows using AWS Glue to efficiently load large data sets into data warehouse
  • Developed and visualized BI reports in Tableau, managed user access and groups, scheduled report instances, and monitored Tableau Server to ensure high availability for end users
  • Leveraged SQL scripting for data modeling, enabling streamlined data querying and reporting capabilities, contributing to improved insights into customer data
  • Developed Airflow pipelines to efficiently load data from multiple sources into Redshift and monitored job schedules
  • Implemented AWS Batch for efficient batch processing of large-scale data workloads, optimizing resource allocation and job scheduling
  • Successfully migrated data from Teradata to AWS, improving data accessibility and cost efficiency
  • Used Kubernetes to orchestrate deployment, scaling, and management of Docker containers
  • Finalized data pipeline using DynamoDB as NoSQL storage option
  • Instantiated, created, and maintained CI/CD (Continuous Integration & Deployment) pipelines and applied automation to environments and applications
  • Actively participated in Scrum meetings, reporting progress, and maintaining good communication with each team member and managers
  • Environment: Apache Airflow, Kafka, Spark, MapReduce, Hadoop, Python, Snowflake, Databricks, PySpark, Docker, Kubernetes, AWS, DynamoDB, CI/CD, Tableau, Redshift, REST APIs, Teradata, Windows
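
Illustrative PySpark Structured Streaming sketch of the Kafka-to-S3 pattern described in this role (reading from managed Kafka brokers and flattening nested arrays with explode). A minimal, hypothetical example: the broker address, topic, schema, and S3 paths are placeholders, and the spark-sql-kafka connector package is assumed to be available on the cluster.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode, from_json
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Placeholder event schema: a device id plus a nested array of readings
schema = StructType([
    StructField("device_id", StringType()),
    StructField("readings", ArrayType(StringType())),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker.example.com:9092")  # placeholder broker
    .option("subscribe", "device-events")                          # placeholder topic
    .load()
)

# Kafka delivers the value as bytes; parse the JSON and flatten the readings array
events = (
    raw.select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.device_id", explode("e.readings").alias("reading"))
)

query = (
    events.writeStream.format("parquet")
    .option("path", "s3://example-bucket/streaming/readings/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/readings/")
    .start()
)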

Data Engineer

Southwest Airlines
12.2022 - 06.2023
  • Enhanced platform reliability, efficiency, and scalability across environments by optimizing processes to minimize downtime and maintain performance
  • Responsible for building ETL pipelines (Extract, Transform, Load) from the data lake to different databases based on the requirements
  • Created SSIS Packages using different Control Flow Tasks like Data Flow Task, Execute SQL Task, Sequence Container, For Each Loop Container, Send Mail Task, and Analysis Service Processing Task
  • Converted SQL queries into Spark transformations using Spark RDDs, Python, PySpark, and Scala
  • Implemented several DAX functions for various fact calculations for efficient data visualization in Power BI and optimized the DAX queries
  • Involved in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system
  • Involved in Data Architecture, Data Profiling, Data analysis, Data Mapping, and Data architecture artifacts design
  • Created data models and schema designs for Snowflake data warehouse to support complex analytical queries and reporting
  • Developed complex Python and Scala code for data processing and analytics using built-in Databricks libraries, ensuring it was maintainable, easy to use, and met application requirements
  • Integrated Splunk with various data sources and applications to centralize log management and streamline incident response processes
  • Automated scripts and workflows using Apache Airflow and shell scripting to ensure daily execution in production
  • Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse, and created DAGs to orchestrate the workflows (see the sketch below this role)
  • Built and managed data pipelines using AWS Glue and Databricks, ensuring efficient and reliable data processing and analysis workflows
  • Automated advanced SQL queries and ETL techniques using Apache Airflow to streamline weekly administration tasks
  • Extracted data from sources like SQL Server Databases, SQL Server Analysis Services Cubes, Excel, and loaded it into the target MS SQL Server database
  • Participated in daily stand-up meetings to update the project status with the internal Dev team
  • Environment: Spark, Kafka, Python, Scala, Splunk, Airflow, PySpark, ETL, SSIS, AWS (Redshift, Glue), Jira, SQL, Snowflake, Databricks, Power BI, and Windows
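
Illustrative Apache Airflow sketch of the S3-to-Snowflake DAGs mentioned in this role. A minimal, hypothetical Airflow 2-style example: the DAG id, schedule, and COPY statement are placeholders, and a production DAG would normally use the Snowflake provider operators rather than a generic Python task.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def copy_s3_to_snowflake(**_):
    # Placeholder: the real task issued a COPY INTO against Snowflake
    # using an external S3 stage configured for the warehouse
    print("COPY INTO analytics.raw_events FROM @s3_stage/events/ ...")

with DAG(
    dag_id="s3_to_snowflake_daily",  # placeholder DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",      # daily execution in production
    catchup=False,
) as dag:
    load_task = PythonOperator(
        task_id="copy_s3_to_snowflake",
        python_callable=copy_s3_to_snowflake,
    )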

Data Engineer

Panasonic
01.2020 - 07.2022
  • Worked with business users to gather, define business requirements, and analyze possible technical solutions
  • Developed Spark jobs on Databricks to perform tasks like data cleansing, data validation, standardization, and then applied transformations as per use cases
  • Developed Spark scripts using Python and Scala shell commands as per requirement
  • Built Spark jobs using PySpark to perform ETL on data in Azure Blob Storage (see the sketch below this role)
  • Designed and developed Tableau visualizations, including preparing dashboards using calculations, parameters, calculated fields, groups, sets and hierarchies
  • Performed ETL operations using Azure Databricks and successfully migrated on-premises Oracle ETL processes to Azure Synapse Analytics
  • Designed and maintained scalable data workflows using Azure Data Factory, Event Hubs, and Synapse Analytics, managing over 100TB of data with high efficiency and scalability
  • Worked on SQL queries in dimensional data warehouses and relational data warehouses
  • Performed Data Analysis and Data Profiling using complex SQL queries on various systems
  • Developed NoSQL databases using CRUD, Indexing, Replication, and Sharding in Cosmos DB
  • Followed agile methodology for entire project
  • Actively participated in weekly iteration review meetings, providing constructive and insightful feedback to track progress for each cycle and identify issues
  • Environment: Spark, Scala, Python, PySpark, Pig, HDFS, Data Marts, ETL, Tableau, Azure, Databricks, MapReduce, XML, JSON, Hive, SQL, Agile and Windows
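
Illustrative PySpark sketch of the Azure Blob Storage ETL jobs mentioned in this role (read raw files, cleanse and standardize, write curated output). A minimal, hypothetical example: the storage account, container names, and column names are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lower, to_date, trim

spark = SparkSession.builder.appName("blob-etl-sketch").getOrCreate()

# Placeholder wasbs:// paths for the raw and curated containers
source = "wasbs://raw@examplestorage.blob.core.windows.net/orders/"
target = "wasbs://curated@examplestorage.blob.core.windows.net/orders/"

orders = spark.read.option("header", True).csv(source)

# Basic cleansing and standardization before writing curated Parquet
cleansed = (
    orders.dropDuplicates(["order_id"])
    .filter(col("order_id").isNotNull())
    .withColumn("customer_email", lower(trim(col("customer_email"))))
    .withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))
)

cleansed.write.mode("overwrite").partitionBy("order_date").parquet(target)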

Data Engineer

Xoom Works Inc
07.2018 - 12.2019
  • Involved in Analysis, Design, System architectural design, Process interfaces design, design documentation
  • Developed Spark applications using Scala to perform enrichments of user behavioral data (clickstream data) merged with user profile data (see the sketch below this role)
  • Involved in developing production-ready Spark applications using Spark RDD APIs, DataFrames, Spark-SQL, and Spark-Streaming API's
  • Implemented and managed ETL solutions and automated operational processes
  • Created Visual Charts, Graphs, Maps, Area Maps, Dashboards and Storytelling using Tableau
  • Performed data gathering, cleaning, and wrangling using Python
  • Worked on Snowflake environment to remove redundancy and load real-time data from various data sources into HDFS using Kafka
  • Designing and creating SQL Server tables, views, stored procedures, and functions
  • Used Agile (SCRUM) methodologies for Software Development
  • Actively participated in code reviews and meetings and resolved technical issues
  • Environment: Spark, Scala, Hadoop, Python, PySpark, AWS, MapReduce, Pig, ETL, HDFS, Hive, HBase, SQL, Agile and Windows
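
Illustrative sketch of the clickstream enrichment pattern from this role (joining click events with user profile data). The original applications were written in Scala; this minimal, hypothetical example uses PySpark for consistency with the sketches above, and all paths and column names are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("clickstream-enrichment-sketch").getOrCreate()

clicks = spark.read.json("hdfs:///data/raw/clickstream/")        # click events (placeholder path)
profiles = spark.read.parquet("hdfs:///data/curated/profiles/")  # user profiles (placeholder path)

# Enrich each click with profile attributes; keep clicks without a matching profile
enriched = clicks.join(profiles, on="user_id", how="left").select(
    "user_id",
    "page_url",
    "event_time",
    col("segment").alias("user_segment"),
)

enriched.write.mode("append").parquet("hdfs:///data/enriched/clickstream/")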

Education

Master of Science - Computer And Information Sciences

Texas Tech University
Lubbock, TX
12.2023

Skills

  • Databases: Snowflake, DynamoDB, SQL, Hive, MySQL, Oracle, RDBMS, AWS Redshift, Amazon RDS, Teradata
  • NoSQL Databases: MongoDB, Hadoop HBase, and Apache Cassandra
  • Programming Languages: Python, Scala, SQL, Java
  • Cloud Platforms: AWS, Docker, Azure
  • Querying Languages: SQL, NoSQL, PostgreSQL, MySQL, Microsoft SQL
  • Reporting & Visualization: Tableau, Power BI, QuickSight
  • Cluster Security: Kerberos, Ranger, IAM, VPC
  • Scalable Data Tools: Hadoop, Hive, Apache Spark, Pig, MapReduce
  • CI/CD Tools: Jenkins, GitHub
  • Operating Systems: Windows, Linux, Unix, macOS

Timeline

Software Engineer - Data

Capital One
07.2024 - Current

Data Engineer

Verizon
07.2023 - 06.2024

Data Engineer

Southwest Airlines
12.2022 - 06.2023

Data Engineer

Panasonic
01.2020 - 07.2022

Data Engineer

Xoom Works Inc
07.2018 - 12.2019

Master of Science - Computer And Information Sciences

Texas Tech University