
Rithvik Reddy Pinninti

McLean, VA

Summary

• Over 5 years of professional IT experience as a Data Engineer, possessing strong technical expertise, business experience, and communication skills to drive high-impact business outcomes.
• Experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing in both Waterfall and Agile methodologies.
• Skilled in managing Data analytics, Data processing, Machine learning, Artificial intelligence, and data-driven projects.
• Experienced in developing scripts using Python for Extract, Transform, and Load (ETL) operations, with a working knowledge of AWS Redshift.
• Proficient in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
• Experience working with Snowflake multi-cluster and virtual warehouses.
• Proficient in ingesting and processing terabytes of streaming data (Kafka, Spark Streaming, Storm) and batch data, with a focus on automation.
• Skilled in predictive analytics and creating impactful dashboards and reports with Power BI and Tableau.
• Experienced in automating data engineering pipelines that adhere to standards and best practices such as proper partitioning, appropriate file formats, and incremental loads that maintain previous state.
• Experience fine-tuning Spark applications using concepts like broadcasting, increasing shuffle parallelism, and caching/persisting DataFrames to utilize cluster resources effectively (see the sketch after this list).
• Skilled in data ingestion, extraction, and transformation using ETL processes with AWS Glue, Lambda, AWS EMR, and Databricks.
• Proficiency in designing scalable and efficient data architectures on Azure, leveraging services like Azure Data Lake, Azure Data Factory, Azure Databricks, Azure Synapse Analytics, and Power BI.
• Experience in designing and developing production-ready data processing applications in Spark using Scala/Python.
• Experience in support activities including troubleshooting, performance monitoring, and resolving production incidents.
• Experienced in using agile approaches, including Extreme Programming, Test-Driven Development, and Agile Scrum.
• Ability to collaborate closely with teams to ensure high quality and timely delivery of builds and releases.
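
Illustrative PySpark sketch of the Spark tuning techniques referenced above (broadcast joins, higher shuffle parallelism, and DataFrame caching). This is a minimal, hypothetical example rather than project code; the bucket paths, table names, and join key are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

# Raise shuffle parallelism so wide operations use more tasks
spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    .config("spark.sql.shuffle.partitions", "400")
    .getOrCreate()
)

events = spark.read.parquet("s3://example-bucket/events/")      # large fact data (placeholder path)
lookup = spark.read.parquet("s3://example-bucket/dim_lookup/")  # small dimension (placeholder path)

# Broadcast the small dimension so the large side is not shuffled
joined = events.join(broadcast(lookup), on="customer_id", how="left")

# Cache the joined frame because it feeds multiple downstream aggregations
joined.cache()
daily_counts = joined.groupBy("event_date").count()
daily_counts.write.mode("overwrite").parquet("s3://example-bucket/output/daily_counts/")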

Overview

6 years of professional experience

Work History

Software Engineer - Data

Capital One
07.2024 - Current
  • Engineered and implemented IAM roles and policies for AWS services such as S3, Glue, Step Functions, and EventBridge using infrastructure as code (Bogie file), improving security and compliance across data migration project
  • Developed and optimized 10+ AWS Glue jobs and Lambda functions using Python, automating ETL processes and reducing manual intervention by 80%
  • Designed and built scalable data pipelines using AWS Redshift, S3, and Kinesis, processing terabytes of data daily, and improving data throughput by 40%
  • Integrated CloudWatch logs with Splunk using Python scripts, enabling real-time monitoring and reducing incident response times by 25%
  • Leveraged Python for data transformation and processing tasks within AWS Glue and EMR, enhancing data quality and consistency across various datasets
  • Developed Gherkin-based test scenarios using Python Behave to implement unit, component, and end-to-end test cases, ensuring comprehensive test coverage and enhancing test automation (see the sketch below this role)
  • Designed and implemented data quality rules to ensure accuracy, consistency, and completeness of data during large-scale migration, significantly reducing data discrepancies and enhancing data integrity throughout the migration process
  • Developed a scenario-based mock data generator in Python to support testing, enhancing data pipeline design and reliability for robust data processing workflows
  • Used Git for version control, managing codebase versions and ensuring collaboration across multiple development teams
  • Automated code review and CI/CD processes using Git hooks and integration with platforms like GitHub Actions and Jenkins for enhanced code quality
  • Collaborated with cross-functional teams to ensure seamless data migration and transformation, resulting in a 15% reduction in project delivery time and a 20% improvement in data accessibility
  • Environment: SQL, Python, PySpark, AWS (Glue, Step Functions, EventBridge, Athena, Redshift, S3, Lambda, IAM Roles), Git, Jira
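
Illustrative sketch of the Gherkin/Behave testing approach mentioned in this role. It is a minimal, hypothetical example (the scenario text, step wording, and row count are placeholders), not the actual test suite.

# Hypothetical Gherkin scenario (would live in a .feature file):
#   Scenario: Glue job loads the expected number of rows
#     Given the ETL job has finished
#     When I count the rows in the target table
#     Then the row count should be 100
from behave import given, when, then

@given("the ETL job has finished")
def step_job_finished(context):
    # Stubbed: a real step would poll the Glue job status; here we fake the output rows
    context.target_rows = [{"id": i} for i in range(100)]

@when("I count the rows in the target table")
def step_count_rows(context):
    context.row_count = len(context.target_rows)

@then("the row count should be {expected:d}")
def step_check_count(context, expected):
    assert context.row_count == expected, f"expected {expected} rows, found {context.row_count}"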

Data Engineer

Verizon
07.2023 - 06.2024
  • Involved in Analysis, Design, and Implementation/Translation of Business User requirements
  • Implemented ETL processes to transform and cleanse data as it moves between MySQL and NoSQL databases
  • Leveraged PySpark's capabilities for data manipulation, aggregation, and filtering to prepare data for further processing
  • Joined, manipulated, and drew actionable insights from large data sources using Python and SQL
  • Ingested large data streams from company REST APIs into EMR cluster through AWS Kinesis
  • Automated cluster provisioning and job execution using AWS EMR, enabling cost-effective and scalable data workflows
  • Integrated Amazon Kinesis and Apache Kafka for real-time event streaming and used Databricks for real-time analytics, reducing latency by 50%
  • Streamed data from fully managed AWS Kafka brokers using Spark Streaming and processed the data with explode transformations (see the sketch below this role)
  • Created data models and schema designs for Snowflake data warehouse to support complex analytical queries and reporting
  • Automated scripts and workflow using Apache Airflow and shell scripting to ensure daily execution in production
  • Used Spark SQL and DataFrames API to load structured and semi-structured data from MySQL tables into Spark Clusters
  • Developed ETL workflows using AWS Glue to efficiently load large data sets into data warehouse
  • Developed and visualized BI reports in Tableau, managed user access and groups, scheduled report instances, and monitored Tableau Server to ensure high availability for end users
  • Leveraged SQL scripting for data modeling, enabling streamlined data querying and reporting capabilities, contributing to improved insights into customer data
  • Developed Airflow pipelines to efficiently load data from multiple sources into Redshift and monitored job schedules
  • Implemented AWS Batch for efficient batch processing of large-scale data workloads, optimizing resource allocation and job scheduling
  • Successfully migrated data from Teradata to AWS, improving data accessibility and cost efficiency
  • Used Kubernetes to orchestrate deployment, scaling, and management of Docker containers
  • Finalized data pipeline using DynamoDB as NoSQL storage option
  • Instantiated, created, and maintained CI/CD (Continuous Integration & Deployment) pipelines and applied automation to environments and applications
  • Actively participated in Scrum meetings, reporting progress, and maintaining good communication with each team member and managers
  • Environment: Apache Airflow, Kafka, Spark, MapReduce, Hadoop, Python, Snowflake, Databricks, PySpark, Docker, Kubernetes, AWS, DynamoDB, CI/CD, Tableau, Redshift, REST APIs, Teradata, Windows
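
Illustrative PySpark Structured Streaming sketch of the Kafka-to-S3 pattern described in this role (reading from managed Kafka brokers and flattening nested arrays with explode). A minimal, hypothetical example: the broker address, topic, schema, and S3 paths are placeholders, and the spark-sql-kafka connector package is assumed to be available on the cluster.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode, from_json
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Placeholder event schema: a device id plus a nested array of readings
schema = StructType([
    StructField("device_id", StringType()),
    StructField("readings", ArrayType(StringType())),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker.example.com:9092")  # placeholder broker
    .option("subscribe", "device-events")                          # placeholder topic
    .load()
)

# Kafka delivers the value as bytes; parse the JSON and flatten the readings array
events = (
    raw.select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.device_id", explode("e.readings").alias("reading"))
)

query = (
    events.writeStream.format("parquet")
    .option("path", "s3://example-bucket/streaming/readings/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/readings/")
    .start()
)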

Data Engineer

Southwest Airlines
12.2022 - 06.2023
  • Enhanced platform reliability, efficiency, and scalability across environments by optimizing processes to minimize downtime and maintain performance
  • Responsible for building ETL pipelines (Extract, Transform, Load) from the data lake to different databases based on the requirements
  • Created SSIS Packages using different Control Flow Tasks like Data Flow Task, Execute SQL Task, Sequence Container, For Each Loop Container, Send Mail Task, and Analysis Service Processing Task
  • Converted SQL queries into Spark transformations using Spark RDDs, Python, PySpark, and Scala
  • Implemented several DAX functions for various fact calculations for efficient data visualization in Power BI and optimized the DAX queries
  • Involved in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system
  • Involved in Data Architecture, Data Profiling, Data analysis, Data Mapping, and Data architecture artifacts design
  • Created data models and schema designs for Snowflake data warehouse to support complex analytical queries and reporting
  • Developed complex Python and Scala code for data processing and analytics using built-in Databricks libraries, ensuring it was maintainable, easy to use, and met application requirements
  • Integrated Splunk with various data sources and applications to centralize log management and streamline incident response processes
  • Automated scripts and workflows using Apache Airflow and shell scripting to ensure daily execution in production
  • Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse, and created DAGs to orchestrate the workflows (see the sketch below this role)
  • Built and managed data pipelines using AWS Glue and Databricks, ensuring efficient and reliable data processing and analysis workflows
  • Automated advanced SQL queries and ETL techniques using Apache Airflow to streamline weekly administration tasks
  • Extracted data from sources like SQL Server Databases, SQL Server Analysis Services Cubes, Excel, and loaded it into the target MS SQL Server database
  • Participated in daily stand-up meetings to update the project status with the internal Dev team
  • Environment: Spark, Kafka, Python, Scala, Splunk, Airflow, PySpark, ETL, SSIS, AWS (Redshift, Glue), Jira, SQL, Snowflake, Databricks, Power BI, and Windows
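
Illustrative Apache Airflow sketch of the S3-to-Snowflake DAGs mentioned in this role. A minimal, hypothetical Airflow 2-style example: the DAG id, schedule, and COPY statement are placeholders, and a production DAG would normally use the Snowflake provider operators rather than a generic Python task.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def copy_s3_to_snowflake(**_):
    # Placeholder: the real task issued a COPY INTO against Snowflake
    # using an external S3 stage configured for the warehouse
    print("COPY INTO analytics.raw_events FROM @s3_stage/events/ ...")

with DAG(
    dag_id="s3_to_snowflake_daily",  # placeholder DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",      # daily execution in production
    catchup=False,
) as dag:
    load_task = PythonOperator(
        task_id="copy_s3_to_snowflake",
        python_callable=copy_s3_to_snowflake,
    )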

Data Engineer

Panasonic
01.2020 - 07.2022
  • Worked with business users to gather, define business requirements, and analyze possible technical solutions
  • Developed Spark jobs on Databricks to perform tasks like data cleansing, data validation, standardization, and then applied transformations as per use cases
  • Developed Spark scripts using Python and Scala shell commands as per requirement
  • Built Spark jobs using PySpark to perform ETL on data in Azure Blob Storage (see the sketch below this role)
  • Designed and developed Tableau visualizations, including preparing dashboards using calculations, parameters, calculated fields, groups, sets and hierarchies
  • Performed ETL operations using Azure Databricks and successfully migrated on-premises Oracle ETL processes to Azure Synapse Analytics
  • Designed and maintained scalable data workflows using Azure Data Factory, Event Hubs, and Synapse Analytics, managing over 100TB of data with high efficiency and scalability
  • Worked on SQL queries in dimensional data warehouses and relational data warehouses
  • Performed Data Analysis and Data Profiling using complex SQL queries on various systems
  • Developed NoSQL databases using CRUD, Indexing, Replication, and Sharding in Cosmos DB
  • Followed agile methodology for entire project
  • Actively participated in weekly iteration review meetings, providing constructive and insightful feedback to track progress for each cycle and identify issues
  • Environment: Spark, Scala, Python, PySpark, Pig, HDFS, Data Marts, ETL, Tableau, Azure, Databricks, MapReduce, XML, JSON, Hive, SQL, Agile and Windows
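
Illustrative PySpark sketch of the Azure Blob Storage ETL jobs mentioned in this role (read raw files, cleanse and standardize, write curated output). A minimal, hypothetical example: the storage account, container names, and column names are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lower, to_date, trim

spark = SparkSession.builder.appName("blob-etl-sketch").getOrCreate()

# Placeholder wasbs:// paths for the raw and curated containers
source = "wasbs://raw@examplestorage.blob.core.windows.net/orders/"
target = "wasbs://curated@examplestorage.blob.core.windows.net/orders/"

orders = spark.read.option("header", True).csv(source)

# Basic cleansing and standardization before writing curated Parquet
cleansed = (
    orders.dropDuplicates(["order_id"])
    .filter(col("order_id").isNotNull())
    .withColumn("customer_email", lower(trim(col("customer_email"))))
    .withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))
)

cleansed.write.mode("overwrite").partitionBy("order_date").parquet(target)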

Data Engineer

Xoom Works Inc
07.2018 - 12.2019
  • Involved in Analysis, Design, System architectural design, Process interfaces design, design documentation
  • Developed Spark applications using Scala to perform enrichments of user behavioral data (clickstream data) merged with user profile data (see the sketch below this role)
  • Involved in developing production-ready Spark applications using Spark RDD APIs, DataFrames, Spark-SQL, and Spark-Streaming API's
  • Implemented and managed ETL solutions and automated operational processes
  • Created Visual Charts, Graphs, Maps, Area Maps, Dashboards and Storytelling using Tableau
  • Performed data gathering, cleaning, and wrangling using Python
  • Worked on Snowflake environment to remove redundancy and load real-time data from various data sources into HDFS using Kafka
  • Designing and creating SQL Server tables, views, stored procedures, and functions
  • Used Agile (SCRUM) methodologies for Software Development
  • Actively participated in code reviews and meetings and resolved technical issues
  • Environment: Spark, Scala, Hadoop, Python, PySpark, AWS, MapReduce, Pig, ETL, HDFS, Hive, HBase, SQL, Agile and Windows
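
Illustrative sketch of the clickstream enrichment pattern from this role (joining click events with user profile data). The original applications were written in Scala; this minimal, hypothetical example uses PySpark for consistency with the sketches above, and all paths and column names are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("clickstream-enrichment-sketch").getOrCreate()

clicks = spark.read.json("hdfs:///data/raw/clickstream/")        # click events (placeholder path)
profiles = spark.read.parquet("hdfs:///data/curated/profiles/")  # user profiles (placeholder path)

# Enrich each click with profile attributes; keep clicks without a matching profile
enriched = clicks.join(profiles, on="user_id", how="left").select(
    "user_id",
    "page_url",
    "event_time",
    col("segment").alias("user_segment"),
)

enriched.write.mode("append").parquet("hdfs:///data/enriched/clickstream/")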

Education

Master of Science - Computer And Information Sciences

Texas Tech University
Lubbock, TX
12.2023

Skills

  • Databases: Snowflake, DynamoDB, SQL, Hive, MySQL, Oracle, RDBMS, AWS Redshift, Amazon RDS, Teradata
  • NoSQL Databases: MongoDB, Hadoop HBase, and Apache Cassandra
  • Programming Languages: Python, Scala, SQL, Java
  • Cloud Platforms: AWS, Docker, Azure
  • Querying Languages: SQL, NoSQL, PostgreSQL, MySQL, Microsoft SQL
  • Reporting & Visualization: Tableau, Power BI, QuickSight
  • Cluster Security: Kerberos, Ranger, IAM, VPC
  • Scalable Data Tools: Hadoop, Hive, Apache Spark, Pig, MapReduce
  • CI/CD Tools: Jenkins, GitHub
  • Operating Systems: Windows, Linux, Unix, macOS

Timeline

Software Engineer - Data

Capital One
07.2024 - Current

Data Engineer

Verizon
07.2023 - 06.2024

Data Engineer

Southwest Airlines
12.2022 - 06.2023

Data Engineer

Panasonic
01.2020 - 07.2022

Data Engineer

Xoom Works Inc
07.2018 - 12.2019

Master of Science - Computer And Information Sciences

Texas Tech University