
Ravitheja Papareddy

Frisco, TX

Summary

Currently working as a Senior Data Engineer, I lead the design and development of end-to-end data pipelines on Databricks using PySpark, integrating AWS services such as S3, Lambda, and Redshift, and orchestrating complex workflows with Airflow to ensure efficient, scalable data processing. I have led cross-functional teams to implement advanced data solutions on Snowflake and AWS, significantly enhancing data accessibility and performance through optimized ELT workflows and robust data modeling practices.

With extensive experience as a Data Engineer and Data Analyst, I specialize in building data pipelines using the Hadoop ecosystem, Spark, Hive, HDFS, MapReduce, YARN, Sqoop, Kafka, Oozie, and Teradata, while also leveraging cloud platforms such as AWS and Google Cloud. I bring deep technical expertise in developing Spark applications using PySpark and Spark SQL, creating Hive tables with custom UDFs, and using visualization tools such as Tableau and Amazon QuickSight. I have worked with cluster monitoring tools such as Cloudera Manager and Hortonworks and have hands-on experience with real-time data streaming via Kafka. My skill set includes advanced data manipulation using partitions, joins, and window functions, along with designing, testing, and maintaining data management systems using Spark, Hadoop, AWS, and shell scripting.

I am proficient in Python, Core Java, SQL, and object-oriented design, with a strong background in creating stored procedures, triggers, and views for reliable data operations. I work closely with business users, product owners, and engineering teams in Agile environments to deliver data-driven features, translating technical outcomes into actionable business insights and aligning data strategies with organizational goals.

Overview

12 years of professional experience
1 certification

Work History

Data Engineer

Walt Disney - Wipro Technologies - IDC
01.2024 - Current
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Fine-tuned query performance and optimized database structures for faster, more accurate data retrieval and reporting.
  • Enhanced data quality by performing thorough cleaning, validation, and transformation tasks.
  • Streamlined complex workflows by breaking them down into manageable components for easier implementation and maintenance.
  • Provided technical guidance and mentorship to junior team members, fostering a collaborative learning environment within the organization.
  • Led end-to-end implementation of multiple high-impact projects from requirements gathering through deployment and post-launch support stages.
  • Evaluated various tools, technologies, and best practices for potential adoption in the company's data engineering processes.
  • Collaborated with cross-functional teams for seamless integration of data sources into the company's data ecosystem.
  • Gathered, defined and refined requirements, led project design and oversaw implementation.

Data Engineer

Apple - Infosys - AML SHURI-JE
03.2023 - 01.2024
  • Created Kerberos-enabled Dataproc clusters for data migration and established a cross-realm setup to migrate data from HDFS to GCP
  • Migrated existing Spark jobs from CDH to GCP
  • Proficient in PySpark and Spark SQL
  • Migrated historical data and repointed Kafka to load data into GCS buckets
  • Validated customer data upon successful migration
  • Registered tables in the Hive Metastore (HMS) in Iceberg format
  • Tuned code and queries to reduce load and resolved the small-file issue
  • Contributed to Apple wikis
  • Built dashboards using Tableau
  • Interacted with multiple business owners across AML on the GCP migration
  • The AML Solution team builds solutions with impact across Apple, typically around the machine learning platform, machine learning solutions, and big data
  • The team partners with Retail, AppleCare, Marcom, and Manufacturing to build solutions that include a search engine
  • Implemented machine learning algorithms to extract insights from large volumes of structured and unstructured data

Data Engineer

Apple - Infosys - AMP CORE
03.2021 - 03.2023
  • Hands-on experience in data extraction, exploration, and analysis to produce reports and visualizations
  • Proficient in Spark with Scala, PySpark, and Spark SQL for reading data from large datasets
  • Performed data transformations using Spark DataFrame methods, joining different tables to generate a single-level order ID for every transaction
  • Tuned code and queries to reduce load on the data engine and improve execution time by optimizing joins, selecting only the required columns, and applying other SQL and Spark logic
  • Built dashboards using Tableau
  • Monitored the Hydra dashboard to check App Store health
  • Monitored the QGT dashboard, creating threshold rules for app crashes
  • Performed data refresh activities for both Prod and UAT environments
  • Worked on schema changes in Kafka and Oracle pipelines
  • Interacted with multiple business projects to gather use cases and helped teams by providing new business models
  • The AMP data investigation team analyzes and produces insights from diagnostic and usage data from hundreds of millions of devices worldwide every day
  • The insights are used to improve Apple's products and services, inform strategic direction, and improve user experience
  • The team is a high-paced, high-functioning group of data analysts using the latest big data technologies to tackle complex, large-scale problems with immense quantities of collected data

Data Engineer Intern

LendingWise - Northwest Missouri State University
01.2019 - 04.2019
  • Loaded raw data from Kafka into Spark and applied business logic before loading it into BigQuery
  • Created trends based on the customer data using Tableau
  • Performed performance tuning and troubleshooting of jobs by analyzing and reviewing Hadoop log files
  • Used the Airflow workflow engine to run workflow jobs with actions that execute Hadoop MapReduce jobs
  • Actively worked on issues reported by the QA team and resolved all data-processing issues
  • Documented system processes and procedures for future reference
  • LendingWise.com is a robust, cloud-based CRM and LOS platform designed for Hard & Private Money Mortgage Lenders & brokers of all sizes
  • Private money lenders typically piggyback off traditional mortgage market tools and software, which are bloated with unnecessary features that slow them down and burn their pockets with extra fees
  • Now they have a CRM & LOS fully integrated under one platform, so they can manage the sales & marketing, broker management, deal flow, processing, underwriting and closing of hard money loans

Data Analyst

LettuceDream - Northwest Missouri State University
08.2018 - 12.2018
  • Designed and developed an iOS QR-code generator application for tracking lettuce life-cycle management using pre-existing libraries
  • Migrated data from RDBMS files into landing zones (Google Cloud Storage buckets)
  • Performed data transformations using Spark DataFrame methods, joining different tables to generate a single-level order ID for every transaction
  • Applied business logic to the data and generated daily, quarterly, and annual reports using Tableau for clients such as Northwest Missouri State University, Hy-Vee, and Walmart
  • Worked with customer transaction-level data and built consumption tables by joining tables such as customer visits, items scanned, and store and item dimension tables to get an overview of each customer transaction
  • Created an order management system that manages all orders from different clients and calculates waste through waste-management tracking
  • Worked closely with business teams to meet business requirements and daily SLAs
  • Scheduled jobs using Airflow by creating DAGs that run daily to update the tables
  • LettuceDream is a non-profit organization that produces organic lettuce
  • The organization holds extensive data on harvesting cycles and its customers
  • Migrated data from MySQL Server to Google Cloud Storage buckets

Student Manager

Northwest Missouri State University
09.2017 - 07.2018
  • Used the Kronos scheduling tool to track workforce in-times and out-times
  • Involved in requirements gathering and analyzing the existing manual process of scheduling hours
  • Customized the UI and created user-defined field forms per requirements
  • Certified food handler
  • Excellent customer service skills and ability to deal with a variety of restaurant patrons
  • Awarded the 'Front Line First' honor for consistently strong work
  • Conducted staff training regarding fine dining
  • Ensured proper food-safety handling was enforced
  • Aramark Corporation, known commonly as Aramark, is an American food service, facilities, and uniform services provider to clients in fields including education, healthcare, business, corrections, and leisure

Associate Engineer

Unisys Corporation
10.2015 - 07.2017
  • Developed and maintained the conversion of 9-bit data to 8-bit data for migrating data between OS2200 systems and Windows using TCP/IP and Mellanox drivers
  • Developed a Windows application to access OS2200 data from Windows
  • Analyzed and created solutions and technical designs for the Windows application to access OS2200 files from Windows
  • Maintained reports using ASP.NET and C# in the Presenter-Repository pattern
  • Provided outbound web access and other .NET capabilities
  • Led engineers through various development challenges; elected as KT champion for the module
  • Collaborated with QA on testing and fixing bugs
  • Elected as scrum master for the OS2200 file transfer module
  • Application Integrated Services - Connectivity Services is a layer-5 interface for Unisys OS network applications
  • It provides access to Unisys OS services from Connectivity Services clients running on remote systems
  • CS2200 is paired with remote Connectivity Services by a protocol that allows clients and agents to communicate with each other securely in a message-oriented environment

Project Engineer

Wipro Technologies
04.2014 - 10.2015
  • In-depth understanding of Hadoop Architecture and various components such as HDFS, Application master, Node Manager, Resource Manager, NameNode, DataNode and MapReduce Concepts
  • Developed Spark programs in Python to compare Spark's performance with Hive and SQL
  • Imported data from different sources such as AWS S3 and the local file system (LFS) into Spark RDDs
  • Worked with file formats such as Avro and SequenceFile and compression formats such as Snappy
  • Involved in converting SQL scripts into Spark transformations using Spark DataFrames
  • Created a data mart in the data lake to enable Tableau access to build in-scope metrics
  • Analyzed data using HiveQL and Pig and created Hive UDFs to analyze and transform data in HDFS
  • Used Spark SQL to load JSON data, create SchemaRDDs, and load them into Hive tables, handling structured data with Spark SQL
  • Loaded data, performed operations using Spark SQL, and sent the results to Tableau dashboards
  • Designed and implemented static and dynamic partitioning and bucketing in Hive
  • Developed software in a collaborative team environment using Scrum Agile methodologies to build data pipelines with Spark
  • Processed batch and real-time data using Spark with Python
  • Used Sqoop to transfer files efficiently between databases and HDFS, and Flume to stream log data from servers
  • Used JSON SerDes for serialization and deserialization to load JSON data into Hive tables
  • Used Oozie workflow to coordinate Pig and Hive Scripts
  • Generated Data Flow Diagrams (DFD) and Unified Modeling Language (UML) Diagrams which explains system architecture to client
  • Involved in scrum meetings and sprint planning
  • Actively worked on issues reported by the QA team and resolved all data-processing issues
  • Worked for one of the top financial corporations in the USA, a leading credit card issuer that has also expanded into auto loans, banking, and savings accounts
  • Customer contact information, policy holdings data, claims summaries, and similar datasets keep growing
  • With traditional data warehouses, it is hard to handle such large volumes of data operations and transformations, which consume significant time, and adding more storage nodes is difficult
  • Adopted Hadoop and its ecosystem, using Teradata, Hive, Sqoop, Spark, Kafka, Amazon EMR, and related tools to perform data warehousing transformations efficiently in far less time

Project Engineer

Wipro Technologies
06.2013 - 04.2014
  • Developed the IBRIX data migration tool for transferring massive data from one server to another using high-availability and RAID techniques
  • Created file systems for sharing files through NFS exports or SMB shares using the HP file system wizard
  • Created new virtual machines using the HP data center to transfer files from one server to another
  • Worked on CIFS and NFS file-sharing protocols to transfer files between Windows and Linux environments
  • Expertise in Windows ACLs and UNIX permissions for files transferred to clients
  • Created snapshots and performed NDMP backup and restore of files using RAID techniques
  • Created and updated unit test cases, performed black-box and integration testing, and reported issues using Bugzilla
  • HP StoreAll Storage goes beyond traditional network-attached storage (NAS) in both capacity and performance
  • HP StoreAll Storage delivers excellent performance and a modular storage infrastructure to support storage growth, using the IBRIX file system, which manages petabytes of data

Education

Master of Science - Big Data Analytics & CIS

University of Central Missouri
Warrensburg, MO
12.2020

Master of Science - Information and Computer Systems

Northwest Missouri State University
Maryville, MO
05.2019

Bachelor of Science - Electrical and Electronics Engineering

SASTRA University
Thanjavur, TN, India
04.2013

Skills

  • Apache Spark, Hive, Hadoop, HDFS, HBase, Kafka, Airflow
  • Data Warehousing
  • Python, PySpark, C, SQL, R, Scala, Shell scripting
  • NumPy, SciPy, Scikit-learn, Pandas, Matplotlib, Pytables, Seaborn
  • MySQL, SQL Server, Snowflake, Cassandra, Teradata, BigQuery
  • Amazon Web Services, Google Cloud Platform, Cloudera (CDH 5.6 & CDH 5.11)
  • Visual Studio, NetBeans, IntelliJ IDEA, Spyder, Teradata Studio
  • Tableau, Power BI, Excel
  • GitHub, GitLab, Bitbucket, Lucidchart, Postman, FileZilla

Certification

AWS Certified Solutions Architect - Associate, 04MY60WJCBE4Q1SP, http://aws.amazon.com/verification
