Ritesh Gopishetty

Beaverton

Summary

Data Engineering Manager with 10 years of experience designing large-scale data platforms, modern ELT pipelines, and cloud-native analytics solutions across Azure, AWS, and Snowflake. Proven track record leading cross-functional engineering teams, architecting enterprise data models, and delivering production-grade data solutions that support analytics, tax, and compliance-driven workloads. Skilled in Azure Data Factory, Databricks, Spark, Python, SQL, and governed ELT frameworks, with deep experience transforming raw data into actionable insights for business and client stakeholders. Adept at strategic planning, mentoring junior engineers, and driving adoption of modern data engineering practices. Strong background in data security, anonymization, CI/CD, and data quality automation. Recognized for partnering with business leaders to define data strategy, optimize data architecture, and deliver scalable, reliable, and compliant data systems.

Work History

Data Engineer

Cloudwick Technologies

Beaverton, OR

05.2025 - Current

The Resilience, Brand, and Protection team at Nike is dedicated to safeguarding the company’s competitive edge and brand reputation by proactively identifying and responding to global risks and emerging trends. Through cross-functional collaboration with teams across Resilience, Global Technology, and other business units, the team delivers data-driven insights that inform strategic decisions and enhance organizational resilience. Their work supports enterprise-wide initiatives that align with Nike’s sport-first strategy and cultural values, ensuring the company remains agile, informed, and protected in a rapidly evolving global landscape.
Responsibilities:
Evaluated, extracted, and transformed data for analytical purposes in a big data environment.
Designed and maintained scalable ETL/ELT pipelines to ensure high-quality data ingestion and transformation from diverse internal and external sources.
Created executive-level dashboards and visualizations using Tableau and Power BI to communicate complex intelligence findings in a clear, actionable format.
Drove adoption of modern data engineering patterns, including CI/CD, automated testing, schema validation, and observability frameworks.
Designed and implemented reusable utility modules to streamline connections to enterprise systems including Box, RESTful APIs, and SQL Server, enhancing data accessibility and reducing integration overhead.
Acted as a technical advisor for enterprise clients, guiding architecture decisions, cloud migrations, and adoption of modern data engineering platforms.
Led delivery of complex data transformation programs, ensuring adherence to quality standards, timelines, and client expectations.
Built standardized functions for authentication, data retrieval, and error handling across multiple platforms, enabling consistent and secure data operations for analytics workflows.
Applied statistical techniques such as regression, hypothesis testing, and time-series forecasting to support decision-making in product creation and risk mitigation.
Acted as a technical advisor to cross-functional analytics teams, helping them evaluate and adopt modern data transformation patterns, modeling standards, and governed ELT workflows.
Partnered with internal engineering and analytics teams to translate business requirements into scalable data models, leveraging dimensional modeling, 3NF, and domain-driven design principles.
Collaborated with partner teams (Snowflake, AWS, Databricks) to optimize data pipelines, improve performance, and align with best practices for enterprise analytics workloads.

Big Data / Spark Developer

Cloudwick Technologies

Beaverton, OR

01.2024 - 10.2025

The main goal of the project is to migrate existing data from Oracle to AWS S3, perform ETL operations, and store the results in Snowflake, a SQL data warehouse designed for the cloud, so that BI users can derive business insights and build dashboards.
Responsibilities:
Evaluated, extracted, and transformed data for analytical purposes in a big data environment.
Migrated existing data from Oracle to AWS and performed ETL operations on it using Qubole.
Used Hadoop and Spark for data warehouse applications to maintain large datasets in AWS S3, and selected engineering tools based on recommendations.
Designed and developed Spark scripts to gather data insights per business requirements, and collaborated with other teams on integration needs and design.
Facilitated or performed application support, problem solving, and issue resolution with internal and external resources.
Mentored junior engineers and analysts, providing guidance on data modeling, pipeline design, and cloud engineering best practices.
Resolved big data issues and determined options for issue resolution and risk mitigation.
Worked with Avro and Parquet file formats and used various compression techniques to optimize storage in HDFS.
Worked with the Data Scientist team to build pipelines for their Machine Learning models.
Reviewed and approved performance test results, recommendations, and tuning results. Oversaw the creation of test plans, test execution, and validation of test results.
Responsible for EMR Cluster creation, administration, sizing and configuration.
Created Spark jobs to see trends in data usage by users.
Worked with the Scrum team to deliver agreed user stories on time in every sprint.
Performed development and unit testing on the Hadoop and AWS ecosystems.
Automated and monitored the ETL process and applications.
Strong knowledge of the Spark framework for both batch and real-time data processing.
Designed and developed ETL workflows and automated them using Autosys.
Environment: Qubole, AWS, Snowflake, Spark, Airflow, Databricks, CI/CD.

Data Engineer

Staples Inc.

Framingham, MA

01.2024 - 04.2025

Evaluated, extracted, and transformed data for analytical purposes in a big data environment.
Managed end-to-end governance of the Azure, Airflow, and Snowflake environments.
Designed and implemented scalable data pipelines on Azure using Azure Data Factory, Azure Databricks, and Azure Stream Analytics.
Developed data models and ETL processes for Snowflake data warehousing solutions using Snowflake SQL.
Configured and managed Snowflake security and access controls, including role-based access control, encryption, and key management.
Involved in the migration project from Teradata to Snowflake for a large-scale data warehousing solution.
KPI Tracking: Established and monitored key performance indicators (KPIs) to measure the effectiveness of digital marketing campaigns.
Airflow: Set up Airflow from scratch, storing the YAML file in Bitbucket, fetching secrets from Key Vault, and running the automation from Jenkins.
ROI Calculation: Calculated return on investment for digital marketing initiatives, considering both short-term and long-term impacts on revenue and brand visibility.
Airflow to Databricks workflows: Converted Airflow DAGs to Databricks workflows and automated all workflows for improved efficiency and lower cost.

Big Data Developer

Data Capital Inc.

Bentonville, Arkansas

03.2018 - 10.2018

The main goal of the project is to migrate existing data from Teradata, mainframes, and Oracle to Hadoop and perform ETL operations that give the Walmart business key insights and support faster decisions, using cutting-edge visualization tools such as ThoughtSpot, the BI tool that holds the latest data and allows drill-down to a fine-grained level.
Responsibilities:
Migrated existing data from mainframes, Teradata, and Oracle to Hadoop and performed ETL operations on it.
Designed and implemented Sqoop incremental and delta imports from Teradata for tables without primary keys or date columns, appending directly into the Hive warehouse.
Used the aorta connector to load history data into Teradata WM3/WMG boxes.
Loaded data from different sources into HDFS using the internal aorta application.
Worked on a POC to evaluate the performance of multi-tenancy tables versus standalone tables, and of views built on top of them.
Used the Automic workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as MapReduce, Hive, Sqoop, and Spark jobs.
Worked with Avro and Parquet file formats and used various compression techniques to optimize storage in HDFS.
Used mainframe and Avro SerDes for serialization and deserialization in Hive to parse the contents.
Designed and developed ETL workflow using Automic for scheduling.
Environment: Hortonworks, Autosys, Automic, Oozie, Mainframes, Teradata, Oracle.

Big Data / Spark Developer

Cloudwick Technologies

Newark, California

11.2017 - 03.2018

The main objective of the project is to perform analytics and gain insights from data being moved from Teradata and Netezza to the AWS cloud environment. Our responsibility is to build an ETL data pipeline that loads this data into data warehouse platforms like Redshift and other SQL-based databases (Hive and Presto) for analytics.
Responsibilities:
Used PySpark to read data from S3 and perform various transformations to prepare the data for loading.
Developed Python scripts for data quality and standardization checks.
Worked with Spark for various transformations.
Designed and developed Spark applications in Python to compare the performance of Spark with Hive and SQL/Oracle.
Loaded data from S3 into Hive and Presto using file formats such as JSON and ORC.
Loaded data into Redshift using PySpark to generate quarterly performance reports.
Enabled the BI team to generate reports in Tableau and SAS based on the data in Redshift.
Designed and developed jobs to validate data after migration, such as comparing reporting fields between source and destination systems, using Spark SQL, RDDs, and DataFrames/Datasets.
Used Spark SQL on DataFrames to read Hive tables into Spark for faster data processing.
Environment: AWS, Oracle, PySpark, Redshift, Tableau, Presto, Hive

Hadoop Engineer

Sparsity Systems LLC.

Orlando, Florida

01.2017 - 10.2017

Our responsibility is to build an ETL data pipeline that loads data into data warehouse platforms like Redshift and other SQL-based databases (Hive and Presto) for analytics.
Responsibilities:
Used NiFi to ingest data from various sources into the data lake.
Worked with Spark for various transformations.
Created Hive managed and external tables.
Used Kafka for a streaming application.
Created topics on the Kafka broker that receive data from sources via NiFi; a Spark job consumes the data and pushes it into the IBM Cloudant database.
Worked with partitioning, bucketing, and other optimizations in Hive.
Worked with ORC and JSON file formats and used various compression techniques to optimize storage in HDFS.
Developed and implemented core API services using Spark with Scala.
Used Rally to track the user stories and tasks to be completed in each sprint.
Ingested data from Hive into Spark, created DataFrames in Spark, and wrote the updates to the IBM Cloudant database.
Used Pivotal as the business-logic environment to call REST APIs that create and update documents in the IBM Cloudant database.
Prepared data for business users with Paxata, a data preparation tool.
Resolved various production issues during month-end support without missing any SLA.
Used GitHub to set the overall direction of the project and track its progress.
Used Paxata to deliver data to BI users for building dashboards on the theme park's daily ticket sales.
Environment: Hortonworks, NiFi, Kafka, IBM Cloudant, Paxata, GitHub.

Hadoop / Spark Developer

Sparsity Systems LLC.

New York City, NY

01.2016 - 01.2017

The main objective of the project is to perform analytics and gain insights from data being moved from Teradata and Netezza to the AWS cloud environment. Our responsibility is to build an ETL data pipeline that loads this data into data warehouse platforms like Redshift and other SQL-based databases (Hive and Presto) for analytics.
Responsibilities:
Used PySpark to read data from S3 and perform various transformations to prepare the data for loading.
Developed Python scripts for data quality and standardization checks.
Worked with Spark for various transformations.
Designed and developed Spark applications in Python to compare the performance of Spark with Hive and SQL/Oracle.
Loaded data from S3 into Hive and Presto using file formats such as JSON and ORC.
Used Spark SQL on DataFrames to read Hive tables into Spark for faster data processing.
Environment: Alteryx, Bedrock, Hortonworks, Paxata, AWS S3, Hive, Teradata.

Education

Master of Science - Computer Science Engineering

University of Michigan, Flint

12-2015

Bachelor's - Computer Sciences and Engineering

Jawaharlal Nehru Technological University

India

01-2014

Skills

Hadoop Ecosystem: MapReduce, Hive, Pig, Flume, Sqoop, Oozie
Cloud: Snowflake, AWS, Azure, Athena, EMR, EC2, AWS Glue, Lambda
Streaming: Spark, Kafka, NSP
Monitoring and Automation: Nagios, Ganglia, Cloudera Manager, Autosys, Airflow, Databricks Workflow
Databases: Oracle 9i/10g/11g, SQL Server 2005/2008, ADLS storage
Languages: Python, C, Java
Reporting Tools: Framework Manager, Tableau
NoSQL Databases: Cloudant, HBase, DynamoDB
Other Tools: SQL Management Studio, Eclipse, Serena Version Control Tool, Jenkins
