
Ritesh Gopishetty

Beaverton

Summary

Data Engineering Manager with 10 years of experience designing large-scale data platforms, modern ELT pipelines, and cloud-native analytics solutions across Azure, AWS, and Snowflake. Proven track record leading cross-functional engineering teams, architecting enterprise data models, and delivering production-grade data solutions that support analytics, tax, and compliance-driven workloads. Skilled in Azure Data Factory, Databricks, Spark, Python, SQL, and governed ELT frameworks, with deep experience transforming raw data into actionable insights for business and client stakeholders. Adept at strategic planning, mentoring junior engineers, and driving adoption of modern data engineering practices. Strong background in data security, anonymization, CI/CD, and data quality automation. Recognized for partnering with business leaders to define data strategy, optimize data architecture, and deliver scalable, reliable, and compliant data systems.

Overview

10 years of professional experience

Work History

Data Engineer

Cloudwick Technologies
Beaverton, OR
05.2025 - Current
  • The Resilience, Brand, and Protection team at Nike is dedicated to safeguarding the company’s competitive edge and brand reputation by proactively identifying and responding to global risks and emerging trends. Through cross-functional collaboration with teams across Resilience, Global Technology, and other business units, the team delivers data-driven insights that inform strategic decisions and enhance organizational resilience. Their work supports enterprise-wide initiatives that align with Nike’s sport-first strategy and cultural values, ensuring the company remains agile, informed, and protected in a rapidly evolving global landscape.
  • Responsibilities:
  • Evaluated, extracted, and transformed data for analytical purposes in a big data environment.
  • Designed and maintained scalable ETL/ELT pipelines to ensure high-quality data ingestion and transformation from diverse internal and external sources.
  • Created executive-level dashboards and visualizations using Tableau and Power BI to communicate complex intelligence findings in a clear, actionable format.
  • Drove adoption of modern data engineering patterns, including CI/CD, automated testing, schema validation, and observability frameworks.
  • Designed and implemented reusable utility modules to streamline connections to enterprise systems including Box, RESTful APIs, and SQL Server, enhancing data accessibility and reducing integration overhead.
  • Acted as a technical advisor for enterprise clients, guiding architecture decisions, cloud migrations, and adoption of modern data engineering platforms.
  • Led delivery of complex data transformation programs, ensuring adherence to quality standards, timelines, and client expectations.
  • Built standardized functions for authentication, data retrieval, and error handling across multiple platforms, enabling consistent and secure data operations for analytics workflows.
  • Applied statistical techniques such as regression, hypothesis testing, and time-series forecasting to support decision-making in product creation and risk mitigation.
  • Acted as a technical advisor to cross-functional analytics teams, helping them evaluate and adopt modern data transformation patterns, modeling standards, and governed ELT workflows.
  • Partnered with internal engineering and analytics teams to translate business requirements into scalable data models, leveraging dimensional modeling, 3NF, and domain-driven design principles.
  • Collaborated with partner teams (Snowflake, AWS, Databricks) to optimize data pipelines, improve performance, and align with best practices for enterprise analytics workloads.

Big Data / Spark Developer

Cloudwick Technologies
Beaverton, OR
01.2024 - 10.2025
  • The main goal of the project is to migrate existing data from Oracle to AWS S3, perform ETL operations, and store the results in Snowflake, a SQL data warehouse designed for the cloud, so that BI users can gain business insights and prepare dashboards.
  • Responsibilities:
  • Evaluated, extracted, and transformed data for analytical purposes in a big data environment.
  • Migrated existing data from Oracle to AWS and performed ETL operations on it using Qubole.
  • Used Hadoop and Spark for data warehouse applications to maintain large datasets in AWS S3, and selected engineering tools based on recommendations.
  • Designed and developed Spark scripts to gather data insights per business requirements, and collaborated with other teams on integration needs and design.
  • Facilitated or performed application support, problem solving, and issue resolution with internal and external resources.
  • Mentored junior engineers and analysts, providing guidance on data modeling, pipeline design, and cloud engineering best practices.
  • Resolved big data issues and determined options for issue resolution and risk mitigation.
  • Worked with Avro and Parquet file formats and used various compression techniques to make efficient use of storage in HDFS.
  • Worked with the Data Scientist team to build pipelines for their Machine Learning models.
  • Reviewed and approved performance test results, recommendations, and tuning results. Oversaw the creation of test plans, test execution, and validation of test results.
  • Responsible for EMR Cluster creation, administration, sizing and configuration.
  • Created Spark jobs to see trends in data usage by users.
  • Worked with SCRUM team in delivering agreed user stories on time for every sprint.
  • Performed development and unit testing on the Hadoop and AWS ecosystems.
  • Automated and monitored ETL processes and applications.
  • Good knowledge of the Spark framework for both batch and real-time data processing.
  • Designed and developed ETL workflows and automated them using Autosys.
  • Environment: Qubole, AWS, Snowflake, Spark, Airflow, Databricks, CI/CD.

Data Engineer

Staples Inc.
Framingham, MA
01.2024 - 04.2025
  • Evaluated, extracted, and transformed data for analytical purposes in a big data environment.
  • Managed end-to-end governance of the Azure, Airflow, and Snowflake environments.
  • Designed and implemented scalable data pipelines on Azure using Azure Data Factory, Azure Databricks, and Azure Stream Analytics.
  • Developed data models and ETL processes for Snowflake data warehousing solutions using Snowflake SQL.
  • Configured and managed Snowflake security and access controls, including role-based access control, encryption, and key management.
  • Involved in the migration project from Teradata to Snowflake for a large-scale data warehousing solution.
  • KPI Tracking: Established and monitored key performance indicators (KPIs) to measure the effectiveness of digital marketing campaigns.
  • Airflow: Set up Airflow from scratch, using Bitbucket to store the YAML file, fetching secrets from Key Vault, and running the automation from Jenkins.
  • ROI Calculation: Calculated return on investment for digital marketing initiatives, considering both short-term and long-term impacts on revenue and brand visibility.
  • Airflow to Databricks Workflows: Converted the Airflow DAGs to Databricks Workflows and automated all workflows to improve efficiency and reduce cost.

Big Data Developer

Data Capital Inc.
Bentonville, Arkansas
03.2018 - 10.2018
  • The main goal of the project is to migrate existing data from Teradata, mainframes, and Oracle to Hadoop and perform ETL operations that give the Walmart business key insights and support faster decisions, using cutting-edge visualization tools such as ThoughtSpot, a BI tool that holds the latest data and allows drill-down to a fine-grained level.
  • Responsibilities:
  • Migrated existing data from mainframes, Teradata, and Oracle to Hadoop and performed ETL operations on it.
  • Designed and implemented Sqoop incremental and delta imports from Teradata for tables without primary keys or date columns, appending directly into the Hive warehouse.
  • Used the aorta connector to load history data into Teradata WM3/WMG boxes.
  • Loaded data from different sources into HDFS using the internal aorta application.
  • Worked on a POC to evaluate the performance of multi-tenancy tables versus standalone tables, and of views built on top of them.
  • Used the Automic workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as MapReduce, Hive, Sqoop, and Spark jobs.
  • Worked with Avro and Parquet file formats and used various compression techniques to make efficient use of storage in HDFS.
  • Used the Mainframe and Avro SerDes for serialization and deserialization in Hive to parse the contents.
  • Designed and developed ETL workflow using Automic for scheduling.
  • Environment: Hortonworks, Autosys, Automic, Oozie, Mainframes, Teradata, Oracle.

Bigdata / Spark Developer

Cloudwick Technologies
Newark, California
11.2017 - 03.2018
  • The main objective of the project is to perform analytics on, and gain insights from, data being moved from Teradata and Netezza to the AWS cloud environment. Our responsibility is to build an ETL data pipeline that loads the data into data warehouse platforms such as Redshift and other SQL-based query engines (Hive and Presto) for analytics.
  • Responsibilities:
  • Used PySpark to read data from S3 and perform various transformations to prepare the data for loading.
  • Developed Python scripts for data quality and standardization checks.
  • Worked with Spark for various transformations.
  • Designed and developed Spark applications in Python to compare the performance of Spark with Hive and SQL/Oracle.
  • Loaded data from S3 into Hive and Presto using file formats such as JSON and ORC.
  • Loaded data into Redshift using PySpark to generate quarterly performance reports.
  • Enabled the BI team to generate reports in Tableau and SAS from the data in Redshift.
  • Designed and developed jobs to validate data after migration, such as comparing reporting fields between source and destination systems, using Spark SQL, RDDs, and DataFrames/Datasets.
  • Used Spark SQL on DataFrames to read Hive tables into Spark for faster data processing.
  • Environment: AWS, Oracle, PySpark, Redshift, Tableau, Presto, Hive

Hadoop Engineer

Sparsity Systems LLC.
Orlando, Florida
01.2017 - 10.2017
  • Our responsibility is to build an ETL data pipeline that loads data into data warehouse platforms such as Redshift and other SQL-based query engines (Hive and Presto) for analytics.
  • Responsibilities:
  • Used NiFi to ingest data from various sources into the data lake.
  • Worked with Spark for various transformations.
  • Created Hive managed and external tables.
  • Used Kafka for streaming applications.
  • Created topics in the Kafka broker that receive data from sources via NiFi; a Spark job consumes the data and pushes it into the IBM Cloudant database.
  • Worked with partitioning, bucketing, and other optimizations in Hive.
  • Worked with ORC and JSON file formats and used various compression techniques to make efficient use of storage in HDFS.
  • Developed and implemented core API services using Spark with Scala.
  • Used Rally to track the user stories and tasks to be completed in each sprint.
  • Ingested data from Hive into Spark, created DataFrames, and wrote the updates to the IBM Cloudant database.
  • Used Pivotal to run the business logic that calls the REST APIs to update or create documents in the IBM Cloudant database.
  • Prepared data for business users with Paxata, a data preparation tool.
  • Worked on various production issues during month-end support and provided resolutions without missing any SLA.
  • Used GitHub to set the overall direction of the project and track the progress of the project.
  • Used Paxata for delivering the data to the BI users for creating the dashboards for the Daily Sales ticket of the theme park.
  • Environment: Hortonworks, NiFi, Kafka, IBM Cloudant, Paxata, GitHub.

Hadoop /Spark Developer

Sparsity Systems LLC.
New York City, NY
01.2016 - 01.2017
  • The main objective of the project is to perform analytics on, and gain insights from, data being moved from Teradata and Netezza to the AWS cloud environment. Our responsibility is to build an ETL data pipeline that loads the data into data warehouse platforms such as Redshift and other SQL-based query engines (Hive and Presto) for analytics.
  • Responsibilities:
  • Used PySpark to read data from S3 and perform various transformations to prepare the data for loading.
  • Developed Python scripts for data quality and standardization checks.
  • Worked with Spark for various transformations.
  • Designed and developed Spark applications in Python to compare the performance of Spark with Hive and SQL/Oracle.
  • Loaded data from S3 into Hive and Presto using file formats such as JSON and ORC.
  • Used Spark SQL on DataFrames to read Hive tables into Spark for faster data processing.
  • Environment: Alteryx, Bedrock, Hortonworks, Paxata, AWS S3, Hive, Teradata.

Education

Master of Science - Computer Science Engineering

University of Michigan, Flint
12-2015

Bachelor's - Computer Sciences and Engineering

Jawaharlal Nehru Technological University
India
01-2014

Skills

  • Hadoop Ecosystem: MapReduce, Hive, Pig, Flume, Sqoop, Oozie
  • Cloud: Snowflake, AWS, Azure, Athena, EMR, EC2, AWS Glue, Lambda
  • Streaming: Spark, Kafka, NSP
  • Monitoring and Automation: Nagios, Ganglia, Cloudera Manager, Autosys, Airflow, Databricks Workflows
  • Databases: Oracle 9i/10g/11g, SQL Server 2005/2008, ADLS storage
  • Languages: Python, C, Java
  • Reporting Tools: Framework Manager, Tableau
  • NoSQL Databases: Cloudant, HBase, DynamoDB
  • Other Tools: SQL Management Studio, Eclipse, Serena Version Control Tool, Jenkins
