
SRINIVASA REDDY GOPU

Atlanta, GA

Summary

Experienced Senior Cloud Data Engineer with over 16 years of industry expertise. Skilled in managing large-scale data warehouse projects using both Waterfall and Agile methodologies. Demonstrated proficiency in Google Cloud tools, with a focus on data migration and ETL from on-premises systems to Google Cloud Platform using Python. Also experienced in ML and GenAI tools, API and SDK development, and low-code web UI development.

Overview

17 years of professional experience
1 certification

Work History

Senior Data Engineer

CVS HEALTH CORPORATION
03.2023 - Current


  • Designed and implemented proof-of-concept (POC) solutions to support machine learning teams in migrating large-scale data workloads from on-premise systems to Google Cloud Platform (GCP), with a focus on scalable and efficient cloud-based data pipelines.
  • Collaborated with cross-functional ML and analytics teams to assess existing data infrastructure and ETL workflows across on-premise and cloud environments.
  • Led the migration of analytical and ML datasets from Hadoop and Hive to Google Cloud Storage and BigQuery, utilizing orchestration and transformation tools such as Cloud Composer (Airflow) and Dataflow.
  • Built robust data pipelines for moving and transforming structured and semi-structured data, ensuring reliability, performance, and scalability in production-grade systems.
  • Developed a POC to migrate H2O model-serving data pipelines to GCP using Spark clusters, Python, BigQuery, and H2O ML components.
  • Partnered with ML engineers to refactor and optimize model training workflows for execution within Vertex AI, aligning data infrastructure with modern MLOps best practices.
  • Developed a deep understanding of the data lifecycle in machine learning workflows, from data ingestion and transformation to model training and prediction.
  • Engineered a custom ML model data pipeline (Sparse Feature Fynder) for processing and predicting sparse medical code features at scale.
  • Built an end-to-end POC for processing and embedding clinical text data using Hugging Face models and Cloud Dataflow.
  • Developed low-code internal tools and dashboards using Dash (Plotly) with Cloud Spanner as the backend to enable data-driven decision-making and monitoring (see the sketch after the skills line below).
  • Acquired broad expertise in data architecture, cloud-native pipeline design, API development, and Python-based microservices supporting machine learning and analytics use cases.

    Skills: Google Cloud BigQuery, Cloud Composer, Airflow, Dataflow, Cloud Storage, Google Cloud Vertex AI, Cloud Spanner, ML and GenAI models, Dash, API development, H2O ML.
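
Illustrative only: a minimal sketch of the Dash-on-Spanner monitoring pattern described above, assuming placeholder Spanner instance, database, table, and column names (example-instance, example-db, pipeline_runs) rather than any actual CVS resources.

# Minimal sketch: a Dash dashboard backed by Cloud Spanner.
# All resource and column names below are illustrative placeholders.
from dash import Dash, dash_table, html
from google.cloud import spanner

def fetch_pipeline_metrics():
    """Read recent pipeline run metrics from a Spanner table."""
    client = spanner.Client()
    database = client.instance("example-instance").database("example-db")
    with database.snapshot() as snapshot:
        rows = snapshot.execute_sql(
            "SELECT pipeline_name, run_date, row_count, status "
            "FROM pipeline_runs ORDER BY run_date DESC LIMIT 100"
        )
        return [
            {"pipeline": r[0], "run_date": str(r[1]), "rows": r[2], "status": r[3]}
            for r in rows
        ]

app = Dash(__name__)
app.layout = html.Div([
    html.H3("Pipeline monitoring"),
    dash_table.DataTable(data=fetch_pipeline_metrics(), page_size=20),
])

if __name__ == "__main__":
    app.run(debug=True)

Reading from a read-only snapshot keeps the dashboard queries consistent without blocking writers to the monitoring table.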

Lead Cloud Data Engineer with BI

Macy's Technology
10.2021 - 03.2023
  • Acted as a subject matter expert in analyzing requirements and preparing specifications for data ingestion pipelines across multiple Google Cloud BigQuery projects.
  • Owned end-to-end data solutions in high-visibility roles, collaborating with cross-functional teams including business analysts, developers, QA, and reporting teams.
  • Led the FBI reporting modernization project by migrating legacy Oracle-based ETL processes to BigQuery and transitioning reporting from OBIEE to Cognos.
  • Collaborated with BI teams to troubleshoot and resolve data issues between BigQuery datasets and reporting tools during QA and UAT cycles.
  • Designed and implemented batch and near real-time data ingestion pipelines into BigQuery using Cloud Composer (Airflow), Dataflow, and Python, supporting internal business reporting use cases.
  • Created and maintained complex BigQuery tables, views, and analytical SQL queries for downstream reporting and analytics.
  • Orchestrated ETL and data integration workflows using Cloud Composer and Control-M, ensuring efficient job scheduling and error handling.
  • Converted legacy Unix-based GCP export batch jobs from Edge nodes into modular, maintainable Python code.
  • Developed both built-in and custom Dataflow pipelines using Java to ingest near real-time data from Cloud Spanner and MongoDB via Pub/Sub into BigQuery for analytics.
  • Created a POC using Dataflow and Python for seamless MongoDB-to-BigQuery migration, showcasing flexibility in tool adoption (see the sketch after the skills line below).
  • Integrated source code via GitLab and managed deployment workflows to UAT and Production environments using Jenkins CI/CD pipelines.
  • Successfully delivered four high-impact BigQuery projects within eight months through effective planning, execution, and stakeholder coordination.
  • Supervised and reviewed offshore development activities, ensuring code quality and timely delivery.
  • Monitored Dataproc Spark jobs and led a POC initiative to convert Spark jobs to PySpark, improving pipeline maintainability and performance.
  • Solid understanding and hands-on experience with CI/CD practices using Git, Jenkins, and cloud-native deployment patterns.


Skills: Python, Java, PySpark, Google Cloud Platform (BigQuery, Cloud Composer, Dataflow, Pub/Sub, Spanner, Dataproc), Airflow, Control-M, Jenkins, GitLab, GitHub, MongoDB, Cognos, OBIEE, Jira
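
Illustrative only: a minimal sketch of the MongoDB-to-BigQuery POC pattern using the Apache Beam Python SDK on Dataflow; the connection URI, field names, project, bucket, and destination table are hypothetical placeholders, not actual Macy's resources.

# Minimal sketch: batch MongoDB-to-BigQuery migration with the Beam Python SDK.
# All connection details, fields, and resource names are illustrative placeholders.
import apache_beam as beam
from apache_beam.io.mongodbio import ReadFromMongoDB
from apache_beam.options.pipeline_options import PipelineOptions

def to_bq_row(doc):
    # Flatten a MongoDB document into a BigQuery-friendly dict.
    return {
        "order_id": str(doc.get("_id")),
        "status": doc.get("status"),
        "amount": float(doc.get("amount", 0)),
    }

options = PipelineOptions(
    runner="DataflowRunner",
    project="example-project",
    region="us-central1",
    temp_location="gs://example-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    (p
     | "ReadMongo" >> ReadFromMongoDB(uri="mongodb://example-host:27017",
                                      db="orders_db", coll="orders")
     | "ToRow" >> beam.Map(to_bq_row)
     | "WriteBQ" >> beam.io.WriteToBigQuery(
           "example-project:analytics.orders",
           schema="order_id:STRING,status:STRING,amount:FLOAT",
           write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
           create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))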

Cloud Data Engineer

Vodafone Hungary
06.2020 - 09.2021
  • Spearheaded the digital transformation of Vodafone’s BI and analytics capabilities, driving a unified, reusable, and sharable data architecture using Google Cloud Platform (GCP) tools.
  • Built scalable, reusable data pipelines to process CSV data using Google Cloud DataFusion, enabling efficient data integration and transformation across global markets.
  • Developed and orchestrated robust ETL pipelines with Python and Cloud Composer (Airflow), ensuring seamless data flows and integration across cloud and on-premise systems (a DAG sketch follows the skills line below).
  • Designed and created BigQuery tables and views, working closely with the architecture team to optimize query performance and ensure efficient data storage and retrieval.
  • Rapidly learned and mastered Cloud DataFusion, integrating the tool into the project pipeline and training team members on its usage and best practices for cloud-based data integration.
  • Developed and implemented CI/CD pipelines using GitHub and Jenkins, ensuring smooth deployment cycles, version control, and automation across multiple environments.
  • Led the successful completion of four POCs on the development platform, acting as the Subject Matter Expert (SME) for GCP development and code migrations to cloud infrastructure.
  • Provided comprehensive project support, including scoping, estimating, planning, designing, developing, and maintaining data pipelines and cloud infrastructure.
  • Led the entire Software Development Life Cycle (SDLC) for Google Cloud projects, ensuring all phases from planning to production releases were executed smoothly with proper integration of ETL and DevOps tools.
  • Set up and managed the complete CI/CD process using Git, Jenkins, and cloud-native tools, ensuring efficient code migration, versioning, and continuous deployment pipelines.


Skills: Python, Google Cloud BigQuery, Cloud Composer (Airflow), Cloud Data Fusion, GitHub, Control-M, Jenkins, Jira.
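
Illustrative only: a minimal sketch of the Composer (Airflow) pattern for loading landed CSV files into BigQuery, assuming a hypothetical landing bucket, dataset, table, and schedule.

# Minimal sketch: a Composer (Airflow) DAG that loads CSVs from GCS into BigQuery.
# Bucket, dataset, table, and schedule are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="csv_to_bigquery_daily",
    schedule_interval="0 6 * * *",
    start_date=datetime(2021, 1, 1),
    catchup=False,
) as dag:
    load_csv = GCSToBigQueryOperator(
        task_id="load_market_csv",
        bucket="example-landing-bucket",
        source_objects=["market_data/*.csv"],
        destination_project_dataset_table="example-project.analytics.market_data",
        source_format="CSV",
        skip_leading_rows=1,
        autodetect=True,
        write_disposition="WRITE_APPEND",
    )

Schema autodetection keeps the sketch short; a production pipeline would normally pin an explicit schema and add validation tasks.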

Cloud Data Engineer

Macy's Technology
03.2019 - 05.2020
  • Data as a Service initiative to digitally transform Macy's data from legacy SQL systems and Hadoop Hive to Google Cloud ETL and Google BigQuery.
  • Served as GCP tech lead and SME for three Data as a Service offshore teams developing GCP-based projects.
  • Migrated fulfillment data from legacy Oracle and ODI ETL to Google Cloud BigQuery and Cloud Composer (Airflow) ETL.
  • Ingested clickstream and views data from external systems into Google Cloud Storage and BigQuery using Composer ETL.
  • Developed a Java-based custom Dataflow pipeline for streaming near real-time loyalty integration program data from external systems through Pub/Sub into BigQuery (a Python sketch of the same pattern follows the environment line below).
  • Migrated on-premises Hadoop Hive SQL jobs to Cloud Dataproc, Spark, and BigQuery.
  • Orchestrated and scheduled batch jobs using Google Cloud Composer and Control-M.
  • Worked with GitHub, Jenkins, and Jira for Agile delivery.
  • Migrated complex nested calculations from OBIEE to Google Cloud BigQuery using nested tables and improved report performance.
  • Worked closely with reporting users and helped the Cognos and BI reporting teams fix issues and improve performance.
  • Environment: Python, Google Cloud BigQuery, Cloud Composer (Airflow), Cloud Dataflow, Cloud Pub/Sub, Dataproc, GitHub, Control-M, Jenkins, Jira.
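
Illustrative only: the production loyalty integration job was written in Java, so this is a minimal Python analogue of the same Pub/Sub-to-BigQuery streaming pattern; the subscription, schema, and table names are hypothetical.

# Minimal sketch: streaming Pub/Sub-to-BigQuery pipeline with the Beam Python SDK
# (the production Dataflow job was Java); all resource names are placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="example-project",
    region="us-central1",
    temp_location="gs://example-bucket/tmp",
)
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as p:
    (p
     | "ReadPubSub" >> beam.io.ReadFromPubSub(
           subscription="projects/example-project/subscriptions/loyalty-events-sub")
     | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
     | "WriteBQ" >> beam.io.WriteToBigQuery(
           "example-project:loyalty.events",
           schema="event_id:STRING,member_id:STRING,event_ts:TIMESTAMP",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))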

Data Analyst

The Home Depot
06.2017 - 03.2019
  • Supply Chain Cloud Onboarding initiative to migrate on-premises Teradata data warehouse data to Hadoop and Google BigQuery using in-house ETL tools.
  • Migrated the entire supply chain dataset (~500 tables) from EDW Teradata to cost-effective Google Cloud BigQuery.
  • Worked with IT and data owners to understand the types of data collected in various databases and data warehouses and to define the migration strategy for moving existing data into Google Cloud Platform and BigQuery.
  • Analyzed existing ETL mechanisms and developed complex analytical queries to create 500+ tables and authorized views on BigQuery supporting the same logic used on Teradata.
  • Created workflow ETL pipelines to move data from RDBMS (Oracle, DB2, SQL Server, Teradata) systems to the Hadoop ecosystem using Sqoop and Hive.
  • Built workflow jobs and migrated data from Hadoop to Google Cloud Storage using in-house Data Factory services.
  • Developed pipeline jobs to move data from GCS to BigQuery and created complex analytical queries and authorized views on BigQuery in Agile mode (see the sketch after the environment line below).
  • Built and orchestrated ETL pipeline jobs and tasks using an in-house query workflow and scheduling REST API tool called Data Load Framework (DLF) on Cloud App Engine.
  • Developed an audit mechanism using Python.
  • Worked closely with data scientists and analysts in the supply chain to understand their data requirements for existing and future tables on data analytics applications.
  • Learned the Agile approach and implemented it using Jira and Slack.
  • Environment: Python, Google Cloud BigQuery, Cloud SQL, Hadoop, Hive, Sqoop, Data Factory, Teradata, Oracle, DB2, Jira, GitHub, Slack.
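
Illustrative only: a minimal sketch of creating a BigQuery view and authorizing it against its source dataset with the google-cloud-bigquery Python client; the project, datasets, view, and query are hypothetical.

# Minimal sketch: create a reporting view and register it as an authorized view
# on the source dataset. All project, dataset, and table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# 1. Create the reporting view in a separate dataset.
view = bigquery.Table("example-project.reporting.store_inventory_v")
view.view_query = """
    SELECT store_id, sku, SUM(on_hand_qty) AS on_hand_qty
    FROM `example-project.supply_chain.inventory`
    GROUP BY store_id, sku
"""
view = client.create_table(view, exists_ok=True)

# 2. Grant the view read access to the source dataset (authorized view).
source = client.get_dataset("example-project.supply_chain")
entries = list(source.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
source.access_entries = entries
client.update_dataset(source, ["access_entries"])

Authorizing the view lets reporting users query it without being granted direct access to the underlying supply chain tables.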

Data warehouse developer

Bank of America
02.2012 - 05.2017
  • TAAS (Teradata as a Service) is an enterprise initiative under the Simplify and Improve (SIM) program to consolidate disparate Teradata platforms across the organization into a single EDW Teradata ecosystem.
  • Primary onshore subject matter expert for data movement ETL from/to Teradata.
  • Developed Python, UNIX, and Perl scripts for data backup to Teradata/Cloudera Hadoop.
  • Developed data backup jobs using Teradata data mover ETL and scheduled them in Autosys scheduler.
  • Worked across multiple cross-functional teams and reviewed the complete life cycle of ETL applications to derive the best backup solutions in Teradata and Hadoop.
  • Assisted in creating documents that ensure consistency in development across the online organization; implemented and improved core software infrastructure.
  • Started the project to provide disaster recovery for a few applications' operational data, later expanding it into a single project team serving the entire organization's Teradata needs for disaster recovery, analytics, and archiving.
  • Maintained proper capacity management for Teradata.
  • Managed multiple offshore members of the development team.
  • Environment: Unix, Perl, Autosys, Teradata, Oracle, Hadoop, Hive, SVN repository, Maximo

L1/L2 Production support

Bank of America
04.2008 - 02.2012
  • TCRIS (Trading Credit Risk Information System) and MACRISK (Market Credit Risk Information System) together form a system that helps manage credit risk.
  • Developed an understanding of UNIX systems, especially diagnosing Linux job failures and fixing code issues.
  • Supported hundreds of 24/7 Autosys production jobs for three systems running on Unix, using Bash and Perl scripting.
  • Monitored the daily publishing GUI dashboard and reported any issues in calculations.
  • Fixed code issues and enhanced Perl and Unix code to meet changing business needs.
  • Gained knowledge of standard SQL on large-scale database systems such as Oracle and DB2.
  • Developed automated scripts for daily routine tasks in Unix and Perl.
  • Environment: Unix, Perl, Autosys, Oracle, DB2, SVN repository, Maximo, HP Quality Center.

Education

Bachelor’s Degree - Computer Science

Acharya Nagarjuna University
04.2007

Skills

  • Google Cloud BigQuery and Cloud Spanner proficiency
  • Google Cloud Dataflow, Composer, and Data Fusion pipelines
  • ML, GenAI, and Google Cloud Vertex AI pipelines
  • ETL, API, and SDK development and integration using Python
  • Low-code web UI development using Dash and Streamlit
  • Proficiency in developing AI tools for automation

Certification

  • Introduction to Oracle 8i (1Z0-007), Oracle Corporation, 2009
  • Zend Certified PHP Engineer (200-550), Zend, 2015
  • Cloudera Custom Training for Apache Hadoop, Cloudera, 2015
  • MongoDB for Developers (M101P), MongoDB University, 2017
  • Data Engineering with Google Cloud Professional, Coursera, 2020
  • Google Cloud Professional Data Engineer, Google Cloud (credential.net), 2021

Timeline

Senior Data Engineer

CVS HEALTH CORPORATION
03.2023 - Current

Lead Cloud Data Engineer with BI

Macy's Technology
10.2021 - 03.2023

Cloud Data Engineer

Vodafone Hungary
06.2020 - 09.2021

Cloud Data Engineer

Macy's Technology
03.2019 - 05.2020

Data Analyst

The Home Depot
06.2017 - 03.2019

Data warehouse developer

Bank of America
02.2012 - 05.2017

L1/L2 Production support

Bank of America
04.2008 - 02.2012

Bachelor’s Degree - Computer Science

Acharya Nagarjuna University