
ADITHYA PATHA

Canton, MI

Summary

• GCP Data Engineer with 9+ years of experience and expertise in GCP services including Cloud Storage, BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Run, and IAM, along with tools such as Cloud Functions, Cloud Shell, and the gsutil and bq command-line utilities.
• Competent in building new data pipelines on GCP using Dataflow and in migrating existing on-premises data pipelines to Google Cloud Platform using Dataproc/Dataflow.
• Good understanding of the Hadoop Distributed File System (HDFS) and its ecosystem (Pig, Hive, HBase, Sqoop, Spark, ZooKeeper).
• Well versed in configuring Hadoop clusters using major distributions such as Cloudera, MapR, and Hortonworks.
• Hands-on experience with data management strategy formulation, architectural blueprinting, and effort estimation.
• Experience in analyzing large volumes of data by writing Pig Latin scripts and Hive Query Language (HQL) queries.
• Successfully imported and exported data between RDBMS and HDFS using Sqoop.
• Used Flume to channel data from various sources into HDFS.
• Experience with Avro and Parquet file formats.
• Experience in writing logical implementations for, and interacting with, HBase.
• Very good understanding of SQL, ETL, and data warehousing technologies.
• Worked on real-time data integration using Kafka, Spark Streaming, and HBase.
• Experience with Hive partitioning and bucketing, and with performing different types of joins on Hive tables.
• Developed Spark jobs in Python for faster data processing in test environments and used Spark SQL for querying.
• Experience in Spark, with good knowledge of Spark SQL, RDDs, lazy transformations, and actions.
• Good working knowledge of the NoSQL databases HBase and MongoDB.
• Experience using the Talend ETL tool.
• Involved in all aspects of the Software Development Life Cycle (analysis, system design, development, testing, and maintenance) using Waterfall and Agile methodologies.
• Hands-on experience with build tools such as Maven; used Tekton and Jenkins for continuous integration.
• Quick to master new technologies, with a keen awareness of industry developments and the evolution of next-generation programming solutions.

Overview

9+
years of professional experience

Work History

Data Engineer

FORD Motors
08.2021 - Current
  • As part of the GDIA Data Factory team, developed pipelines to process and analyze vital datasets, utilizing Google Cloud Dataproc, Cloud Storage, Cloud Functions, BigQuery, Cloud Pub/Sub, and Dataflow to enhance streaming data enrichment
  • Onboarded new sources onto the Google Cloud Platform (GCP) using Ford Cloud Portal (FCP) and made data available for Alteryx to generate reports by extracting enterprise data from BigQuery
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Designed and implemented the GCP Project setup, IAM access and GCP Service Account setup for development, QA and production environments
  • Managed the complete ETL pipeline, coordinating data movement from source to Google Cloud Storage and subsequently to BigQuery
  • Optimized Hadoop jobs on Google Cloud Dataproc using efficient HDFS compression techniques for enhanced data processing
  • Employed Cloud Dataproc and BigQuery for querying, processing, and analyzing data in various file types
  • Experienced in handling Python and Spark contexts when writing PySpark programs for ETL
  • Implemented streaming model using Confluent Kafka and Dataflow for near real-time data synchronization solution
  • Scheduled Airflow DAGs for orchestrated workflows, managing jobs on Google Cloud Dataproc and Dataflow
  • Worked on migrating on-premises sources to Google Cloud using Compute Engine and Cloud Storage, optimizing data processing with Dataproc and BigQuery
  • Implemented and managed data models in dbt, ensuring accurate data transformation and alignment with business needs
  • Created QlikView dashboards for the Snowflake cost model and usage
  • Created Python programs to handle PL/SQL constructs, such as cursors and loops, that are not supported by Snowflake
  • Responsible for owning and maintaining Tekton CI/CD pipelines for automated deployment
  • Worked on processing CCPA SMD/DMD requests as part of DSC CCPA Compliance requirements
  • Designed, executed, and monitored Qlik Replicate and Qlik Compose tasks
  • Created data endpoints, customized tasks, executed them, and monitored the replication process in near real time
  • Implemented and maintained CI/CD pipelines to automate the testing, deployment and monitoring of data pipelines and DBT models
  • Used Ford's homegrown tools, the Dynamic Data Ingestion Tool (DDIT) and Transformation Enterprise Manager (TEM), for on-prem data ingestion and data movement.

Data Engineer

Safeway
07.2020 - 08.2021
  • Developed Spark code using PySpark for faster data processing
  • Configured Spark to receive real-time data from Apache Kafka and store the streamed data in HDFS using Scala
  • Implemented the project using the Agile/Scrum methodology and participated in daily standup meetings
  • Developed and demonstrated a POC to migrate on-prem workloads to Google Cloud Platform using GCS, BigQuery, Cloud SQL, and Cloud Dataproc
  • Scheduled jobs using Control-M
  • Identified and documented strategies, tools and phases in migration to Google Cloud Platform
  • Documented the inventory of modules, infrastructure, storage, components of existing On-Prem data warehouse for analysis and identifying the suitable technologies/strategies required for Google Cloud Migration
  • Worked on GCP POC to migrate data and applications from On-Prem to Google Cloud
  • Gained exposure to IAM roles in GCP
  • Developed Spark code using Python and Spark SQL for faster processing and testing, and performed complex HiveQL queries on Hive tables
  • Utilized Docker and Kubernetes for the run time environment for CI/CD system to build, test and deploy
  • Analyzed the different databases (Teradata and BigQuery) from which data is loaded into multiple reports, and fixed issues in the reports when found
  • Worked with different teams to trace data flows end to end; experienced in resolving critical issues
  • Transformed Teradata scripts and stored procedures to SQL and Python running on Snowflake's cloud platform
  • Troubleshot production issues under client-defined SLAs
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.

Data Engineer

Macy’s
05.2018 - 07.2020
  • Collaborated with business and IT stakeholders to understand requirements and product features and provide engineering solutions
  • Implemented the project using the Agile/Scrum methodology and participated in daily standup meetings
  • Developed applications using Spark SQL
  • Wrote Pig scripts to run ETL jobs on the data in HDFS
  • Created ETL jobs using Talend Studio, scheduled them using TAC
  • Developed a consumption framework to provision data from the big data repository using the Spark Scala API, recording end-to-end audit-tracking information in a table
  • Developed applications using the Spark Scala API that consume XML and text files and store the data in Hive tables
  • Used Hive to do analysis on the data and identify different correlations
  • Loaded data into Spark RDDs and performed in-memory computation to generate output
  • Exported the analyzed data to the Hive Tables for visualization and to generate reports for the BI team
  • Loaded and transformed large sets of structured and semi-structured data, performing joins and some pre-aggregations before storing the data in HDFS
  • Created partitions on Hive external tables, loaded data into them, and queried the data using HQL
  • Handled large, complex XML datasets using partitions, Spark in-memory capabilities, and effective, efficient joins and transformations during the parsing process itself
  • Developed Spark code using Scala and Spark-SQL for faster testing and processing of data
  • Migrated Talend jobs to Spark jobs and used Spark SQL DataFrames to load structured and semi-structured data into Spark clusters
  • Generated Tableau Public dashboards with constraints to show specific aspects for different purposes.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.

Data Engineer

Gap Inc
09.2015 - 05.2018
  • Implemented Hadoop framework to capture user navigation across the application to validate the user interface and provide analytic feedback/result to the UI team
  • Loaded data into the cluster from relational database management systems using Sqoop
  • Developed applications using Spark SQL
  • Tested Spark Streaming to optimize streaming process and guarantee data quality
  • Wrote Pig scripts to run ETL jobs on the data in HDFS
  • Used Hive to do analysis on the data and identify different correlations
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
  • Loaded and transformed large sets of structured and semi-structured data, performing joins and some pre-aggregations before storing the data in HDFS
  • Implemented dashboards that internally use Hive queries to perform analytics on structured, Avro, and JSON data
  • Created Hive external tables, loaded data into them, and queried the data using HQL
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, and effective, efficient joins and transformations during the ingestion process itself
  • Developed Spark code using Scala and Spark-SQL for faster testing and processing of data
  • Stored and rapidly updated data in HBase, providing key-based access to specific data
  • Developed shell scripts and automated end-to-end data management and integration work
  • Developed a data pipeline using Kafka to store data in HDFS
  • Knowledgeable about partitioning Kafka messages and setting replication factors in a Kafka cluster
  • Streamed data in real time using Spark with Kafka
  • Implemented the project using the Agile/Scrum methodology and participated in daily standup meetings.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.

Education

Master of Science - Computer Science

Northwestern Polytechnic University
Fremont, CA
05.2015

Bachelor of Science - Computer Science and Engineering

JNTUH
05.2013

Skills

  • Hadoop/Big Data Technologies
  • GCP Cloud Services
  • Hive
  • Spark
  • Scala
  • Kafka
  • Python
  • SQL
  • NoSQL
  • Terraform
  • Tekton
  • Airflow
  • Agile
