
ADITHYA PATHA

Canton, MI

Summary

• GCP Data Engineer with 9+ years of experience and expertise in GCP services including Cloud Storage, BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Run, and IAM, along with tools such as Cloud Functions, Cloud Shell, and the gsutil and bq command-line utilities.
• Competent in building new data pipelines on GCP using Dataflow and in migrating existing on-premises data pipelines to Google Cloud Platform using Dataproc/Dataflow.
• Good understanding of the Hadoop Distributed File System (HDFS) and its ecosystem (Pig, Hive, HBase, Sqoop, Spark, ZooKeeper).
• Well versed in configuring Hadoop clusters using major distributions such as Cloudera, MapR, and Hortonworks.
• Hands-on experience with data management strategy formulation, architectural blueprinting, and effort estimation.
• Experience in analyzing large volumes of data by writing Pig Latin scripts and Hive Query Language (HQL) queries.
• Successfully imported and exported data between RDBMS and HDFS using Sqoop.
• Used Flume to channel data from various sources into HDFS.
• Experience with Avro and Parquet file formats.
• Experience in writing logical implementations for, and interacting with, HBase.
• Very good understanding of SQL, ETL, and data warehousing technologies.
• Worked on real-time data integration using Kafka, Spark Streaming, and HBase.
• Experience with Hive partitioning and bucketing, and with performing different types of joins on Hive tables.
• Developed Spark jobs in Python for faster data processing in test environments and used Spark SQL for querying.
• Experience in Spark, with good knowledge of Spark SQL, RDDs, lazy transformations, and actions.
• Good working knowledge of the NoSQL databases HBase and MongoDB.
• Experience using the Talend ETL tool.
• Involved in all aspects of the Software Development Life Cycle (analysis, system design, development, testing, and maintenance) using Waterfall and Agile methodologies.
• Hands-on experience with build tools such as Maven; used Tekton and Jenkins for continuous integration.
• Quick to master new technologies, with a keen awareness of industry developments and the evolution of next-generation programming solutions.

Overview

9+
years of professional experience

Work History

Data Engineer

FORD Motors
08.2021 - Current
  • As part of the GDIA Data Factory team, developed pipelines to process and analyze vital datasets, utilizing Google Cloud Dataproc, Cloud Storage, Cloud Functions, BigQuery, Cloud Pub/Sub, and Dataflow to enhance streaming data enrichment
  • Onboarded new sources onto the Google Cloud Platform (GCP) using Ford Cloud Portal (FCP) and made data available for Alteryx to generate reports by extracting enterprise data from BigQuery
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Designed and implemented the GCP Project setup, IAM access and GCP Service Account setup for development, QA and production environments
  • Managed the complete ETL pipeline, coordinating data movement from source to Google Cloud Storage and subsequently to BigQuery
  • Optimized Hadoop jobs on Google Cloud Dataproc using efficient HDFS compression techniques for enhanced data processing
  • Employed Cloud Dataproc and BigQuery for querying, processing, and analyzing data in various file types
  • Experienced in handling Python and Spark contexts when writing PySpark programs for ETL
  • Implemented streaming model using Confluent Kafka and Dataflow for near real-time data synchronization solution
  • Scheduled Airflow DAGs for orchestrated workflows, managing jobs on Google Cloud Dataproc and Dataflow
  • Worked on migrating on-premises sources to Google Cloud using Compute Engine and Cloud Storage, optimizing data processing with Dataproc and BigQuery
  • Implemented and managed data models in dbt, ensuring accurate data transformation and alignment with business needs
  • Created QlikView dashboards for the Snowflake cost model and usage
  • Created Python programs to handle PL/SQL constructs, such as cursors and loops, that are not supported by Snowflake
  • Responsible for owning and maintaining Tekton CI/CD pipelines for automated deployment
  • Worked on processing CCPA SMD/DMD requests as part of DSC CCPA Compliance requirements
  • Designed, executed, and monitored Qlik Replicate and Qlik Compose tasks
  • Created data endpoints, customized tasks, executed them, and monitored the replication process in near real time
  • Implemented and maintained CI/CD pipelines to automate the testing, deployment and monitoring of data pipelines and DBT models
  • Used Ford's homegrown tools, the Dynamic Data Ingestion Tool (DDIT) and Transformation Enterprise Manager (TEM), for on-prem data ingestion and data movement.

Data Engineer

Safeway
07.2020 - 08.2021
  • Developed Spark code using PySpark for faster data processing
  • Configured Spark to receive real-time data from Apache Kafka and store the streamed data in HDFS using Scala
  • Implemented the project using the Agile/Scrum methodology and participated in daily standup meetings
  • Developed and demonstrated a POC to migrate on-prem workloads to Google Cloud Platform using GCS, BigQuery, Cloud SQL, and Cloud Dataproc
  • Scheduled jobs using Control-M
  • Identified and documented strategies, tools and phases in migration to Google Cloud Platform
  • Documented the inventory of modules, infrastructure, storage, components of existing On-Prem data warehouse for analysis and identifying the suitable technologies/strategies required for Google Cloud Migration
  • Worked on GCP POC to migrate data and applications from On-Prem to Google Cloud
  • Gained exposure to IAM roles in GCP
  • Developed Spark code using Python and Spark SQL for faster processing and testing, and performed complex HiveQL queries on Hive tables
  • Utilized Docker and Kubernetes for the run time environment for CI/CD system to build, test and deploy
  • Analyzed the different databases (Teradata and BigQuery) from which data is loaded into multiple reports, and fixed issues in the reports when found
  • Worked with different teams to trace data flows end to end; experienced in resolving critical issues
  • Transformed Teradata scripts and stored procedures to SQL and Python running on Snowflake's cloud platform
  • Troubleshot production issues under client-defined SLAs
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.

Data Engineer

Macy’s
05.2018 - 07.2020
  • Collaborated with business and IT stakeholders to understand requirements and product features and provide engineering solutions
  • Implemented the project using the Agile/Scrum methodology and participated in daily standup meetings
  • Developed applications using Spark SQL
  • Wrote Pig scripts to run ETL jobs on the data in HDFS
  • Created ETL jobs using Talend Studio, scheduled them using TAC
  • Developed a consumption framework to provision data from the big data repository using the Spark Scala API, recording end-to-end audit-tracking information in a table
  • Developed applications using the Spark Scala API that consume XML and text files and store the data in Hive tables
  • Used Hive to do analysis on the data and identify different correlations
  • Loaded data into Spark RDDs and performed in-memory computation to generate output
  • Exported the analyzed data to the Hive Tables for visualization and to generate reports for the BI team
  • Loaded and transformed large sets of structured and semi-structured data, performing joins and some pre-aggregations before storing the data in HDFS
  • Created partitions on Hive external tables, loaded data into them, and queried the data using HQL
  • Handled large, complex XML datasets using partitions, Spark in-memory capabilities, and effective, efficient joins and transformations during the parsing process itself
  • Developed Spark code using Scala and Spark-SQL for faster testing and processing of data
  • Migrated Talend jobs to Spark jobs and used Spark SQL DataFrames to load structured and semi-structured data into Spark clusters
  • Generated Tableau Public dashboards with constraints to show specific aspects for different purposes.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.

Data Engineer

Gap Inc
09.2015 - 05.2018
  • Implemented Hadoop framework to capture user navigation across the application to validate the user interface and provide analytic feedback/result to the UI team
  • Loaded data into the cluster from relational database management systems using Sqoop
  • Developed applications using Spark SQL
  • Tested Spark Streaming to optimize streaming process and guarantee data quality
  • Wrote Pig scripts to run ETL jobs on the data in HDFS
  • Used Hive to do analysis on the data and identify different correlations
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
  • Loaded and transformed large sets of structured and semi-structured data, performing joins and some pre-aggregations before storing the data in HDFS
  • Implemented dashboards that internally use Hive queries to perform analytics on structured, Avro, and JSON data
  • Created Hive external tables, loaded data into them, and queried the data using HQL
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, and effective, efficient joins and transformations during the ingestion process itself
  • Developed Spark code using Scala and Spark-SQL for faster testing and processing of data
  • Stored and rapidly updated data in HBase, providing key-based access to specific data
  • Developed shell scripts and automated end-to-end data management and integration work
  • Developed a data pipeline using Kafka to store data in HDFS
  • Knowledgeable about partitioning Kafka messages and setting replication factors in a Kafka cluster
  • Streamed data in real time using Spark with Kafka
  • Implemented the project using the Agile/Scrum methodology and participated in daily standup meetings.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.

Education

Master of Science - Computer Science

Northwestern Polytechnic University
Fremont, CA
05.2015

Bachelor of Science - Computer Science and Engineering

JNTUH
05.2013

Skills

  • Hadoop/Big Data Technologies
  • GCP Cloud Services
  • Hive
  • Spark
  • Scala
  • Kafka
  • Python
  • SQL
  • NoSQL
  • Terraform
  • Tekton
  • Airflow
  • Agile
