Gavaskar Dubala

Irving, TX

Summary

  • Certified Cloudera Hadoop Developer with working experience designing and implementing complete end-to-end Hadoop infrastructure using GCP, AWS, Azure, Python, Spark, Scala, MongoDB, HBase, Hive, and Impala
  • Built a data extraction utility to serve Policy, Quotes, Claims, and Location data to various consumers for data analytics; migrated HDFS data storage to Amazon Web Services (AWS)
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, and Azure SQL Data Warehouse; able to work across GCP, AWS, and Azure clouds in parallel
  • Hands-on experience with GCP services: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Cloud Shell, the gsutil and bq command-line utilities, and Dataproc
  • Hands-on experience writing Python and Bash scripts
  • Expertise in implementing Spark and Scala programs for faster data processing, with good experience in Spark SQL, DataFrames, RDDs, and Spark on YARN
  • Experience using Sqoop to import and export data between RDBMS, HDFS, and Hive; applied SQL, Hive SQL, Python, and PySpark to cope with increasing data volumes
  • Designed and implemented Jenkins pipelines for CI/CD processes
  • Worked on NoSQL databases such as HBase and MongoDB, with strong knowledge of Cassandra
  • Experienced with static and dynamic partitioning and bucketing in Hive; designed both managed and external Hive tables to optimize performance
  • Worked in software methodologies including SDLC and Agile Scrum
  • Created design and process documents, and reviewed and merged the team's code changes into GitHub.

12+ years of overall IT experience, with a strong emphasis on the design, implementation, development, testing, and deployment of software applications using GCP, AWS, Azure, Hadoop, HDFS, Python, Spark, Scala, Kafka, MongoDB, Hive, Impala, HBase, RDBMS, and other Hadoop ecosystem tools.

Overview

13 years of professional experience

Work History

Sr Data Engineer

Equitable
Charlotte, NC
05.2022 - Current
  • Built data pipelines in Airflow on GCP for ETL jobs using different Airflow operators (a minimal DAG sketch follows this list)
  • Experience with GCP Dataproc, GCS, Cloud Functions, and BigQuery
  • Migrated an entire Oracle database to BigQuery and used Power BI for reporting
  • Moved data between GCP and Azure using Azure Data Factory
  • Built Power BI reports on Azure Analysis Services for better performance
  • Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery
  • Coordinated with the team and developed a framework to generate daily ad hoc reports and extracts of enterprise data from BigQuery
  • Coordinated with the data science team to design and implement advanced analytical models over large datasets in the Hadoop cluster
  • Wrote Hive SQL scripts to create complex tables with performance features such as partitioning, clustering, and skewing
  • Downloaded BigQuery data into pandas and Spark data frames for advanced ETL capabilities
  • Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing analysis of BigQuery usage
  • Created a POC for using ML models and Cloud ML to perform table quality analysis in the batch process
  • Knowledge of Cloud Dataflow and Apache Beam
  • Good knowledge of using Cloud Shell for various tasks and deploying services
  • Created BigQuery authorized views for row-level security and for exposing data to other teams (see the example after this list)
  • Expertise in designing and deploying Hadoop clusters and big data analytic tools, including Hive, Sqoop, and Apache Spark, with the Cloudera distribution.
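
A minimal sketch of the kind of Airflow DAG behind these GCP ETL jobs: it stages a daily extract from GCS into a BigQuery table and then runs a transform query. The project, bucket, dataset, and table names are placeholders, and the specific operators are assumptions rather than the exact production code.

    # Hypothetical Airflow DAG: stage a daily GCS extract into BigQuery, then transform it.
    # Project, bucket, dataset, and table names are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    with DAG(
        dag_id="daily_policy_extract",
        start_date=datetime(2023, 1, 1),
        schedule_interval="0 6 * * *",  # once a day at 06:00
        catchup=False,
    ) as dag:

        # Load raw CSV files from the landing bucket into a staging table.
        load_to_staging = GCSToBigQueryOperator(
            task_id="load_to_staging",
            bucket="example-landing-bucket",
            source_objects=["policy/{{ ds }}/*.csv"],
            destination_project_dataset_table="example-project.staging.policy_raw",
            source_format="CSV",
            skip_leading_rows=1,
            write_disposition="WRITE_TRUNCATE",
        )

        # Aggregate the staged rows into a reporting table.
        transform = BigQueryInsertJobOperator(
            task_id="transform_to_reporting",
            configuration={
                "query": {
                    "query": """
                        SELECT policy_id, state, SUM(premium) AS total_premium
                        FROM `example-project.staging.policy_raw`
                        GROUP BY policy_id, state
                    """,
                    "destinationTable": {
                        "projectId": "example-project",
                        "datasetId": "reporting",
                        "tableId": "policy_summary",
                    },
                    "writeDisposition": "WRITE_TRUNCATE",
                    "useLegacySql": False,
                }
            },
        )

        load_to_staging >> transform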
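
Likewise, a sketch of how a BigQuery authorized view might be created with the google-cloud-bigquery client, following the documented authorized-view pattern; all project, dataset, and view names here are hypothetical.

    # Hypothetical example: create a filtered view and authorize it against the
    # source dataset so other teams can query the view without reading the raw table.
    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")

    # View exposing only the rows another team is allowed to see.
    view = bigquery.Table("example-project.shared_views.policy_tx_only")
    view.view_query = """
        SELECT policy_id, state, premium
        FROM `example-project.raw_data.policy`
        WHERE state = 'TX'
    """
    view = client.create_table(view)

    # Add the view to the source dataset's access entries (an "authorized view").
    source_dataset = client.get_dataset("example-project.raw_data")
    entries = list(source_dataset.access_entries)
    entries.append(
        bigquery.AccessEntry(
            role=None,
            entity_type="view",
            entity_id=view.reference.to_api_repr(),
        )
    )
    source_dataset.access_entries = entries
    client.update_dataset(source_dataset, ["access_entries"])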

Sr AWS Engineer

Allstate Insurance
Irving, TX
10.2017 - 05.2022
  • Created a Python simulator process to collect application logs and other stats
  • Designed and implemented data pipelines to ingest legacy data from Mainframe tables into Hadoop
  • Migrated HDFS data storage to Amazon Web Services (AWS) using AWS DataSync
  • Created a Kafka consumer process to read messages from source Kafka topics and write them into MongoDB and other target systems (a minimal consumer sketch follows this list)
  • Developed a process to read data, convert it into Avro format, and ingest it into Hadoop
  • Created an off-ramp process to convert Avro to JSON for downstream consumers
  • Built a data extraction utility to serve Policy, Quotes, Claims, and Location data to various consumers for data analytics
  • Designed a balance-and-control framework supporting the data pipeline for critical financial and transactional data, facilitating transfers from the front end into the HDFS data lake
  • Imported historical data into the Hadoop landing zone and RDBMS using Sqoop
  • Ingested data into MongoDB for internal users and served it through web service requests
  • Worked with Docker images and containers and used a Kubernetes cluster to submit Spark applications
  • Implemented wrapper scripts for batch processing and scheduled jobs based on requirements
  • Created Jenkins pipelines for automated build and deployment (CI/CD)
  • Created automatic ServiceNow incidents for any process failure due to data validation
  • Implemented ScalaTest classes in each process to handle basic unit testing
  • Managed on-shore and off-shore teams and reviewed and merged code changes.
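
A minimal sketch of the Kafka-to-MongoDB consumer pattern described above, using the kafka-python and pymongo libraries; the broker, topic, database, and key names are placeholders rather than the production configuration.

    # Hypothetical Kafka consumer that reads JSON messages from a source topic
    # and upserts them into a MongoDB collection. All names are placeholders.
    import json

    from kafka import KafkaConsumer
    from pymongo import MongoClient

    consumer = KafkaConsumer(
        "policy-events",
        bootstrap_servers=["kafka-broker:9092"],
        group_id="policy-mongo-sink",
        auto_offset_reset="earliest",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )

    mongo = MongoClient("mongodb://mongo-host:27017")
    collection = mongo["insurance"]["policies"]

    for message in consumer:
        doc = message.value
        # Upsert by business key so replayed messages do not create duplicates.
        collection.replace_one({"policy_id": doc["policy_id"]}, doc, upsert=True)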

Sr Data Engineer

AMEX
New York City, NY
03.2017 - 10.2017
  • Created metadata sheets for new requirements and obtained sign-off from the DOT teams to execute the process
  • Selected derived data from multiple tables and ingested it into new tables in Cornerstone
  • Wrote analytical Hive queries to select data from different tables using joins
  • Implemented shell scripts to manipulate data and move it across environments
  • Created the Event Engine and data writer nodes and automated the process of ingesting data into the tables
  • Created an API proxy using the Apigee dashboard, targeting endpoint URLs for upstream and downstream systems to access data from CSRT
  • Supported the CSRT team in replicating data into their database to make it available to real-time applications
  • Released the API code using Jenkins and maintained production code in the Git repository
  • Used the Postman HTTP client to validate data for GET and POST access (a scripted equivalent is sketched after this list)
  • Managed and reviewed Hadoop logs and configured log levels for the API using logback XML
  • Wrote shell scripts to copy HBase table data from one host to another.
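
As a scripted counterpart to the Postman checks above, a small sketch using the requests library; the proxy URL, auth header, field names, and payload are hypothetical, not the actual CSRT endpoints.

    # Hypothetical smoke test for the Apigee-fronted API: validate GET and POST
    # responses the way the Postman collection did. URL, token, and payload are placeholders.
    import requests

    BASE_URL = "https://api.example.com/csrt/v1/customers"
    HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

    # GET: the record should exist and echo the requested id.
    resp = requests.get(f"{BASE_URL}/12345", headers=HEADERS, timeout=10)
    assert resp.status_code == 200
    assert resp.json().get("customer_id") == "12345"

    # POST: a new record should be accepted with a success status.
    payload = {"customer_id": "67890", "status": "ACTIVE"}
    resp = requests.post(BASE_URL, json=payload, headers=HEADERS, timeout=10)
    assert resp.status_code in (200, 201)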

Bigdata Consultant

H&R Block
Kansas City, MO
04.2016 - 02.2017
  • Ingested structured data into appropriate schemas and tables to support rules and analytics
  • Imported data from Netezza tables into Hadoop using Sqoop
  • Developed managed, external, and partitioned tables as required
  • Developed use cases to monitor the efficiency of Spark real-time processing
  • Improved the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs (a short comparison sketch follows this list)
  • Loaded data from edge nodes to HDFS using shell scripting
  • Automated workflows using shell scripts
  • Used the Oozie workflow engine to run multiple Hive and other jobs
  • Experienced in Hive partitioning, bucketing, and performing different types of joins on Hive tables
  • Analyzed large data sets to determine optimal ways to aggregate and report on them.
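
A minimal PySpark sketch of the kind of optimization mentioned above: the same aggregation expressed as a pair RDD and as a DataFrame, where the DataFrame form lets Spark's optimizer plan the job. The column names and sample data are illustrative only.

    # Illustrative only: sum of amounts per customer as a pair RDD and as a DataFrame.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("agg-comparison").getOrCreate()

    # Pair-RDD version.
    rdd = spark.sparkContext.parallelize([("c1", 100.0), ("c2", 250.0), ("c1", 75.0)])
    rdd_totals = rdd.reduceByKey(lambda a, b: a + b)

    # DataFrame / Spark SQL version of the same aggregation, which Catalyst can optimize.
    df = spark.createDataFrame(rdd, ["customer_id", "amount"])
    df_totals = df.groupBy("customer_id").agg(F.sum("amount").alias("total_amount"))

    df_totals.show()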

Hadoop Developer

Zurich NA
Hyderabad, IN
02.2015 - 03.2016
  • Analyzed Hadoop clusters and different big data analytic tools, including Hive, HBase, and Sqoop
  • Successfully loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop
  • Created Hive tables, loaded them with data, and wrote Hive queries
  • Developed managed, external, and partitioned tables as required (a short DDL sketch follows this list)
  • Exported data from HDFS into RDBMS using Sqoop for report generation and visualization
  • Good experience in Hive partitioning, bucketing, and performing different types of joins on Hive tables
  • Managed Hadoop jobs using the Oozie workflow scheduler for MapReduce, Hive, and Sqoop actions
  • Experienced in managing and reviewing Hadoop log files
  • Analyzed large data sets to determine optimal ways to aggregate and report on them
  • Responsible for developing batch processes using Unix shell scripting.
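
A small sketch of the managed/external and partitioned table patterns mentioned above, issued through PySpark's Hive support; the database, table, and HDFS path names are placeholders.

    # Placeholder database, table, and path names; shows managed vs. external
    # Hive tables with partitioning and bucketing.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hive-ddl").enableHiveSupport().getOrCreate()

    # Managed table: Hive owns both metadata and data; partitioned by load date
    # and bucketed by policy_id for faster joins and sampling.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS claims_db.claims_managed (
            claim_id STRING,
            policy_id STRING,
            amount DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        CLUSTERED BY (policy_id) INTO 16 BUCKETS
        STORED AS ORC
    """)

    # External table: Hive tracks only metadata; dropping it leaves the HDFS
    # files in place, which suits data shared with other processes.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS claims_db.claims_raw (
            claim_id STRING,
            policy_id STRING,
            amount DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS TEXTFILE
        LOCATION 'hdfs:///data/landing/claims'
    """)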

Software Developer

ZURICH DE
Koln, Germany
12.2010 - 02.2015
  • Prepared LLD documents, test plans, and code changes, then tested the changes
  • Wrote shell scripts to automate business processes
  • Designed and developed the Nessy claims system using COBOL
  • Prepared packages and moved them to the LIFE and PROD environments
  • Used JavaScript for client-side validations
  • Performed functionality validation to ensure builds were free of defects
  • Analyzed existing code and prepared low-level design and unit test plan documents
  • Prepared unit test data and unit test results
  • Reviewed code changes made by other team members.

Education

Skills

  • Python, Scala, Core Java and Shell Scripting
  • Eclipse, IntelliJ, Lenses, SOAPUI, WinSCP, SQL Developer, XMLSpy
  • MongoDB, Cassandra, DynamoDB, PostgreSQL, MySQL, Oracle, and DB2
  • Agile SCRUM and Waterfall
  • JavaScript, XML and HTML
  • GCP (Cloud Storage, BigQuery, Composer, Dataproc, Cloud SQL, Cloud Functions, Pub/Sub), AWS (EC2, S3, RDS, Redshift, Lambda, Boto3, DynamoDB), Azure (Storage, Database, Databricks, Synapse, ADF, SSRS, ADL, HDInsight, ARM)
