Gavaskar Dubala

Irving, TX

Summary

  • Certified Cloudera Hadoop Developer with working experience designing and implementing complete end-to-end Hadoop infrastructure using GCP, AWS, Azure, Python, Spark, Scala, MongoDB, HBase, Hive, and Impala
  • Built a data extraction utility to serve Policy, Quotes, Claims, and Location data to various consumers for data analytics; migrated HDFS data storage to Amazon Web Services (AWS)
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, and Azure SQL Data Warehouse; able to work across GCP, AWS, and Azure clouds in parallel
  • Hands-on experience with GCP services: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Cloud Shell, the gsutil and bq command-line utilities, and Dataproc
  • Hands-on experience writing Python and Bash scripts
  • Expertise in implementing Spark and Scala programs for faster data processing, with good experience in Spark SQL, DataFrames, RDDs, and Spark on YARN
  • Experience using Sqoop to import and export data between RDBMS, HDFS, and Hive; applied SQL, Hive SQL, Python, and PySpark to cope with increasing data volumes
  • Designed and implemented Jenkins pipelines for CI/CD processes
  • Worked on NoSQL databases such as HBase and MongoDB, with strong knowledge of Cassandra
  • Experienced with static and dynamic partitioning and bucketing in Hive; designed both managed and external Hive tables to optimize performance
  • Worked in software methodologies including SDLC and Agile Scrum
  • Created design and process documents, and reviewed and merged the team's code changes into GitHub.

12+ years of overall IT experience, with a strong emphasis on the design, implementation, development, testing, and deployment of software applications using GCP, AWS, Azure, Hadoop, HDFS, Python, Spark, Scala, Kafka, MongoDB, Hive, Impala, HBase, RDBMS, and other Hadoop ecosystem tools.

Overview

13 years of professional experience

Work History

Sr Data Engineer

Equitable
Charlotte, NC
05.2022 - Current
  • Built data pipelines in Airflow on GCP for ETL jobs using different Airflow operators (a minimal DAG sketch follows this list)
  • Experience with GCP Dataproc, GCS, Cloud Functions, and BigQuery
  • Migrated an entire Oracle database to BigQuery and used Power BI for reporting
  • Moved data between GCP and Azure using Azure Data Factory
  • Built Power BI reports on Azure Analysis Services for better performance
  • Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery
  • Coordinated with the team and developed a framework to generate daily ad hoc reports and extracts of enterprise data from BigQuery
  • Coordinated with the data science team to design and implement advanced analytical models over large datasets in the Hadoop cluster
  • Wrote Hive SQL scripts to create complex tables with performance features such as partitioning, clustering, and skewing
  • Downloaded BigQuery data into pandas and Spark data frames for advanced ETL capabilities
  • Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing analysis of BigQuery usage
  • Created a POC for using ML models and Cloud ML to perform table quality analysis in the batch process
  • Knowledge of Cloud Dataflow and Apache Beam
  • Good knowledge of using Cloud Shell for various tasks and deploying services
  • Created BigQuery authorized views for row-level security and for exposing data to other teams (see the example after this list)
  • Expertise in designing and deploying Hadoop clusters and big data analytic tools, including Hive, Sqoop, and Apache Spark, with the Cloudera distribution.
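
A minimal sketch of the kind of Airflow DAG behind these GCP ETL jobs: it stages a daily extract from GCS into a BigQuery table and then runs a transform query. The project, bucket, dataset, and table names are placeholders, and the specific operators are assumptions rather than the exact production code.

    # Hypothetical Airflow DAG: stage a daily GCS extract into BigQuery, then transform it.
    # Project, bucket, dataset, and table names are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    with DAG(
        dag_id="daily_policy_extract",
        start_date=datetime(2023, 1, 1),
        schedule_interval="0 6 * * *",  # once a day at 06:00
        catchup=False,
    ) as dag:

        # Load raw CSV files from the landing bucket into a staging table.
        load_to_staging = GCSToBigQueryOperator(
            task_id="load_to_staging",
            bucket="example-landing-bucket",
            source_objects=["policy/{{ ds }}/*.csv"],
            destination_project_dataset_table="example-project.staging.policy_raw",
            source_format="CSV",
            skip_leading_rows=1,
            write_disposition="WRITE_TRUNCATE",
        )

        # Aggregate the staged rows into a reporting table.
        transform = BigQueryInsertJobOperator(
            task_id="transform_to_reporting",
            configuration={
                "query": {
                    "query": """
                        SELECT policy_id, state, SUM(premium) AS total_premium
                        FROM `example-project.staging.policy_raw`
                        GROUP BY policy_id, state
                    """,
                    "destinationTable": {
                        "projectId": "example-project",
                        "datasetId": "reporting",
                        "tableId": "policy_summary",
                    },
                    "writeDisposition": "WRITE_TRUNCATE",
                    "useLegacySql": False,
                }
            },
        )

        load_to_staging >> transform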
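
Likewise, a sketch of how a BigQuery authorized view might be created with the google-cloud-bigquery client, following the documented authorized-view pattern; all project, dataset, and view names here are hypothetical.

    # Hypothetical example: create a filtered view and authorize it against the
    # source dataset so other teams can query the view without reading the raw table.
    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")

    # View exposing only the rows another team is allowed to see.
    view = bigquery.Table("example-project.shared_views.policy_tx_only")
    view.view_query = """
        SELECT policy_id, state, premium
        FROM `example-project.raw_data.policy`
        WHERE state = 'TX'
    """
    view = client.create_table(view)

    # Add the view to the source dataset's access entries (an "authorized view").
    source_dataset = client.get_dataset("example-project.raw_data")
    entries = list(source_dataset.access_entries)
    entries.append(
        bigquery.AccessEntry(
            role=None,
            entity_type="view",
            entity_id=view.reference.to_api_repr(),
        )
    )
    source_dataset.access_entries = entries
    client.update_dataset(source_dataset, ["access_entries"])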

Sr AWS Engineer

Allstate Insurance
Irving, TX
10.2017 - 05.2022
  • Created a Python simulator process to collect application logs and other stats
  • Designed and implemented data pipelines to ingest legacy data from Mainframe tables into Hadoop
  • Migrated HDFS data storage to Amazon Web Services (AWS) using AWS DataSync
  • Created a Kafka consumer process to read messages from source Kafka topics and write them into MongoDB and other target systems (a minimal consumer sketch follows this list)
  • Developed a process to read data, convert it into Avro format, and ingest it into Hadoop
  • Created an off-ramp process to convert Avro to JSON for downstream consumers
  • Built a data extraction utility to serve Policy, Quotes, Claims, and Location data to various consumers for data analytics
  • Designed a balance-and-control framework supporting the data pipeline for critical financial and transactional data, facilitating transfers from the front end into the HDFS data lake
  • Imported historical data into the Hadoop landing zone and RDBMS using Sqoop
  • Ingested data into MongoDB for internal users and served it through web service requests
  • Worked with Docker images and containers and used a Kubernetes cluster to submit Spark applications
  • Implemented wrapper scripts for batch processing and scheduled jobs based on requirements
  • Created Jenkins pipelines for automated build and deployment (CI/CD)
  • Created automatic ServiceNow incidents for any process failure due to data validation
  • Implemented ScalaTest classes in each process to handle basic unit testing
  • Managed on-shore and off-shore teams and reviewed and merged code changes.
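
A minimal sketch of the Kafka-to-MongoDB consumer pattern described above, using the kafka-python and pymongo libraries; the broker, topic, database, and key names are placeholders rather than the production configuration.

    # Hypothetical Kafka consumer that reads JSON messages from a source topic
    # and upserts them into a MongoDB collection. All names are placeholders.
    import json

    from kafka import KafkaConsumer
    from pymongo import MongoClient

    consumer = KafkaConsumer(
        "policy-events",
        bootstrap_servers=["kafka-broker:9092"],
        group_id="policy-mongo-sink",
        auto_offset_reset="earliest",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )

    mongo = MongoClient("mongodb://mongo-host:27017")
    collection = mongo["insurance"]["policies"]

    for message in consumer:
        doc = message.value
        # Upsert by business key so replayed messages do not create duplicates.
        collection.replace_one({"policy_id": doc["policy_id"]}, doc, upsert=True)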

Sr Data Engineer

AMEX
New York City, NY
03.2017 - 10.2017
  • Created metadata sheets for new requirements and obtained sign-off from the DOT teams to execute the process
  • Selected derived data from multiple tables and ingested it into new tables in Cornerstone
  • Wrote analytical Hive queries to select data from different tables using joins
  • Implemented shell scripts to manipulate data and move it across environments
  • Created the Event Engine and data writer nodes and automated the process of ingesting data into the tables
  • Created an API proxy using the Apigee dashboard, targeting endpoint URLs for upstream and downstream systems to access data from CSRT
  • Supported the CSRT team in replicating data into their database to make it available to real-time applications
  • Released the API code using Jenkins and maintained production code in the Git repository
  • Used the Postman HTTP client to validate data for GET and POST access (a scripted equivalent is sketched after this list)
  • Managed and reviewed Hadoop logs and configured log levels for the API using logback XML
  • Wrote shell scripts to copy HBase table data from one host to another.
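
As a scripted counterpart to the Postman checks above, a small sketch using the requests library; the proxy URL, auth header, field names, and payload are hypothetical, not the actual CSRT endpoints.

    # Hypothetical smoke test for the Apigee-fronted API: validate GET and POST
    # responses the way the Postman collection did. URL, token, and payload are placeholders.
    import requests

    BASE_URL = "https://api.example.com/csrt/v1/customers"
    HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

    # GET: the record should exist and echo the requested id.
    resp = requests.get(f"{BASE_URL}/12345", headers=HEADERS, timeout=10)
    assert resp.status_code == 200
    assert resp.json().get("customer_id") == "12345"

    # POST: a new record should be accepted with a success status.
    payload = {"customer_id": "67890", "status": "ACTIVE"}
    resp = requests.post(BASE_URL, json=payload, headers=HEADERS, timeout=10)
    assert resp.status_code in (200, 201)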

Bigdata Consultant

H&R Block
Kansas City, MO
04.2016 - 02.2017
  • Ingested structured data into appropriate schemas and tables to support rules and analytics
  • Imported data from Netezza tables into Hadoop using Sqoop
  • Developed managed, external, and partitioned tables as required
  • Developed use cases to monitor the efficiency of Spark real-time processing
  • Improved the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs (a short comparison sketch follows this list)
  • Loaded data from edge nodes to HDFS using shell scripting
  • Automated workflows using shell scripts
  • Used the Oozie workflow engine to run multiple Hive and other jobs
  • Experienced in Hive partitioning, bucketing, and performing different types of joins on Hive tables
  • Analyzed large data sets to determine optimal ways to aggregate and report on them.
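
A minimal PySpark sketch of the kind of optimization mentioned above: the same aggregation expressed as a pair RDD and as a DataFrame, where the DataFrame form lets Spark's optimizer plan the job. The column names and sample data are illustrative only.

    # Illustrative only: sum of amounts per customer as a pair RDD and as a DataFrame.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("agg-comparison").getOrCreate()

    # Pair-RDD version.
    rdd = spark.sparkContext.parallelize([("c1", 100.0), ("c2", 250.0), ("c1", 75.0)])
    rdd_totals = rdd.reduceByKey(lambda a, b: a + b)

    # DataFrame / Spark SQL version of the same aggregation, which Catalyst can optimize.
    df = spark.createDataFrame(rdd, ["customer_id", "amount"])
    df_totals = df.groupBy("customer_id").agg(F.sum("amount").alias("total_amount"))

    df_totals.show()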

Hadoop Developer

Zurich NA
Hyderabad, IN
02.2015 - 03.2016
  • Analyzed Hadoop clusters and different big data analytic tools, including Hive, HBase, and Sqoop
  • Successfully loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop
  • Created Hive tables, loaded them with data, and wrote Hive queries
  • Developed managed, external, and partitioned tables as required (a short DDL sketch follows this list)
  • Exported data from HDFS into RDBMS using Sqoop for report generation and visualization
  • Good experience in Hive partitioning, bucketing, and performing different types of joins on Hive tables
  • Managed Hadoop jobs using the Oozie workflow scheduler for MapReduce, Hive, and Sqoop actions
  • Experienced in managing and reviewing Hadoop log files
  • Analyzed large data sets to determine optimal ways to aggregate and report on them
  • Responsible for developing batch processes using Unix shell scripting.
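
A small sketch of the managed/external and partitioned table patterns mentioned above, issued through PySpark's Hive support; the database, table, and HDFS path names are placeholders.

    # Placeholder database, table, and path names; shows managed vs. external
    # Hive tables with partitioning and bucketing.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hive-ddl").enableHiveSupport().getOrCreate()

    # Managed table: Hive owns both metadata and data; partitioned by load date
    # and bucketed by policy_id for faster joins and sampling.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS claims_db.claims_managed (
            claim_id STRING,
            policy_id STRING,
            amount DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        CLUSTERED BY (policy_id) INTO 16 BUCKETS
        STORED AS ORC
    """)

    # External table: Hive tracks only metadata; dropping it leaves the HDFS
    # files in place, which suits data shared with other processes.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS claims_db.claims_raw (
            claim_id STRING,
            policy_id STRING,
            amount DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS TEXTFILE
        LOCATION 'hdfs:///data/landing/claims'
    """)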

Software Developer

ZURICH DE
Koln, Germany
12.2010 - 02.2015
  • Prepared LLD documents, test plans, and code changes, then tested the changes
  • Wrote shell scripts to automate business processes
  • Designed and developed the Nessy claims system using COBOL
  • Prepared packages and moved them to the LIFE and PROD environments
  • Used JavaScript for client-side validations
  • Performed functionality validation to ensure builds were free of defects
  • Analyzed existing code and prepared low-level design and unit test plan documents
  • Prepared unit test data and unit test results
  • Reviewed code changes made by other team members.

Education

Skills

  • Python, Scala, Core Java and Shell Scripting
  • Eclipse, IntelliJ, Lenses, SOAPUI, WinSCP, SQL Developer, XMLSpy
  • MongoDB, Cassandra, DynamoDB, PostgreSQL, MySQL, Oracle, and DB2
  • Agile SCRUM and Waterfall
  • JavaScript, XML and HTML
  • GCP (Cloud Storage, BigQuery, Composer, Dataproc, Cloud SQL, Cloud Functions, Pub/Sub), AWS (EC2, S3, RDS, Redshift, Lambda, Boto3, DynamoDB), Azure (Storage, Database, Databricks, Synapse, ADF, SSRS, ADL, HDInsight, ARM)
