Venkata BORA

Dallas, TX

Summary

  • Extensive experience in data analytics and ETL migration projects to Google Cloud Platform (GCP) using tools like BigQuery, Cloud DataProc, Cloud Storage, and Composer.
  • Proficient in data modeling concepts (Star and Snowflake schemas), SQL (Presto, Hive) and programming with Python and PySpark.
  • Skilled in building robust Airflow data pipelines, using Bash scripting on Unix/Linux systems, and developing Python packages for ETL processes.
  • Hands-on experience with Sqoop for transferring data between RDBMS, HDFS, and Hive, and working with file formats like Avro, ORC, and Parquet.
  • Expertise in Spark SQL and PySpark for data transformations and Spark Streaming for real-time processing.
  • Strong skills in data preparation, modeling, and visualization using Power BI and Tableau to create impactful dashboards and reports.
  • Experienced in all phases of the SDLC, including analysis, design, development, testing, and deployment.
  • Effective collaborator with strong interpersonal skills, ensuring successful project delivery within scope, budget, and timelines.

Overview

  • 8 years of professional experience
  • 3 certifications

Work History

Data Engineer

Walmart
11.2023 - Current
  • Optimized data processing by implementing efficient ETL pipelines and streamlining database design (see the pipeline sketch below).
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Fine-tuned query performance and optimized database structures for faster, more accurate data retrieval and reporting.
  • Documented and communicated database schemas using accepted notations.
  • Designed scalable and maintainable data models to support business intelligence initiatives and reporting needs.
  • Evaluated various tools, technologies, and best practices for potential adoption in the company's data engineering processes.
  • Streamlined complex workflows by breaking them down into manageable components for easier implementation and maintenance.
  • Automated routine tasks using Python scripts, increasing team productivity and reducing manual errors.
  • Enhanced data quality by performing thorough cleaning, validation, and transformation tasks.


Skills: GCP, BigQuery, Dataflow, Dataproc, Composer, Cloud SQL, Python, PySpark, Spark SQL, Pub/Sub, Airflow, Hive, Teradata, SQL Server, Informix, Automic, Agile, Tableau, Pandas, Shell/Bash, Visio, IntelliJ
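
A minimal, illustrative sketch of the kind of Airflow DAG behind the ETL pipelines described above, assuming a daily GCS-to-BigQuery load; the project, bucket, dataset, and table names are hypothetical placeholders, not artifacts of the actual role:

    # Daily GCS -> BigQuery load orchestrated with Airflow (sketch; names are hypothetical).
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    def validate_load(**context):
        # Placeholder data-quality check; a real check would verify row counts, nulls, etc.
        print("validating load for", context["ds"])

    with DAG(
        dag_id="daily_sales_etl",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        load = GCSToBigQueryOperator(
            task_id="load_gcs_to_bq",
            bucket="example-raw-zone",                        # hypothetical bucket
            source_objects=["sales/{{ ds }}/*.parquet"],
            destination_project_dataset_table="example-project.analytics.sales",
            source_format="PARQUET",
            write_disposition="WRITE_TRUNCATE",
        )
        check = PythonOperator(task_id="validate_load", python_callable=validate_load)
        load >> check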


Big Data Engineer

Revenue Analytics
07.2022 - 10.2023
  • Designed and implemented ETL pipelines using AWS services such as AWS Glue and Amazon EMR to process and transform large-scale data sets.
  • Leveraged Apache Spark to perform complex data transformations, aggregations, and enrichment, improving data quality and efficiency (a representative sketch appears below).
  • Utilized Spark's RDDs and DataFrames for batch processing, data manipulation, and advanced analytics on structured and semi-structured data.
  • Implemented optimizations and performance-tuning techniques in Spark jobs, resulting in a significant reduction in processing time and resource utilization.
  • Collaborated with cross-functional teams to define data transformation logic, data integration patterns, and schema evolution strategies.
  • Worked closely with data architects to ensure alignment with data modeling and architecture best practices.
  • Implemented data quality checks and error-handling mechanisms within Spark pipelines to ensure accurate and reliable data processing.
  • Orchestrated Spark jobs using Apache Airflow, creating a scalable and automated ETL workflow for timely data processing.
  • Mentored junior engineers in Spark best practices, coding standards, and troubleshooting techniques.


Skills: AWS, Athena, Redshift, Python, PySpark, Spark SQL, SQL, CloudWatch, Lambda, Batch, Step Functions, DataCopy, Hive, Teradata, SQL Server, Agile, Tableau, QuickSight, Pandas, Shell/Bash, Visio, PyCharm
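
A minimal PySpark sketch of the batch transformation and aggregation pattern described above, assuming Parquet inputs on S3; the paths, column names, and join key are hypothetical placeholders:

    # Enrich orders with customer attributes, then aggregate daily revenue per region.
    # (Sketch; S3 paths and columns are hypothetical placeholders.)
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders_enrichment").getOrCreate()

    orders = spark.read.parquet("s3://example-raw/orders/")
    customers = spark.read.parquet("s3://example-raw/customers/")

    daily_revenue = (
        orders.join(customers, on="customer_id", how="left")
              .withColumn("order_date", F.to_date("order_ts"))
              .groupBy("order_date", "region")
              .agg(F.sum("amount").alias("revenue"),
                   F.countDistinct("order_id").alias("orders"))
    )

    # Write the curated layer partitioned by date for efficient downstream queries.
    daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
        "s3://example-curated/daily_revenue/"
    )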

Sr Data Engineer - Supply Chain

Walmart
02.2021 - 06.2022
  • Built data pipelines in Airflow on GCP for ETL jobs using a range of Airflow operators.
  • Worked hands-on with GCP Dataproc, GCS, Cloud Functions, and BigQuery.
  • Used the Cloud Shell SDK in GCP to configure the Dataproc, Cloud Storage, and BigQuery services.
  • Coordinated with the data science team to implement advanced analytical models on the Hadoop cluster over large datasets.
  • Designed ETL jobs that download BigQuery data into pandas or Spark DataFrames for advanced transformations (a representative sketch appears below).
  • Designed and developed an ETL pipeline in Python to extract shipping expeditor API data into the consumption layer.
  • Developed, with the team, a framework to generate daily ad hoc reports and extracts from enterprise data in BigQuery.
  • Created data pipelines and scheduled them using Airflow to improve data reliability and quality while adhering to data governance policies.
  • Built a proof-of-concept ETL pipeline to extract data from external sources and stream it onto HDFS using Spark Streaming.
  • Led internal training sessions on GCP/BigQuery within the team.
  • Ingested bulk data into the raw zone in Cloud Storage (GCP) using gsutil and the Storage Transfer Service, and analyzed the data with BigQuery.
  • Built pipelines to extract data from GCP/Data Discovery clusters and helped BI/ML teams explore the data in Datalab and BI tools.

Skills: GCP, BigQuery, Dataflow, Dataproc, Composer, Cloud SQL, Python, PySpark, Spark SQL, Pub/Sub, Airflow, Hive, Teradata, SQL Server, Informix, Automic, Agile, Tableau, Pandas, Shell/Bash, Visio, IntelliJ
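
A minimal sketch of the BigQuery-to-pandas download pattern mentioned above; the project, dataset, table, and columns are hypothetical placeholders:

    # Pull an aggregated slice of a (hypothetical) shipments table into pandas.
    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")

    query = """
        SELECT item_id, store_id, SUM(qty) AS units
        FROM `example-project.supply_chain.shipments`
        WHERE ship_date = @ship_date
        GROUP BY item_id, store_id
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("ship_date", "DATE", "2022-01-31")]
    )
    df = client.query(query, job_config=job_config).to_dataframe()
    print(df.head())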

Data Engineer 3 - HR Data Lake

Walmart
11.2018 - 12.2020
  • Served as SME; designed and built pipelines to extract and process source files from Workday, storing the data in the HR data lake using Hive.
  • Supported HR business units by providing ready-to-use data to data analysts and data scientists for business insights, machine learning, and more.
  • Developed an ETL pipeline in Python to extract data from Ranger and load it into the SQL Server consumption layer (a representative sketch appears below).
  • Designed and developed an access dashboard using Power BI to identify user-level access to secure data residing on the data lake.
  • Developed T-SQL scripts (optimization, ETL, DDL, DML, stored procedures, etc.) to extract data from the data lake to the consumption layer.
  • Built ETL pipelines to extract data using Spark.

Skills: Hive, Teradata, SQL Server, Informix, HDFS, Automic, Agile, Tableau, Pandas, Shell/Bash, Visio, PowerPoint, IntelliJ, Scala, Spark, Power BI, Workday, JSON, XML
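
A minimal sketch of the Python extract-and-load pattern into a SQL Server consumption layer described above, shown here with a flat file landed in the data lake standing in for the actual source; the DSN, file path, table, and column names are hypothetical placeholders:

    # Extract a landed file, lightly clean it, and load it into SQL Server.
    # (Sketch; all names are hypothetical placeholders.)
    import pandas as pd
    import pyodbc

    df = pd.read_csv("/datalake/hr/raw/access_export.csv")
    df["granted_date"] = pd.to_datetime(df["granted_date"]).dt.date
    df = df.dropna(subset=["employee_id"])

    conn = pyodbc.connect("DSN=hr_consumption;Trusted_Connection=yes")
    cur = conn.cursor()
    cur.fast_executemany = True  # batch the inserts for speed
    cur.executemany(
        "INSERT INTO dbo.user_access (employee_id, resource, granted_date) VALUES (?, ?, ?)",
        list(df[["employee_id", "resource", "granted_date"]].itertuples(index=False, name=None)),
    )
    conn.commit()
    conn.close()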

Sr Big Data Consultant - Finance

Walmart
09.2017 - 10.2018
  • Designed ETL processes from different sources into HDFS/Hive/Teradata using the internal Aorta (Sqoop) framework.
  • Migrated secure and unsecured data from the Hadoop Prod17 cluster to dev environments.
  • Built ETL pipelines to extract data from multiple sources and POS files into the finance data lake, building a unified consumption layer for business analytics.
  • Created data pipelines and scheduled them using Airflow to improve data reliability and quality while adhering to data governance policies.
  • Built a proof-of-concept ETL pipeline to extract data from external sources and stream it onto HDFS using Spark Streaming (a representative sketch appears below).
  • Designed and developed an access dashboard using Power BI to identify user-level access to secure data residing on the data lake.

Skills: Hive, Teradata, SQL Server, Informix, Automic, Agile, Tableau, Pandas, Shell/Bash, Visio, PowerPoint, IntelliJ, Scala, Spark, Power BI, Workday, JSON, XML
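
A minimal sketch of the proof-of-concept streaming ingestion described above, written with Spark Structured Streaming; the Kafka broker, topic, and HDFS paths are hypothetical placeholders:

    # Consume an external feed from Kafka and land it on HDFS as Parquet.
    # (Sketch; requires the spark-sql-kafka package; names are hypothetical.)
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("external_feed_to_hdfs").getOrCreate()

    stream = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "pos_events")
        .load()
    )

    # Persist the raw payload, checkpointing for fault tolerance.
    query = (
        stream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")
        .writeStream.format("parquet")
        .option("path", "hdfs:///finance/raw/pos_events/")
        .option("checkpointLocation", "hdfs:///finance/checkpoints/pos_events/")
        .start()
    )
    query.awaitTermination()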

Application Developer

Willis Towers Watson
09.2016 - 03.2017
  • Contract role through Matrix Resources.
  • Played an integral role in developing the Client Proposal Integrator application to validate data from multiple systems, and authored Spark SQL scripts based on functional specifications (a representative sketch appears below).
  • Seamlessly imported and exported data into HDFS and Hive using Sqoop.
  • Created Hive tables and wrote Hive queries that invoke MapReduce jobs in the backend.
  • Designed and developed an access dashboard using Power BI to identify user-level access to secure data residing on the data lake.
  • Developed T-SQL scripts (optimization, ETL, DDL, DML, stored procedures, etc.) to extract data from the data lake to the consumption layer.

Skills: PL/SQL, SQL Server, Shell/Bash, Visio, PowerPoint
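
A minimal sketch of the kind of Spark SQL validation script described above, cross-checking records between two source systems; the database, table, and column names are hypothetical placeholders:

    # Flag proposals whose totals disagree between two (hypothetical) source systems.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("proposal_validation")
        .enableHiveSupport()
        .getOrCreate()
    )

    mismatches = spark.sql("""
        SELECT a.proposal_id,
               a.total_amount AS crm_total,
               b.total_amount AS billing_total
        FROM crm.proposals a
        JOIN billing.proposals b ON a.proposal_id = b.proposal_id
        WHERE a.total_amount <> b.total_amount
    """)
    mismatches.show(20, truncate=False)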

Associate Software Engineer

Beta Monks Technologies
11.2013 - 12.2014
  • Contributed to the development of a prepaid card system for national banks by methodically creating database objects such as tables and stored procedures.
  • Identified, examined, and resolved issues by modifying backend code as needed, and performed unit testing of core system and business functionality to ensure adherence to quality criteria.

Skills: Oracle, Unix, PL/SQL

Education

Master of Science - Information Systems

University of Maryland Baltimore County

Bachelor of Science - Computer Science Engineering

Andhra University

Skills

    Big Data Technologies: Hadoop, MapReduce, HDFS, Kafka, Hive, Sqoop, Automic, YARN, ZooKeeper, Spark Core

    Cloud Platforms & Services: GCP (Cloud Storage, BigQuery, Cloud Dataproc, Cloud Functions, Cloud Pub/Sub), AWS (Athena, Redshift, Batch, Step Functions, Lambda, Glue Crawler, DataCopy, EC2, EMR)

    Programming & Scripting: Python, SQL, Shell Scripting, Scala, PySpark, PL/SQL, Spark SQL

    Databases: Teradata, Informix, MySQL, Oracle, DB2, SQLite, MS SQL Server

    Agile Methodologies: Agile, Scrum, Kanban

    Data Visualization & Modeling: Tableau, QuickSight, Power BI

Certification

  • GCP Certified Professional Data Engineer, Google, 2022
  • Databricks Spark Certified, Databricks, 2021
