Venkata BORA

Dallas, TX

Summary

  • Extensive experience in data analytics and ETL migration projects to Google Cloud Platform (GCP) using tools like BigQuery, Cloud DataProc, Cloud Storage, and Composer.
  • Proficient in data modeling concepts (Star and Snowflake schemas), SQL (Presto, Hive) and programming with Python and PySpark.
  • Skilled in building robust Airflow data pipelines, using Bash scripting on Unix/Linux systems, and developing Python packages for ETL processes.
  • Hands-on experience with Sqoop for transferring data between RDBMS, HDFS, and Hive, and working with file formats like Avro, ORC, and Parquet.
  • Expertise in Spark SQL and PySpark for data transformations and Spark Streaming for real-time processing.
  • Strong skills in data preparation, modeling, and visualization using Power BI and Tableau to create impactful dashboards and reports.
  • Experienced in all phases of the SDLC, including analysis, design, development, testing, and deployment.
  • Effective collaborator with strong interpersonal skills, ensuring successful project delivery within scope, budget, and timelines.

Overview

  • 8 years of professional experience
  • 3 certifications

Work History

Data Engineer

Walmart
11.2023 - Current
  • Optimized data processing by implementing efficient ETL pipelines and streamlining database design (see the pipeline sketch below).
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Fine-tuned query performance and optimized database structures for faster, more accurate data retrieval and reporting.
  • Documented and communicated database schemas using accepted notations.
  • Designed scalable and maintainable data models to support business intelligence initiatives and reporting needs.
  • Evaluated various tools, technologies, and best practices for potential adoption in the company's data engineering processes.
  • Streamlined complex workflows by breaking them down into manageable components for easier implementation and maintenance.
  • Automated routine tasks using Python scripts, increasing team productivity and reducing manual errors.
  • Enhanced data quality by performing thorough cleaning, validation, and transformation tasks.


Skills: GCP, BigQuery, Dataflow, Dataproc, Composer, Cloud SQL, Python, PySpark, Spark SQL, Pub/Sub, Airflow, Hive, Teradata, SQL Server, Informix, Automic, Agile, Tableau, Pandas, Shell/Bash, Visio, IntelliJ
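
A minimal, illustrative sketch of the kind of Airflow DAG behind the ETL pipelines described above, assuming a daily GCS-to-BigQuery load; the project, bucket, dataset, and table names are hypothetical placeholders, not artifacts of the actual role:

    # Daily GCS -> BigQuery load orchestrated with Airflow (sketch; names are hypothetical).
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    def validate_load(**context):
        # Placeholder data-quality check; a real check would verify row counts, nulls, etc.
        print("validating load for", context["ds"])

    with DAG(
        dag_id="daily_sales_etl",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        load = GCSToBigQueryOperator(
            task_id="load_gcs_to_bq",
            bucket="example-raw-zone",                        # hypothetical bucket
            source_objects=["sales/{{ ds }}/*.parquet"],
            destination_project_dataset_table="example-project.analytics.sales",
            source_format="PARQUET",
            write_disposition="WRITE_TRUNCATE",
        )
        check = PythonOperator(task_id="validate_load", python_callable=validate_load)
        load >> check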


Big Data Engineer

Revenue Analytics
07.2022 - 10.2023
  • Designed and implemented ETL pipelines using AWS services such as AWS Glue and Amazon EMR to process and transform large-scale data sets.
  • Leveraged Apache Spark to perform complex data transformations, aggregations, and enrichment, improving data quality and efficiency (a representative sketch appears below).
  • Utilized Spark's RDDs and DataFrames for batch processing, data manipulation, and advanced analytics on structured and semi-structured data.
  • Implemented optimizations and performance-tuning techniques in Spark jobs, resulting in a significant reduction in processing time and resource utilization.
  • Collaborated with cross-functional teams to define data transformation logic, data integration patterns, and schema evolution strategies.
  • Worked closely with data architects to ensure alignment with data modeling and architecture best practices.
  • Implemented data quality checks and error-handling mechanisms within Spark pipelines to ensure accurate and reliable data processing.
  • Orchestrated Spark jobs using Apache Airflow, creating a scalable and automated ETL workflow for timely data processing.
  • Mentored junior engineers in Spark best practices, coding standards, and troubleshooting techniques.


Skills: AWS, Athena, Redshift, Python, PySpark, Spark SQL, SQL, CloudWatch, Lambda, Batch, Step Functions, DataCopy, Hive, Teradata, SQL Server, Agile, Tableau, QuickSight, Pandas, Shell/Bash, Visio, PyCharm
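
A minimal PySpark sketch of the batch transformation and aggregation pattern described above, assuming Parquet inputs on S3; the paths, column names, and join key are hypothetical placeholders:

    # Enrich orders with customer attributes, then aggregate daily revenue per region.
    # (Sketch; S3 paths and columns are hypothetical placeholders.)
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders_enrichment").getOrCreate()

    orders = spark.read.parquet("s3://example-raw/orders/")
    customers = spark.read.parquet("s3://example-raw/customers/")

    daily_revenue = (
        orders.join(customers, on="customer_id", how="left")
              .withColumn("order_date", F.to_date("order_ts"))
              .groupBy("order_date", "region")
              .agg(F.sum("amount").alias("revenue"),
                   F.countDistinct("order_id").alias("orders"))
    )

    # Write the curated layer partitioned by date for efficient downstream queries.
    daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
        "s3://example-curated/daily_revenue/"
    )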

Sr Data Engineer - Supply Chain

Walmart
02.2021 - 06.2022
  • Built data pipelines in Airflow on GCP for ETL jobs using a range of Airflow operators.
  • Worked hands-on with GCP Dataproc, GCS, Cloud Functions, and BigQuery.
  • Used the Cloud Shell SDK in GCP to configure the Dataproc, Cloud Storage, and BigQuery services.
  • Coordinated with the data science team to implement advanced analytical models on the Hadoop cluster over large datasets.
  • Designed ETL jobs that download BigQuery data into pandas or Spark DataFrames for advanced transformations (a representative sketch appears below).
  • Designed and developed an ETL pipeline in Python to extract shipping expeditor API data into the consumption layer.
  • Developed, with the team, a framework to generate daily ad hoc reports and extracts from enterprise data in BigQuery.
  • Created data pipelines and scheduled them using Airflow to improve data reliability and quality while adhering to data governance policies.
  • Built a proof-of-concept ETL pipeline to extract data from external sources and stream it onto HDFS using Spark Streaming.
  • Led internal training sessions on GCP/BigQuery within the team.
  • Ingested bulk data into the raw zone in Cloud Storage (GCP) using gsutil and the Storage Transfer Service, and analyzed the data with BigQuery.
  • Built pipelines to extract data from GCP/Data Discovery clusters and helped BI/ML teams explore the data in Datalab and BI tools.

Skills: GCP, BigQuery, Dataflow, Dataproc, Composer, Cloud SQL, Python, PySpark, Spark SQL, Pub/Sub, Airflow, Hive, Teradata, SQL Server, Informix, Automic, Agile, Tableau, Pandas, Shell/Bash, Visio, IntelliJ
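
A minimal sketch of the BigQuery-to-pandas download pattern mentioned above; the project, dataset, table, and columns are hypothetical placeholders:

    # Pull an aggregated slice of a (hypothetical) shipments table into pandas.
    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")

    query = """
        SELECT item_id, store_id, SUM(qty) AS units
        FROM `example-project.supply_chain.shipments`
        WHERE ship_date = @ship_date
        GROUP BY item_id, store_id
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("ship_date", "DATE", "2022-01-31")]
    )
    df = client.query(query, job_config=job_config).to_dataframe()
    print(df.head())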

Data Engineer 3 - HR Data Lake

Walmart
11.2018 - 12.2020
  • Served as SME; designed and built pipelines to extract and process source files from Workday, storing the data in the HR data lake using Hive.
  • Supported HR business units by providing ready-to-use data to data analysts and data scientists for business insights, machine learning, and more.
  • Developed an ETL pipeline in Python to extract data from Ranger and load it into the SQL Server consumption layer (a representative sketch appears below).
  • Designed and developed an access dashboard using Power BI to identify user-level access to secure data residing on the data lake.
  • Developed T-SQL scripts (optimization, ETL, DDL, DML, stored procedures, etc.) to extract data from the data lake to the consumption layer.
  • Built ETL pipelines to extract data using Spark.

Skills: Hive, Teradata, SQL Server, Informix, HDFS, Automic, Agile, Tableau, Pandas, Shell/Bash, Visio, PowerPoint, IntelliJ, Scala, Spark, Power BI, Workday, JSON, XML
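
A minimal sketch of the Python extract-and-load pattern into a SQL Server consumption layer described above, shown here with a flat file landed in the data lake standing in for the actual source; the DSN, file path, table, and column names are hypothetical placeholders:

    # Extract a landed file, lightly clean it, and load it into SQL Server.
    # (Sketch; all names are hypothetical placeholders.)
    import pandas as pd
    import pyodbc

    df = pd.read_csv("/datalake/hr/raw/access_export.csv")
    df["granted_date"] = pd.to_datetime(df["granted_date"]).dt.date
    df = df.dropna(subset=["employee_id"])

    conn = pyodbc.connect("DSN=hr_consumption;Trusted_Connection=yes")
    cur = conn.cursor()
    cur.fast_executemany = True  # batch the inserts for speed
    cur.executemany(
        "INSERT INTO dbo.user_access (employee_id, resource, granted_date) VALUES (?, ?, ?)",
        list(df[["employee_id", "resource", "granted_date"]].itertuples(index=False, name=None)),
    )
    conn.commit()
    conn.close()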

Sr Big Data Consultant - Finance

Walmart
09.2017 - 10.2018
  • Designed ETL processes from different sources into HDFS/Hive/Teradata using the internal Aorta (Sqoop) framework.
  • Migrated secure and unsecured data from the Hadoop Prod17 cluster to dev environments.
  • Built ETL pipelines to extract data from multiple sources and POS files into the finance data lake, building a unified consumption layer for business analytics.
  • Created data pipelines and scheduled them using Airflow to improve data reliability and quality while adhering to data governance policies.
  • Built a proof-of-concept ETL pipeline to extract data from external sources and stream it onto HDFS using Spark Streaming (a representative sketch appears below).
  • Designed and developed an access dashboard using Power BI to identify user-level access to secure data residing on the data lake.

Skills: Hive, Teradata, SQL Server, Informix, Automic, Agile, Tableau, Pandas, Shell/Bash, Visio, PowerPoint, IntelliJ, Scala, Spark, Power BI, Workday, JSON, XML
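
A minimal sketch of the proof-of-concept streaming ingestion described above, written with Spark Structured Streaming; the Kafka broker, topic, and HDFS paths are hypothetical placeholders:

    # Consume an external feed from Kafka and land it on HDFS as Parquet.
    # (Sketch; requires the spark-sql-kafka package; names are hypothetical.)
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("external_feed_to_hdfs").getOrCreate()

    stream = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "pos_events")
        .load()
    )

    # Persist the raw payload, checkpointing for fault tolerance.
    query = (
        stream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")
        .writeStream.format("parquet")
        .option("path", "hdfs:///finance/raw/pos_events/")
        .option("checkpointLocation", "hdfs:///finance/checkpoints/pos_events/")
        .start()
    )
    query.awaitTermination()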

Application Developer

Willis Towers Watson
09.2016 - 03.2017
  • Contract role through Matrix Resources.
  • Played an integral role in developing the Client Proposal Integrator application to validate data from multiple systems, and authored Spark SQL scripts based on functional specifications (a representative sketch appears below).
  • Seamlessly imported and exported data into HDFS and Hive using Sqoop.
  • Created Hive tables and wrote Hive queries that invoke MapReduce jobs in the backend.
  • Designed and developed an access dashboard using Power BI to identify user-level access to secure data residing on the data lake.
  • Developed T-SQL scripts (optimization, ETL, DDL, DML, stored procedures, etc.) to extract data from the data lake to the consumption layer.

Skills: PL/SQL, SQL Server, Shell/Bash, Visio, PowerPoint
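
A minimal sketch of the kind of Spark SQL validation script described above, cross-checking records between two source systems; the database, table, and column names are hypothetical placeholders:

    # Flag proposals whose totals disagree between two (hypothetical) source systems.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("proposal_validation")
        .enableHiveSupport()
        .getOrCreate()
    )

    mismatches = spark.sql("""
        SELECT a.proposal_id,
               a.total_amount AS crm_total,
               b.total_amount AS billing_total
        FROM crm.proposals a
        JOIN billing.proposals b ON a.proposal_id = b.proposal_id
        WHERE a.total_amount <> b.total_amount
    """)
    mismatches.show(20, truncate=False)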

Associate Software Engineer

Beta Monks Technologies
11.2013 - 12.2014
  • Contributed to the development of a prepaid card system for national banks by methodically creating database objects such as tables and stored procedures.
  • Identified, examined, and resolved issues by modifying backend code as needed, and performed unit testing of core system and business functionality to ensure adherence to quality criteria.

Skills: Oracle, Unix, PL/SQL

Education

Master of Science - Information Systems

University of Maryland Baltimore County

Bachelor of Science - Computer Science Engineering

Andhra University

Skills

    Big Data Technologies: Hadoop, MapReduce, HDFS, Kafka, Hive, Sqoop, Automic, YARN, ZooKeeper, Spark Core

    Cloud Platforms & Services: GCP (Cloud Storage, BigQuery, Cloud Dataproc, Cloud Functions, Cloud Pub/Sub), AWS (Athena, Redshift, Batch, Step Functions, Lambda, Glue Crawler, DataCopy, EC2, EMR)

    Programming & Scripting: Python, SQL, Shell Scripting, Scala, PySpark, PL/SQL, Spark SQL

    Databases: Teradata, Informix, MySQL, Oracle, DB2, SQLite, MS SQL Server

    Agile Methodologies: Agile, Scrum, Kanban

    Data Visualization & Modeling: Tableau, QuickSight, Power BI

Certification

  • GCP Certified Professional Data Engineer, Google, 2022
  • Databricks Spark Certified, Databricks, 2021
