BHANU PRAKASH REDDY RELLA

San Jose, USA

Summary

  • Innovative Lead Data Engineer with over 10 years of experience in data engineering, AI/ML, and cloud computing, specializing in sustainable data solutions.
  • Founder of The Green AI Initiative, focused on developing energy- and water-efficient AI models through federated learning, sparse neural networks, and structured pruning.
  • Proven expertise in optimizing large-scale data pipelines across top cloud platforms, while advancing environmentally responsible AI infrastructure.
  • Regular judge at national hackathons and author of the forthcoming book, Green Data Engineering: Sustainable Practices for Energy-Efficient AI, contributing to global thought leadership on Green Tech.

Overview

  • 10 years of professional experience
  • 7 certifications
  • 13 publications
  • 3 memberships

Work History

Lead Data Engineer

Walmart Associates Inc
San Jose, USA
12.2022 - Current
  • Led a team of data engineers, providing technical guidance and mentoring, and spearheaded innovative solutions through proof of concept (POC) initiatives
  • Developed and optimized ETL pipelines from RDBMS and HDFS to GCS buckets using Scala, Python, and Spark, with advanced transformations, salting (see the sketch at the end of this role), UDFs, and utility packages, improving ETL efficiency by 40%
  • Managed data storage in GCS buckets, implementing secure and efficient data access and retrieval solutions
  • Configured and managed builds using Maven and Looper.yml, streamlining and automating deployment processes
  • Automated complex workflows using Automic and Apache Airflow, managing DAGs and connectors to ensure reliable and timely execution of data pipelines
  • Optimized Dataproc clusters and Spark jobs using benchmarking for efficient resource utilization and performance, ensuring scalable data processing and reducing execution times by 20%
  • Developed Databricks notebooks integrated with Unity Catalog to centralize data governance, enabling secure data sharing through Delta Sharing and enhancing data accessibility and compliance across suppliers
  • Conducted cloud cost estimation and tracking to manage project expenses effectively
  • Created UDFs in BigQuery to handle complex data transformations and implemented authorized views to ensure secure and compliant data access, improving data governance by 30%
  • Performed SQL and Spark SQL querying for data analysis and validation, optimizing query execution plans and Spark job configurations, enhancing query performance by 25%
  • Conducted code reviews to ensure high quality and adherence to best practices, mentoring team members in advanced coding techniques, improving SonarQube scores by 10% and reducing code smells
  • Created automated scripts using Spark shell to validate data, ensuring accuracy and consistency in GCP buckets, reducing manual data validation effort by 50%
  • Documented data pipelines, processes, and best practices, improving team knowledge sharing and onboarding
  • Environment: Scala, Python, GCS, GCP, Delta Lake, BigQuery, Airflow, Automic, Databricks, Jenkins, Maven, Unity Catalog
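
Below, a minimal PySpark sketch of the key-salting pattern referenced in the ETL bullet above. The paths, the store_id join key, and the bucket count are hypothetical stand-ins, not the actual Walmart pipeline: the hot key on the large side is fanned out across N salt buckets, and the small side is replicated once per bucket so the join still matches.

```python
# Minimal key-salting sketch for a skewed Spark join.
# Paths, table layout, and the join key are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("salted-join-sketch").getOrCreate()
N_SALTS = 16  # number of salt buckets; tune to the observed key skew

large = spark.read.parquet("gs://example-bucket/large_fact")  # skewed on store_id
small = spark.read.parquet("gs://example-bucket/small_dim")

# Large side: append a random salt so one hot key fans out across N partitions.
large_salted = large.withColumn(
    "salted_key",
    F.concat_ws(
        "_",
        F.col("store_id").cast("string"),
        (F.rand() * N_SALTS).cast("int").cast("string"),
    ),
)

# Small side: replicate each row once per salt value so every bucket can match.
salts = spark.range(N_SALTS).withColumnRenamed("id", "salt")
small_salted = (
    small.crossJoin(salts)
         .withColumn(
             "salted_key",
             F.concat_ws("_", F.col("store_id").cast("string"),
                         F.col("salt").cast("string")),
         )
         .drop("store_id", "salt")
)

joined = large_salted.join(small_salted, "salted_key").drop("salted_key")
```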

Sr. Data Engineer

Tata Consultancy Services
Irving, USA
12.2020 - 11.2022
  • Worked extensively with Microsoft Azure services including Azure Data Factory (ADF), Azure Data Lake, Azure Blob Storage, Azure Synapse Analytics, and Databricks
  • Designed and implemented end-to-end data migration solutions from legacy systems to Azure Data Lake Storage using Azure Data Factory, ensuring timely execution of data pipelines, resulting in a 40% improvement in data accessibility and scalability
  • Developed Databricks notebooks for ETL processes, incorporating advanced data transformation techniques and optimizing data processing with partitioning, clustering, and repartitioning strategies (see the sketch at the end of this role), reducing processing times by 30%
  • Configured Databricks resources for optimal performance, reducing execution times by 25% and resource usage by 20%
  • Utilized PySpark and Python in Databricks for complex data pipeline transformations, leveraging distributed computing for large-scale processing and improving data transformation efficiency by 35%
  • Developed Spark applications using PySpark and Spark SQL for data extraction, cleansing, transformation, and loading, working with Python packages such as pandas and NumPy for data analysis, increasing data accuracy by 30%
  • Ensured data quality by integrating Great Expectations for data validation checks, maintaining high data integrity and reducing data errors by 40%
  • Implemented Azure Synapse external tables with partitioning and clustering for efficient storage and querying, enhancing query performance by 25%
  • Designed and implemented secure data pipelines into a Snowflake data warehouse from on-premises and cloud data sources using Databricks, resulting in a 30% reduction in data latency
  • Proficient in Snowflake cloud data warehouse technologies including Snowpipe, SnowSQL, and the Snowpark API
  • Managed the migration of on-premises data warehouses to Snowflake, achieving a 35% reduction in storage costs and a 50% improvement in query performance
  • Created Data Warehouses, Databases, Schemas, and Tables, writing SQL queries against Snowflake and performing bulk data loads and unloads into Snowflake tables using the COPY command, improving data handling efficiency by 30%
  • Designed and developed custom visualizations and dashboards in PowerBI, including column, line, pie, donut, area, bubble, stacked bar/column, funnel charts, scatter plot, gauge, treemap, and heatmap, leading to a 20% increase in actionable insights and decision-making efficiency
  • Environment: GIT, ADF, PySpark, Delta Lakes, Blob Storage, Snowflake, Databricks, Azure Synapse Analytics, PowerBI, SnowSQL
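
Below, a small PySpark sketch of the repartition-then-partitionBy pattern referenced above. The Delta paths and the orders/order_date schema are illustrative assumptions, not the actual TCS pipeline.

```python
# Illustrative partitioned Delta write: repartition on the partition column
# first so each date directory is written by few tasks (fewer tiny files).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("partitioned-write-sketch").getOrCreate()

orders = spark.read.format("delta").load("/mnt/raw/orders")  # hypothetical path

cleaned = (
    orders.dropDuplicates(["order_id"])
          .withColumn("order_date", F.to_date("order_ts"))
)

(
    cleaned.repartition("order_date")   # align the shuffle with the partitioning
           .write.format("delta")
           .mode("overwrite")
           .partitionBy("order_date")
           .save("/mnt/curated/orders")
)
```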

Data Engineer

Tata Consultancy Services
Irving, USA
10.2016 - 11.2020
  • Designed and implemented end-to-end Python ETL pipelines using Apache Spark, leveraging Spark SQL for efficient data querying and DataFrame API for complex transformations, boosting processing efficiency by 30%
  • Developed PySpark scripts to handle ETL processes, incorporating User Defined Functions (UDFs) and custom Python utilities for advanced data transformation tasks, ensuring scalability and performance
  • Designed and automated data ingestion workflows using Apache Nifi and Sqoop to import data from relational databases and flat files (CSV, JSON) into HDFS, ensuring high-throughput and low-latency data transfer
  • Configured Spark resources and tuned Spark configurations (e.g., memory and executor settings), applied optimization techniques such as partitioning, bucketing, and data compression to improve Spark job performance and resource utilization
  • Created and fine-tuned complex SQL queries for ETL processes, utilizing advanced SQL concepts such as window functions, common table expressions (CTEs), and indexing for efficient data manipulation and retrieval (see the sketch at the end of this role)
  • Configured and managed Apache Hive for batch processing and querying large datasets stored in HDFS, creating external tables, and using HiveQL for efficient batch data processing and analysis
  • Deployed Apache HBase for low-latency read/write operations on large datasets, integrating with Spark for real-time analytics and employing HBase concepts like row key design, column families, and time-to-live (TTL)
  • Developed real-time data processing pipelines using Apache Spark Streaming and Apache Flink, performing complex event processing and aggregations on streaming data, ensuring low-latency and high-throughput processing
  • Automated workflows using Apache Oozie for scheduling and managing complex data pipeline workflows, ensuring reliable and timely execution
  • Utilized Prometheus for collecting and monitoring metrics from various pipeline components, and created dashboards in Grafana for real-time monitoring, alerting, and visualization of critical performance metrics and issues
  • Environment: Apache Nifi, Sqoop, Shell Scripts, Apache Spark, PySpark, SQL, Apache Hive, Apache HBase, Deequ, Python, Apache Spark Streaming, Apache Flink, Apache Oozie, Prometheus, Grafana, Power BI
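
Below, a hedged Spark SQL sketch of the window-function and CTE style described above, deduplicating to the latest record per key. The events table, path, and column names are hypothetical.

```python
# Sketch: deduplicate to the latest record per key with a CTE + ROW_NUMBER().
# Table, path, and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("window-cte-sketch").getOrCreate()
spark.read.parquet("hdfs:///data/events").createOrReplaceTempView("events")

latest = spark.sql("""
    WITH ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY event_id
                   ORDER BY updated_at DESC
               ) AS rn
        FROM events
    )
    SELECT * FROM ranked WHERE rn = 1
""")

latest.drop("rn").write.mode("overwrite").parquet("hdfs:///data/events_latest")
```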

Intern Data Analyst

Coign Technologies
India
08.2015 - 06.2016
  • Designed and developed logical and physical data models using the Erwin data modeling tool, focusing on star and snowflake schemas for optimized data warehousing solutions
  • Implemented data migration projects from sources such as Oracle Database, XML, flat files, and CSV files, loading data into the target warehouse using ETL
  • Created complex mappings in Informatica PowerCenter using Aggregator, Expression, Filter, and Sequence Generator transformations
  • Worked extensively with SQL, PL/SQL, query performance tuning, DDL scripts, and database objects such as tables, views, indexes, synonyms, and sequences
  • Contributed to process improvements, normalization, denormalization, data extraction, data cleansing, and data manipulation
  • Designed and developed SAP BusinessObjects dashboards using best practices and advanced dynamic filtering techniques
  • Environment: Informatica PowerCenter, SAP BusinessObjects, SQL, SSMS, Erwin

Education

Master of Science - Management Information Systems

The University of Memphis
Memphis, TN
12.2020

Bachelor of Technology - Electrical and Electronics Engineering

Jawaharlal Nehru Technological University
Hyderabad, India
05.2016

Skills

  • SQL
  • Java
  • Python
  • R language
  • Unix
  • Scala
  • PySpark
  • Hadoop
  • HDFS
  • MapReduce
  • Hive
  • YARN
  • Oozie
  • Sqoop
  • Spark
  • GCP
  • AWS
  • Azure
  • Apache Airflow
  • Snowflake
  • SSIS
  • Informatica PowerCenter
  • Ab Initio
  • DataStage
  • Tableau Prep
  • Alteryx
  • BigQuery
  • Looker
  • Tableau
  • SAP BusinessObjects
  • Power BI
  • TIBCO Spotfire
  • MS SQL Server
  • Teradata
  • Oracle
  • HBase
  • PostgreSQL
  • DB2
  • Visio
  • Erwin
  • SAS
  • MS Excel
  • Waterfall
  • Agile
  • Scaled Agile
  • Azure DevOps
  • Regression analysis
  • Bayesian Methods
  • Decision Tree
  • Random Forests
  • Support Vector Machine
  • Neural Network
  • Sentiment Analysis
  • K-Means Clustering
  • KNN
  • Ensemble Methods
  • NLP
  • GIT
  • GitHub
  • SVN
  • SharePoint
  • QuickBase
  • Confluence
  • Jira
  • Bitbucket
  • JFrog

Certification

  • Data Analytics in Technology, University of Memphis, 2020
  • Alteryx Certified Professional, 2020
  • AWS Certified Cloud Practitioner, 2020
  • Azure Data Engineer Associate, 2022
  • Databricks Certified Data Engineer, 2022
  • Oracle Certified Professional, Java SE 6 Programmer, 2015

Accomplishments

  • Global Recognition Award

Memberships

  • IEEE Member
  • International Association of Engineers (IAENG) Member
  • Soft Computing Research Society (SCRS) Fellow

Affiliations

  • IEEE Session Chair (7 sessions)
  • Speaker (2 events)
  • Hackathon Judge (4 hackathons)

Research & Innovation

  • The Green AI Initiative: Founder and technical lead of an initiative driving sustainable AI practices, with a focus on reducing water and compute footprints in large-scale model training and inference.
  • Key Research Areas: Federated Learning, Sparse Neural Networks, Structured Pruning (see the toy sketch after this list), AI Water Footprint Optimization, and Energy-Aware Model Design.
  • Publications & Thought Leadership: Authoring a 200-page international book on Green Data Engineering, spotlighting energy-efficient AI pipelines and their environmental impacts.
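
As a purely illustrative aside (a toy NumPy sketch, not code from the initiative), structured pruning removes whole units rather than masking individual weights, so the pruned layer genuinely shrinks:

```python
# Toy structured pruning: remove whole output neurons (rows of the weight
# matrix) ranked by L2 norm, so the layer genuinely shrinks.
import numpy as np

def prune_neurons(W: np.ndarray, b: np.ndarray, keep_ratio: float = 0.5):
    """Keep the top `keep_ratio` fraction of output neurons by row L2 norm."""
    norms = np.linalg.norm(W, axis=1)       # one norm per output neuron
    k = max(1, int(keep_ratio * W.shape[0]))
    keep = np.sort(np.argsort(norms)[-k:])  # indices of the strongest neurons
    # Note: the next layer must drop its matching input columns as well.
    return W[keep], b[keep], keep

rng = np.random.default_rng(0)
W, b = rng.normal(size=(64, 128)), rng.normal(size=64)
W_pruned, b_pruned, kept = prune_neurons(W, b, keep_ratio=0.25)
print(W_pruned.shape)  # (16, 128): a smaller dense layer, unlike unstructured masks
```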
