BHANU PRAKASH REDDY RELLA

San Jose, USA

Summary

  • Innovative Lead Data Engineer with over 10 years of experience in data engineering, AI/ML, and cloud computing, specializing in sustainable data solutions.
  • Founder of The Green AI Initiative, focused on developing energy- and water-efficient AI models through federated learning, sparse neural networks, and structured pruning.
  • Proven expertise in optimizing large-scale data pipelines across top cloud platforms, while advancing environmentally responsible AI infrastructure.
  • Regular judge at national hackathons and author of the forthcoming book, Green Data Engineering: Sustainable Practices for Energy-Efficient AI, contributing to global thought leadership on Green Tech.

Overview

  • 10 years of professional experience
  • 7 certifications
  • 13 publications
  • 3 memberships

Work History

Lead Data Engineer

Walmart Associates Inc
San Jose, USA
12.2022 - Current
  • Led a team of data engineers, providing technical guidance and mentoring, and spearheaded innovative solutions through proof of concept (POC) initiatives
  • Developed and optimized ETL pipelines from RDBMS and HDFS to GCS buckets using Scala, Python, and Spark, with advanced transformations, salting (see the sketch at the end of this role), UDFs, and utility packages, improving ETL efficiency by 40%
  • Managed data storage in GCS buckets, implementing secure and efficient data access and retrieval solutions
  • Configured and managed builds using Maven and Looper.yml, streamlining and automating deployment processes
  • Automated complex workflows using Automic and Apache Airflow, managing DAGs and connectors to ensure reliable and timely execution of data pipelines
  • Optimized Dataproc clusters and Spark jobs using benchmarking for efficient resource utilization and performance, ensuring scalable data processing and reducing execution times by 20%
  • Developed Databricks notebooks integrated with Unity Catalog to centralize data governance, enabling secure data sharing through Delta Sharing and enhancing data accessibility and compliance across suppliers
  • Conducted cloud cost estimation and tracking to manage project expenses effectively
  • Created UDFs in BigQuery to handle complex data transformations and implemented authorized views to ensure secure and compliant data access, improving data governance by 30%
  • Performed SQL and Spark SQL querying for data analysis and validation, optimizing query execution plans and Spark job configurations, enhancing query performance by 25%
  • Conducted code reviews to ensure high quality and adherence to best practices, mentoring team members in advanced coding techniques, improving SonarQube scores by 10% and reducing code smells
  • Created automated scripts using Spark shell to validate data, ensuring accuracy and consistency in GCP buckets, reducing manual data validation effort by 50%
  • Documented data pipelines, processes, and best practices, improving team knowledge sharing and onboarding
  • Environment: Scala, Python, GCS, GCP, Delta Lake, BigQuery, Airflow, Automic, Databricks, Jenkins, Maven, Unity Catalog
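
Below, a minimal PySpark sketch of the key-salting pattern referenced in the ETL bullet above. The paths, the store_id join key, and the bucket count are hypothetical stand-ins, not the actual Walmart pipeline: the hot key on the large side is fanned out across N salt buckets, and the small side is replicated once per bucket so the join still matches.

```python
# Minimal key-salting sketch for a skewed Spark join.
# Paths, table layout, and the join key are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("salted-join-sketch").getOrCreate()
N_SALTS = 16  # number of salt buckets; tune to the observed key skew

large = spark.read.parquet("gs://example-bucket/large_fact")  # skewed on store_id
small = spark.read.parquet("gs://example-bucket/small_dim")

# Large side: append a random salt so one hot key fans out across N partitions.
large_salted = large.withColumn(
    "salted_key",
    F.concat_ws(
        "_",
        F.col("store_id").cast("string"),
        (F.rand() * N_SALTS).cast("int").cast("string"),
    ),
)

# Small side: replicate each row once per salt value so every bucket can match.
salts = spark.range(N_SALTS).withColumnRenamed("id", "salt")
small_salted = (
    small.crossJoin(salts)
         .withColumn(
             "salted_key",
             F.concat_ws("_", F.col("store_id").cast("string"),
                         F.col("salt").cast("string")),
         )
         .drop("store_id", "salt")
)

joined = large_salted.join(small_salted, "salted_key").drop("salted_key")
```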

Sr. Data Engineer

Tata Consultancy Services
Irving, USA
12.2020 - 11.2022
  • Worked extensively with Microsoft Azure services including Azure Data Factory (ADF), Azure Data Lake, Azure Blob Storage, Azure Synapse Analytics, and Databricks
  • Designed and implemented end-to-end data migration solutions from legacy systems to Azure Data Lake Storage using Azure Data Factory, ensuring timely execution of data pipelines, resulting in a 40% improvement in data accessibility and scalability
  • Developed Databricks notebooks for ETL processes, incorporating advanced data transformation techniques and optimizing data processing with partitioning, clustering, and repartitioning strategies (see the sketch at the end of this role), reducing processing times by 30%
  • Configured Databricks resources for optimal performance, reducing execution times by 25% and resource usage by 20%
  • Utilized PySpark and Python in Databricks for complex data pipeline transformations, leveraging distributed computing for large-scale processing and improving data transformation efficiency by 35%
  • Developed Spark applications using PySpark and Spark SQL for data extraction, cleansing, transformation, and loading, working with Python packages such as pandas and NumPy for data analysis, increasing data accuracy by 30%
  • Ensured data quality by integrating Great Expectations for data validation checks, maintaining high data integrity and reducing data errors by 40%
  • Implemented Azure Synapse external tables with partitioning and clustering for efficient storage and querying, enhancing query performance by 25%
  • Designed and implemented secure data pipelines into a Snowflake data warehouse from on-premises and cloud data sources using Databricks, resulting in a 30% reduction in data latency
  • Proficient in Snowflake cloud data warehouse technologies including Snowpipe, SnowSQL, and the Snowpark API
  • Managed the migration of on-premises data warehouses to Snowflake, achieving a 35% reduction in storage costs and a 50% improvement in query performance
  • Created Data Warehouses, Databases, Schemas, and Tables, writing SQL queries against Snowflake and performing bulk data loads and unloads into Snowflake tables using the COPY command, improving data handling efficiency by 30%
  • Designed and developed custom visualizations and dashboards in PowerBI, including column, line, pie, donut, area, bubble, stacked bar/column, funnel charts, scatter plot, gauge, treemap, and heatmap, leading to a 20% increase in actionable insights and decision-making efficiency
  • Environment: GIT, ADF, PySpark, Delta Lakes, Blob Storage, Snowflake, Databricks, Azure Synapse Analytics, PowerBI, SnowSQL
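
Below, a small PySpark sketch of the repartition-then-partitionBy pattern referenced above. The Delta paths and the orders/order_date schema are illustrative assumptions, not the actual TCS pipeline.

```python
# Illustrative partitioned Delta write: repartition on the partition column
# first so each date directory is written by few tasks (fewer tiny files).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("partitioned-write-sketch").getOrCreate()

orders = spark.read.format("delta").load("/mnt/raw/orders")  # hypothetical path

cleaned = (
    orders.dropDuplicates(["order_id"])
          .withColumn("order_date", F.to_date("order_ts"))
)

(
    cleaned.repartition("order_date")   # align the shuffle with the partitioning
           .write.format("delta")
           .mode("overwrite")
           .partitionBy("order_date")
           .save("/mnt/curated/orders")
)
```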

Data Engineer

Tata Consultancy Services
Irving, USA
10.2016 - 11.2020
  • Designed and implemented end-to-end Python ETL pipelines using Apache Spark, leveraging Spark SQL for efficient data querying and DataFrame API for complex transformations, boosting processing efficiency by 30%
  • Developed PySpark scripts to handle ETL processes, incorporating User Defined Functions (UDFs) and custom Python utilities for advanced data transformation tasks, ensuring scalability and performance
  • Designed and automated data ingestion workflows using Apache Nifi and Sqoop to import data from relational databases and flat files (CSV, JSON) into HDFS, ensuring high-throughput and low-latency data transfer
  • Configured Spark resources and tuned Spark configurations (e.g., memory and executor settings), applied optimization techniques such as partitioning, bucketing, and data compression to improve Spark job performance and resource utilization
  • Created and fine-tuned complex SQL queries for ETL processes, utilizing advanced SQL concepts such as window functions, common table expressions (CTEs), and indexing for efficient data manipulation and retrieval (see the sketch at the end of this role)
  • Configured and managed Apache Hive for batch processing and querying large datasets stored in HDFS, creating external tables, and using HiveQL for efficient batch data processing and analysis
  • Deployed Apache HBase for low-latency read/write operations on large datasets, integrating with Spark for real-time analytics and employing HBase concepts like row key design, column families, and time-to-live (TTL)
  • Developed real-time data processing pipelines using Apache Spark Streaming and Apache Flink, performing complex event processing and aggregations on streaming data, ensuring low-latency and high-throughput processing
  • Automated workflows using Apache Oozie for scheduling and managing complex data pipeline workflows, ensuring reliable and timely execution
  • Utilized Prometheus for collecting and monitoring metrics from various pipeline components, and created dashboards in Grafana for real-time monitoring, alerting, and visualization of critical performance metrics and issues
  • Environment: Apache Nifi, Sqoop, Shell Scripts, Apache Spark, PySpark, SQL, Apache Hive, Apache HBase, Deequ, Python, Apache Spark Streaming, Apache Flink, Apache Oozie, Prometheus, Grafana, Power BI
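
Below, a hedged Spark SQL sketch of the window-function and CTE style described above, deduplicating to the latest record per key. The events table, path, and column names are hypothetical.

```python
# Sketch: deduplicate to the latest record per key with a CTE + ROW_NUMBER().
# Table, path, and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("window-cte-sketch").getOrCreate()
spark.read.parquet("hdfs:///data/events").createOrReplaceTempView("events")

latest = spark.sql("""
    WITH ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY event_id
                   ORDER BY updated_at DESC
               ) AS rn
        FROM events
    )
    SELECT * FROM ranked WHERE rn = 1
""")

latest.drop("rn").write.mode("overwrite").parquet("hdfs:///data/events_latest")
```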

Intern Data Analyst

Coign Technologies
India
08.2015 - 06.2016
  • Designed and developed logical and physical data models using the Erwin data modeling tool, focusing on star and snowflake schemas for optimized data warehousing solutions
  • Implemented data migration projects from sources such as Oracle Database, XML, flat files, and CSV files, loading data into the target warehouse using ETL
  • Created complex mappings in Informatica PowerCenter using Aggregator, Expression, Filter, and Sequence Generator transformations
  • Worked extensively with SQL, PL/SQL, query performance tuning, DDL scripts, and database objects such as tables, views, indexes, synonyms, and sequences
  • Contributed to process improvements, normalization, denormalization, data extraction, data cleansing, and data manipulation
  • Designed and developed SAP BusinessObjects dashboards using best practices and advanced dynamic filtering techniques
  • Environment: Informatica PowerCenter, SAP BusinessObjects, SQL, SSMS, Erwin

Education

Master of Science - Management Information Systems

The University of Memphis
Memphis, TN
12.2020

Bachelor of Technology - Electrical and Electronics Engineering

Jawaharlal Nehru Technological University
Hyderabad, India
05.2016

Skills

  • SQL
  • Java
  • Python
  • R language
  • Unix
  • Scala
  • PySpark
  • Hadoop
  • HDFS
  • MapReduce
  • Hive
  • YARN
  • Oozie
  • Sqoop
  • Spark
  • GCP
  • AWS
  • Azure
  • Apache Airflow
  • Snowflake
  • SSIS
  • Informatica PowerCenter
  • Ab Initio
  • DataStage
  • Tableau Prep
  • Alteryx
  • BigQuery
  • Looker
  • Tableau
  • SAP BusinessObjects
  • Power BI
  • TIBCO Spotfire
  • MS SQL Server
  • Teradata
  • Oracle
  • HBase
  • PostgreSQL
  • DB2
  • Visio
  • Erwin
  • SAS
  • MS Excel
  • Waterfall
  • Agile
  • Scaled Agile
  • Azure DevOps
  • Regression analysis
  • Bayesian Methods
  • Decision Tree
  • Random Forests
  • Support Vector Machine
  • Neural Network
  • Sentiment Analysis
  • K-Means Clustering
  • KNN
  • Ensemble Methods
  • NLP
  • GIT
  • GitHub
  • SVN
  • SharePoint
  • QuickBase
  • Confluence
  • Jira
  • Bitbucket
  • JFrog

Certification

  • Data Analytics in Technology, University of Memphis, 2020
  • Alteryx Certified Professional, 2020
  • AWS Certified Cloud Practitioner, 2020
  • Azure Data Engineer Associate, 2022
  • Databricks Certified Data Engineer, 2022
  • Oracle Certified Professional, Java SE 6 Programmer, 2015

Accomplishments

  • Global Recognition Award

Memberships

  • IEEE Member
  • International Association of Engineers (IAENG) Member
  • Soft Computing Research Society (SCRS) Fellow

Affiliations

  • IEEE Session Chair (7 sessions)
  • Speaker (2 events)
  • Hackathon Judge (4 hackathons)

Research & Innovation

  • The Green AI Initiative: Founder and technical lead of an initiative driving sustainable AI practices, with a focus on reducing water and compute footprints in large-scale model training and inference.
  • Key Research Areas: Federated Learning, Sparse Neural Networks, Structured Pruning (see the toy sketch after this list), AI Water Footprint Optimization, and Energy-Aware Model Design.
  • Publications & Thought Leadership: Authoring a 200-page international book on Green Data Engineering, spotlighting energy-efficient AI pipelines and their environmental impacts.
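
As a purely illustrative aside (a toy NumPy sketch, not code from the initiative), structured pruning removes whole units rather than masking individual weights, so the pruned layer genuinely shrinks:

```python
# Toy structured pruning: remove whole output neurons (rows of the weight
# matrix) ranked by L2 norm, so the layer genuinely shrinks.
import numpy as np

def prune_neurons(W: np.ndarray, b: np.ndarray, keep_ratio: float = 0.5):
    """Keep the top `keep_ratio` fraction of output neurons by row L2 norm."""
    norms = np.linalg.norm(W, axis=1)       # one norm per output neuron
    k = max(1, int(keep_ratio * W.shape[0]))
    keep = np.sort(np.argsort(norms)[-k:])  # indices of the strongest neurons
    # Note: the next layer must drop its matching input columns as well.
    return W[keep], b[keep], keep

rng = np.random.default_rng(0)
W, b = rng.normal(size=(64, 128)), rng.normal(size=64)
W_pruned, b_pruned, kept = prune_neurons(W, b, keep_ratio=0.25)
print(W_pruned.shape)  # (16, 128): a smaller dense layer, unlike unstructured masks
```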
