
BALAJI SAI DIGALA

Cincinnati

Summary

Experienced Data Engineer with 6+ years developing, optimizing, and automating complex ETL/ELT pipelines on AWS, Azure, Spark, and Snowflake. Skilled in Python and SQL for advanced data transformation and analysis. AWS Certified Solutions Architect - Associate with hands-on experience. Proficient in data warehousing with Snowflake and Synapse and with relational and NoSQL databases. Expertise in ETL tools such as DBT and Informatica, as well as real-time data streaming and processing with Apache Kafka and Spark Streaming. Experienced in deploying containerized applications with Docker and Kubernetes and in managing infrastructure as code with Terraform. Strong background in project management and SDLC methodologies, using JIRA, Git, and Jenkins for CI/CD and Agile practices.

Overview

8 years of professional experience

Work History

Data Engineer

Humana Inc
01.2025 - Current

• Built and optimized scalable ETL pipelines for a healthcare project using PySpark in Databricks, T-SQL, and ADF to ingest and transform data from disparate sources for analytics and reporting.

• Managed data storage and retrieval using Azure Data Lake Storage (ADLS) and SQL Server Management Studio (SSMS).

• Leveraged Delta Lake on ADLS Gen2 to store and sync data from Azure SQL Hyperscale into Databricks.

• Designed and maintained data repositories in Azure Synapse Analytics for master and UI tables, improving data accessibility and self-service reporting for business users.

• Contributed to the design and management of end-to-end Medicaid claims submission workflows using Azure Data Factory and Azure Databricks, improving processing efficiency by 30% and ensuring 100% compliance with 10+ state regulations.

• Processed 5 million+ Medicaid claims daily, coordinating with cross-functional teams to submit critical healthcare data to Edifecs and CMS.

• Implemented robust data governance, security, and access controls across ADLS, Synapse, Snowflake, and Databricks environments, ensuring HIPAA and CMS regulatory compliance.

• Utilized AI-powered coding assistants such as Mosaic AI and Codium AI within Databricks notebooks and Git to accelerate PySpark scripting, automate repetitive coding tasks, and improve code quality, resulting in faster delivery cycles and reduced manual workload.

Environment: Azure, Databricks, PySpark, Delta Lake, Azure Data Factory, Azure Data Lake Gen2, Azure Synapse Analytics, Azure SQL (Hyperscale), SQL, Snowflake, Git, Mosaic AI, Codium AI, HIPAA/CMS Compliance
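The governance and HIPAA-compliance work above can be illustrated with a minimal sketch of PHI masking applied before claims data reaches an analytics zone. Field names (`member_ssn`, `member_name`) are hypothetical, and plain Python stands in for the actual PySpark jobs:

```python
import re

def mask_claim_record(record: dict) -> dict:
    """Mask PHI fields in a claim record before it lands in an analytics zone.

    Illustrative only: the field names are invented for this sketch, not
    taken from the real pipeline.
    """
    masked = dict(record)
    if "member_ssn" in masked:
        # Keep only the last four digits of the SSN; mask the rest.
        digits = re.sub(r"\D", "", masked["member_ssn"])
        masked["member_ssn"] = "*" * (len(digits) - 4) + digits[-4:]
    if "member_name" in masked:
        # Replace the member name with a fixed token.
        masked["member_name"] = "REDACTED"
    return masked
```

In a PySpark job the same logic would typically run as a column expression or UDF over the claims DataFrame before the write to the curated Delta table.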


Data Engineer

CapitalOne
07.2024 - 01.2025
  • Developed real-time data processing enterprise system using Apache Kafka and Apache Spark in Scala, streamlining analysis of streaming data from diverse external sources.
  • Built a centralized analytics warehouse in Snowflake, developing complex SQL transformations to produce curated, analytics-ready datasets for BI, ad-hoc analysis, and downstream applications.
  • Implemented modular, version-controlled SQL transformation pipelines (dbt-style) within Snowflake, enforcing data quality and reusability while reducing batch processing latency by 30%.
  • Developed Python-based Streamlit applications integrated with Snowflake SQL, delivering interactive data exploration and lightweight decision-support tools for internal stakeholders.
  • Designed the relational database system and contributed to logical modeling using dimensional modeling techniques such as Star schema and Snowflake schema.
  • Configured Docker containers to streamline development workflows, Git for version control, managed tasks and bugs with Jira, and employed Agile (SCRUM) methodologies for efficient software development, leading to enhanced project delivery and governance by 30%.
  • Environment: Apache Spark, Spark SQL, Databricks, Scala, MapReduce, Azure, Tableau, Power BI, Python, Apache Airflow, Apache Kafka, Docker, Hive, Git, Jira, SQL, MongoDB, Agile.
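As a minimal sketch of the streaming-aggregation pattern behind the Kafka/Spark work above, the following computes a tumbling-window count per key, as a Spark Structured Streaming job would per micro-batch. Plain Python stands in for the actual Scala/Spark code:

```python
from collections import defaultdict

def window_counts(events, window_seconds=60):
    """Tumbling-window event counts per key.

    `events` is an iterable of (epoch_seconds, key) pairs; the result maps
    (window_start, key) -> count. Illustrative only, not production code.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Align each event to the start of its window.
        window_start = ts - (ts % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)
```

In Spark this corresponds to `groupBy(window(col("ts"), "1 minute"), col("key")).count()` over a Kafka source, with watermarking handling late-arriving events.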

Data Engineer

BizViz Technologies Pvt Ltd
03.2020 - 12.2022
  • Devised Scala scripts and UDFs using DataFrames and RDDs in Spark for data cleansing and aggregation, writing results back into an S3 bucket and cutting processing time by 50%.
  • Accelerated data processing from Hive tables using PySpark, Spark SQL, and MapReduce on HDFS to cleanse heterogeneous data and enhance data retrieval speed for analytical insights.
  • Orchestrated robust ETL pipelines using Azure Data Factory, streamlining data flow from Data Lake to multiple databases using stored procedures, data flows, and Azure Functions.
  • Configured Azure Synapse Analytics for efficient data extraction from Azure Data Lake, streamlining data workflows, and improving data integration speed by 20%.
  • Generated interactive reports and dashboards using Power BI and Tableau, improving business decision-making by providing real-time insights into POS and operational data.
  • Collaborated with Analysts and Data Scientists to optimize data pipelines for business needs.
  • Developed Python-based API (RESTful Web Service) to track revenue and perform analysis.
  • Leveraged AWS to build scalable, cloud-based data solutions, utilizing services like EC2, S3, Redshift, and EMR (Elastic Map Reduce) to manage and process data efficiently.
  • Integrated AWS Glue for data cataloging and Informatica for ETL jobs, ensuring data quality.
  • Modeled data warehouses and marts for effective data management using Kimball methodology, Facts, Dimensions, SCDs, Surrogate Keys, Star schema, and Snowflake schema.
  • Orchestrated end-to-end data pipelines on Snowflake, integrating batch and streaming data for real-time analytics with Snowpipe and Streams, reducing data processing time by 40%.
  • Implemented DBT-based data architectures, managing models across multiple projects, improving data processing efficiency, and enabling real-time analytics.
  • Implemented Amazon Elastic Kubernetes Service (EKS) scheduler to automate application deployment in cloud using Docker automation techniques.
  • Executed unit testing, validations, and debugging to ensure reliable data solutions.
  • Environment: Python, Amazon Elastic Kubernetes Service, Informatica, DBT, ETL, Power BI, Tableau, AWS, Snowflake, RESTful, Docker, AWS Glue Data Catalog, MongoDB, SQL, AWS ECS.
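The Kimball-style modeling above (SCDs, surrogate keys, Star schema) can be sketched as a Slowly Changing Dimension Type 2 update: expire the current dimension row when tracked attributes change and append a new current version. Column names here are hypothetical and plain Python stands in for the DBT/Snowflake models actually used:

```python
def scd2_apply(dim_rows, incoming, business_key, tracked_cols, load_date):
    """Apply an SCD Type 2 change set to a dimension table.

    `dim_rows` and `incoming` are lists of dicts; rows flagged is_current
    are expired (end_date set) when any tracked column differs, and a new
    current version is appended. New business keys become fresh rows.
    Illustrative sketch of the Kimball pattern only.
    """
    out = []
    incoming_by_key = {r[business_key]: r for r in incoming}
    seen = set()
    for row in dim_rows:
        key = row[business_key]
        new = incoming_by_key.get(key)
        if row.get("is_current") and new and any(row[c] != new[c] for c in tracked_cols):
            # Close out the old version and open the new one.
            out.append(dict(row, is_current=False, end_date=load_date))
            out.append(dict(new, is_current=True, start_date=load_date, end_date=None))
            seen.add(key)
        else:
            out.append(row)
            if row.get("is_current"):
                seen.add(key)
    # Business keys not yet in the dimension become new current rows.
    for key, new in incoming_by_key.items():
        if key not in seen:
            out.append(dict(new, is_current=True, start_date=load_date, end_date=None))
    return out
```

In Snowflake the same pattern is usually expressed as a `MERGE` (or a DBT snapshot), with surrogate keys assigned on insert.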

Data Engineer

Technosoft Solutions
08.2017 - 02.2020
  • Deployed data pipelines using AWS Kinesis for real-time streaming, S3 for raw data storage, and Lambda for serverless processing. Leveraged Redshift for structured data warehousing and fast queries, and DynamoDB for scalable NoSQL storage and retrieval.
  • Designed and reviewed processes to optimize ETL pipeline architecture and codebase using Spark and Hive, including daily runs, error handling, and logging of useful metadata.
  • Employed Hadoop for distributed storage and processing of large data sets, improving efficiency and scalability of data ingestion and transformation workflows.
  • Developed Pig scripts for analysis of semi-structured data; used Pig as an ETL tool for transformations, event joins, filters, and pre-aggregations before ingesting data onto HDFS.
  • Optimized SQL queries through indexing and tuning; created stored procedures, triggers, functions, and views for real-time analytics in Oracle, SQL Server, and MySQL.
  • Employed Power Query in Power BI to pivot and unpivot data model for data cleansing.
  • Utilized Terraform to automate provisioning and management of cloud infrastructure, ensuring consistency, repeatability, and scalability across AWS and Azure environments.
  • Environment: AWS Kinesis, AWS S3, AWS EMR, AWS Lambda, Redshift, DynamoDB, Spark, Agile, Windows, Hive, Hadoop, Pig, Oracle, SQL Server, MySQL, PL/SQL, Tableau, Power BI, Terraform.
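A minimal sketch of the Kinesis-to-Lambda serverless step described above, assuming the standard Lambda event shape for a Kinesis trigger (each record carries a base64-encoded payload); the downstream write to Redshift or DynamoDB is stubbed out:

```python
import base64
import json

def handler(event, context=None):
    """AWS Lambda handler for a Kinesis trigger.

    Decodes each record's base64 payload and collects valid JSON events;
    malformed records are skipped. Illustrative sketch only.
    """
    processed = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        try:
            processed.append(json.loads(payload))
        except json.JSONDecodeError:
            # In a real pipeline, bad records would go to a dead-letter queue.
            continue
    return {"processed": len(processed), "events": processed}
```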

Education

Master of Science - Computer Science

Bradley University
Peoria, IL
05.2024

Bachelor of Technology

National Institute of Technology Karnataka
India
01.2017

Skills

  • Python, SQL, Linux
  • AWS, Azure
  • ADF, Airflow
  • Streamlit, Pandas
  • Spark SQL, PySpark
  • Apache Kafka, Hadoop
  • Snowflake, Databricks
  • MySQL, SQL Server
  • Datastage, DBT, SSIS
  • Docker, Kubernetes
