Hi, I’m

Balaji Digala

Irving, TX
Overview

6 years of professional experience
1 certification

Summary

Data Engineer with 6 years of experience developing, optimizing, and automating complex ETL/ELT pipelines on AWS, Azure, Spark, Hadoop, and Snowflake. Skilled in Python and Scala for advanced data transformation and analysis. AWS Certified Solutions Architect - Associate with hands-on experience in S3, DynamoDB, Glue, EMR, ECS, IAM, EC2, and Lambda. Proficient in data warehousing with Snowflake and Redshift, and in relational (MySQL, PostgreSQL) and NoSQL (MongoDB, DynamoDB, HBase) databases. Expert in ETL tools such as Talend and Informatica, as well as in real-time data streaming and processing with Apache Kafka and Spark Streaming. Experienced in deploying containerized applications with Docker and Kubernetes and in managing infrastructure as code with Terraform. Strong background in project management and SDLC methodologies, using JIRA, Git, and Jenkins for CI/CD alongside Agile practices and innovation strategies.

Skills

  • Python, SQL, Scala, Linux
  • AWS, Azure
  • Apache Spark, Airflow
  • Spark SQL, PySpark
  • Apache Kafka, Hadoop
  • Snowflake, Databricks
  • MySQL, SQL Server
  • PostgreSQL, Teradata
  • MongoDB, HBase
  • Cassandra, DynamoDB
  • Informatica, Talend
  • CSV, JSON, Parquet, XML

Work History

American Airlines

Data Engineer
01.2024 - Current

Job overview

  • Developed a real-time enterprise data processing system using Apache Kafka and Apache Spark in Scala, streamlining analysis of streaming data from diverse external sources.
  • Implemented Spark SQL scripts in Databricks and Scala for batch processing jobs to extract, transform, and aggregate data from multiple file formats, reducing processing time by 30%.
  • Engineered data pipeline integrations, ETL processes, and comprehensive ingestion from source systems to Azure Blob Storage, Azure Data Lake, SQL, and Azure Synapse Analytics using Azure Data Factory, T-SQL, Spark SQL, and U-SQL.
  • Performed DAG (directed acyclic graph) lineage tracking using Airflow for transparency and traceability in data transformations across Azure, Databricks, and Snowflake (a minimal DAG sketch follows this role's environment line).
  • Designed a relational database system and performed logical modeling using dimensional modeling techniques such as Star and Snowflake schemas.
  • Configured Docker containers to streamline development workflows, used Git for version control, managed tasks and bugs with Jira, and employed Agile (Scrum) methodologies, improving project delivery and governance efficiency by 30%.

Environment: Apache Spark, Spark SQL, Databricks, Scala, MapReduce, Azure, Tableau, Power BI, Python, Apache Airflow, Apache Kafka, Docker, Hive, Git, Jira, SQL, MongoDB, Agile.
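
For illustration only, a minimal Airflow DAG of the kind referenced above, assuming Airflow 2.x; the DAG ID, task names, and step bodies are hypothetical stand-ins, not the production pipeline.

    # Minimal Airflow DAG sketch: a Databricks-style transform followed by a
    # Snowflake-style load. All names here are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def run_transform(**context):
        # Stand-in for the Spark SQL / Databricks transformation step.
        print("transforming data for", context["ds"])

    def load_to_snowflake(**context):
        # Stand-in for the Snowflake load step.
        print("loading snapshot for", context["ds"])

    with DAG(
        dag_id="daily_ingest",              # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        transform = PythonOperator(task_id="transform", python_callable=run_transform)
        load = PythonOperator(task_id="load", python_callable=load_to_snowflake)
        transform >> load                   # lineage: transform runs before load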

Paychex

Data Engineer
02.2022 - 12.2022

Job overview

  • Devised Scala scripts and UDFs using DataFrames and RDDs in Spark for data cleansing and aggregation and for writing results back into an S3 bucket, resulting in a 50% reduction in processing time (a PySpark equivalent is sketched below).
  • Accelerated data processing from Hive tables using PySpark, Spark SQL, and MapReduce on HDFS to cleanse heterogeneous data and enhance data retrieval speed for analytical insights.
  • Orchestrated robust ETL pipelines using Azure Data Factory, streamlining data flow from Data Lake to multiple databases using stored procedures, data flows, and Azure Functions.
  • Configured Azure PolyBase for efficient data extraction from Azure Data Lake, streamlining data workflows and improving data integration speed by 20%.
  • Generated interactive reports and dashboards using Power BI and Tableau, improving business decision-making by providing real-time insights into POS and operational data.

Environment: Scala, Apache Spark, Python, S3, Hive, PySpark, Spark SQL, RDD, MapReduce, HDFS, Azure Data Factory, Azure Data Lake, Azure Functions, Hadoop, Kafka, Apache Airflow.
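
For illustration only, a minimal PySpark sketch of the cleanse-aggregate-write pattern described above (the original Scala scripts are not reproduced here); the bucket paths and column names are hypothetical.

    # PySpark sketch: cleanse raw events, aggregate, and write back to S3.
    # Paths and columns are hypothetical stand-ins.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("cleanse_aggregate").getOrCreate()

    raw = spark.read.parquet("s3a://example-raw-bucket/events/")  # hypothetical path

    cleaned = (
        raw.dropDuplicates(["event_id"])             # remove duplicate events
           .filter(F.col("amount").isNotNull())      # drop incomplete records
           .withColumn("event_date", F.to_date("event_ts"))
    )

    daily_totals = cleaned.groupBy("event_date").agg(
        F.sum("amount").alias("total_amount")
    )

    # Write the aggregate back to S3, partitioned by date for faster reads.
    daily_totals.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3a://example-curated-bucket/daily_totals/"
    )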

Western Union

Data Engineer
03.2020 - 01.2022

Job overview

  • Developed a Python-based RESTful web service to track revenue and perform analysis (a minimal sketch follows this role's environment line).
  • Leveraged AWS to build scalable, cloud-based data solutions, utilizing services like EC2, S3, Redshift, and EMR (Elastic Map Reduce) to manage and process data efficiently.
  • Integrated AWS Glue for data cataloging and Informatica for ETL jobs, ensuring data quality.
  • Modeled data warehouses and marts for effective data management using Kimball methodology, Facts, Dimensions, SCDs, Surrogate Keys, Star schema, and Snowflake schema.
  • Orchestrated end-to-end data pipelines on Snowflake, integrating batch and streaming data for real-time analytics with Snowpipe and Streams, reducing data processing time by 40%.
  • Implemented Amazon Elastic Kubernetes Service (EKS) scheduling to automate application deployment in the cloud using Docker automation techniques.
  • Executed unit testing, validations, and debugging to ensure reliable data solutions.

Environment: Python, Amazon Elastic Kubernetes Service, Informatica, ETL, Power BI, Tableau, AWS, Snowflake, RESTful, Docker, AWS Glue Data Catalog, MongoDB, SQL, AWS ECS.
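
For illustration only, a minimal sketch of a Python RESTful revenue-tracking endpoint of the kind described above, with Flask as an assumed framework; the routes, payload fields, and in-memory store are hypothetical.

    # Flask sketch of a small revenue-tracking REST service.
    # Routes, fields, and storage are hypothetical stand-ins.
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    revenue_records = []  # stand-in for a real database

    @app.route("/revenue", methods=["POST"])
    def add_revenue():
        # Store one revenue record posted as JSON.
        record = request.get_json()
        revenue_records.append(record)
        return jsonify({"stored": len(revenue_records)}), 201

    @app.route("/revenue/total", methods=["GET"])
    def total_revenue():
        # Aggregate the stored records for a simple analysis endpoint.
        total = sum(r.get("amount", 0) for r in revenue_records)
        return jsonify({"total": total})

    if __name__ == "__main__":
        app.run(debug=True)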

Web Affinity Technologies Pvt Ltd

Data Engineer
08.2017 - 02.2020

Job overview

  • Deployed data pipelines using AWS Kinesis for real-time streaming, S3 for raw data storage, and Lambda for serverless processing (a handler sketch follows this role's environment line). Leveraged Redshift for structured data warehousing and fast queries, and DynamoDB for scalable NoSQL storage and retrieval.
  • Designed and reviewed processes to optimize the ETL pipeline architecture and codebase using Spark and Hive, including daily runs, error handling, and logging of useful metadata.
  • Employed Hadoop for distributed storage and processing of large data sets, improving efficiency and scalability of data ingestion and transformation workflows.
  • Pioneered Pig scripts for analysis of semi-structured data; used Pig as an ETL tool for transformations, event joins, filters, and pre-aggregations before ingesting data into HDFS.
  • Optimized SQL queries through indexing and tuning; created stored procedures, triggers, functions, and views for real-time analytics in Oracle, SQL Server, and MySQL.
  • Created Tableau dashboards and reports for data visualization, reporting, and analysis; employed Power Query in Power BI to pivot and unpivot data models for data cleansing.
  • Utilized Terraform to automate provisioning and management of cloud infrastructure, ensuring consistency, repeatability, and scalability across AWS and Azure environments.

Environment: AWS Kinesis, AWS S3, AWS EMR, AWS Lambda, Redshift, DynamoDB, Spark, Windows, Hive, Hadoop, Pig, Oracle, Microsoft SQL Server, MySQL, Tableau, Power BI, Terraform, Agile.
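
For illustration only, a minimal sketch of an AWS Lambda handler consuming a Kinesis stream and writing to DynamoDB, as in the pipeline described above; the table name and payload shape are hypothetical.

    # Lambda handler sketch: decode Kinesis records and persist them to DynamoDB.
    # The table name and record fields are hypothetical stand-ins.
    import base64
    import json

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("events")  # hypothetical table name

    def handler(event, context):
        # Kinesis delivers record payloads base64-encoded inside the Lambda event.
        for record in event["Records"]:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            table.put_item(Item=payload)
        return {"processed": len(event["Records"])}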

Education

Bradley University
Peoria, IL

Master of Science in Computer Science
05.2024