
Hareen Edla

Durham, NC

Summary

Results-driven data engineering professional with a solid foundation in designing and maintaining scalable data systems. Expertise in developing efficient ETL processes and ensuring data accuracy, contributing to impactful business insights. Known for strong collaborative skills and the ability to adapt to dynamic project requirements, delivering reliable and timely solutions.

Overview

6 years of professional experience
2 Certifications

Work History

AZURE DATA ENGINEER

ALAMAR BIOSCIENCES, INC
01.2025 - Current
  • Designed and managed Azure Databricks Spark clusters (job and all-purpose) to support scalable data engineering workloads, reducing cluster downtime by 25% through optimized auto-scaling and job orchestration.
  • Integrated Apache Flink with Azure Event Hubs and Kafka to process high-throughput, low-latency event streams, ensuring delivery of insights within seconds of event occurrence.
  • Developed robust ETL pipelines in Azure Data Factory (ADF) to ingest and transform 1B+ records daily from Oracle, SQL Server, Hive, and Snowflake into Azure Synapse Analytics and ADLS Gen2, improving pipeline reliability by 30%.
  • Built Snowflake connections in ADF and Databricks to extract financial and operational datasets, applying transformation and loading logic to support both historical analysis and real-time reporting use cases.
  • Automated data ingestion workflows from structured and semi-structured sources (CSV, JSON, Parquet) into Delta Lake and Snowflake, implementing schema enforcement and versioning to reduce data errors by 40%.
  • Tuned Spark jobs in Databricks using Z-Ordering, file compaction, and optimized partitioning, achieving 40% faster query performance for downstream BI users.
  • Implemented data quality checks using Great Expectations and custom PySpark validators, increasing pipeline data integrity and reducing manual validation by 50%.
  • Delivered real-time streaming pipelines using Structured Streaming, Event Hubs, and Databricks, enabling near-instant data availability in Snowflake for regulatory and operational dashboards.
  • Configured Unity Catalog and Snowflake RBAC policies for centralized data governance and access control, enhancing audit readiness and reducing compliance risk.
  • Utilized Fabric Data Pipelines to orchestrate ingestion and transformation from diverse sources into Lakehouses, enabling real-time insights for finance and operations teams.
  • Configured Databricks job-level monitoring with alert rules and webhook-based notifications, reducing time-to-detect failures by 40%.
  • Implemented full CI/CD automation via Azure DevOps, ensuring consistent deployment of pipelines and Spark jobs, and used Terraform to provision infrastructure across environments.

DATA CONSULTANT

SAINT LOUIS UNIVERSITY
09.2023 - 11.2024
  • Developed and maintained ETL pipelines using Apache Airflow and Azure Databricks (PySpark) to support university-wide reporting on enrollment, admissions, and academic performance trends.
  • Enabled real-time monitoring of student portal activity and IT systems by integrating Kafka with Spark Structured Streaming and OpenSearch, assisting IT and academic support teams with faster issue resolution.
  • Built Spark jobs to clean and normalize data from multiple university departments (Admissions, Registrar, Financial Aid), streamlining inter-departmental reporting and data sharing.
  • Collaborated with academic and IT teams to design data validation and lineage tracking mechanisms, reducing reporting errors and increasing stakeholder trust in analytics outputs by 30%.

AWS DATA ENGINEER

ACCENTURE
08.2022 - 07.2023
  • Designed and optimized real-time streaming pipelines using Apache Spark Structured Streaming, processing transactional data batches for fraud detection models with 30% faster throughput and enhanced error handling.
  • Implemented Spark performance tuning strategies using broadcast joins, partitioning, caching, and in-memory processing, reducing ingestion latency by 40% across large datasets.
  • Led the migration of an on-premises Oracle Data Warehouse to Amazon Redshift, reducing infrastructure costs and improving analytical query performance by 50%.
  • Developed scalable ETL pipelines leveraging Informatica and custom Python scripts to load structured and semi-structured data into Redshift and RDS for analytics consumption.
  • Configured IAM roles and RBAC policies to enforce secure access to AWS services, aligning with organizational compliance and governance standards.
  • Used Sqoop and Hive to ingest, transform, and analyze large volumes of data; implemented Hive UDFs and optimized HQL queries for ad hoc analytics.
  • Built modular and reusable DBT models for transforming raw staging data into analytics-ready datasets, applying data tests, documentation, and version control to maintain accuracy and trust in data assets.
  • Automated DBT model execution using scheduled jobs and CI/CD pipelines, improving development efficiency and reducing manual deployment time by 40%.
  • Supported production data issues by performing root cause analysis, managing ticket resolution workflows, and delivering ad hoc impact analysis on claims, provider, and network datasets.
  • Created and maintained technical documentation for Spark pipelines, Terraform scripts, and data workflows to support ongoing development and knowledge transfer.

JUNIOR DATA ENGINEER

ACCENTURE
01.2020 - 05.2022
  • Involved in the analysis, specification, design, implementation, and testing phases of the Software Development Life Cycle (SDLC), using Agile methodology for application development.
  • Collaborated with the data engineering team to build batch and streaming ETL pipelines that ingested, transformed, and loaded banking transactions, customer profiles, and account data into secure enterprise data warehouses.
  • Developed Python-based ETL scripts to process high-volume financial datasets, applying data cleansing, validation, masking, and enrichment to support downstream analytics and reporting.
  • Designed data integration workflows using Python and SQL to pull data from Oracle, NoSQL, and flat-file sources, ensuring schema consistency and referential integrity across systems.
  • Implemented data quality checks and automated alerts for anomalies in customer onboarding and transaction records, reducing manual data review time by 30%.
  • Optimized complex SQL queries and stored procedures used in reconciliation processes and daily settlement reports, improving performance by 25% on average.
  • Ingested multi-format financial data (CSV, XML, JSON) into centralized storage using scheduled jobs, enabling consistent data availability for audit, regulatory, and fraud analytics teams.
  • Built and maintained automated data validation test cases using Python and unit testing libraries, supporting continuous data quality in production pipelines.
  • Assisted in configuring data governance controls aligned with internal compliance and audit standards (PCI DSS, GDPR), including PII redaction and access controls.
  • Developed and documented data lineage mappings between operational systems and reporting layers using metadata repository tools.

Education

Master of Science - Information Systems

SAINT LOUIS UNIVERSITY
St. Louis, MO
01.2025

Bachelor of Science - Computer Science

SAINT MARY’S COLLEGE
Hyderabad
01.2022

Skills

  • Cloud Technologies: Azure Data Engineering – Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure SQL, Azure Data Lake Storage, Azure Functions, Logic Apps, Azure Key Vault; AWS Data Engineering – Redshift, EMR, S3, Glue, Lambda, Step Functions, Databricks
  • Big Data Ecosystem: HDFS, MapReduce, Hive, Kafka, Airflow, Databricks, Flink
  • Languages: SQL, PySpark, Python, Scala, Pig Latin, HiveQL, Shell Scripting
  • Software Methodologies: Agile, Waterfall (SDLC)
  • Databases: Azure SQL, RDS, MySQL, Oracle, DB2, PostgreSQL, DynamoDB, MS SQL Server
  • NoSQL: HBase, MongoDB, Cassandra
  • ETL/BI: Power BI, SAP BO, Tableau
  • Version control: GitHub, Azure DevOps, Bitbucket
  • ETL development
  • Data warehousing
  • Data modeling
  • Data pipeline design

Certification

  • MICROSOFT CERTIFIED: Fabric Data Engineer Associate
  • INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY – Advanced Programme in Data Science
  • AWS CERTIFIED Data Engineer - Associate
