Sai Haritha Pillutla

Senior Software Engineer

San Jose

Summary

Senior Software Engineer with 10+ years of experience designing scalable data platforms, data warehouses, and ELT pipelines across AWS, Azure, and GCP for enterprise analytics and regulatory reporting.
Strong expertise in Python, SQL, and Spark, building high-performance batch and real-time data pipelines using Snowflake, BigQuery, and Synapse Analytics.
Extensive experience in developing and optimizing data warehouses using Snowflake, Redshift, BigQuery, and Azure Synapse, enabling high-volume analytical workloads and reporting.
Proficient in designing dimensional data models (star/snowflake schemas) and Data Vault architectures, delivering analytics-ready datasets for business intelligence and compliance use cases.
Hands-on experience building scalable ETL/ELT pipelines using AWS Glue, Azure Data Factory, Databricks, and Apache Airflow with strong focus on performance and reliability.
Developed real-time streaming pipelines using Kafka, Flink, Pub/Sub, and Kinesis, supporting fraud detection, IoT analytics, and event-driven architectures.
Strong programming expertise in Python and PySpark, implementing modular, reusable, and production-grade data engineering solutions with robust error handling and monitoring.
Designed and implemented cloud-native data lakes and lakehouse architectures using S3, ADLS Gen2, and GCS, enabling data ingestion, governance, and analytics.
Experience with CI/CD and DevOps practices using Jenkins, Azure DevOps, GitHub Actions, Terraform, Docker, and Kubernetes for automated deployments and infrastructure provisioning.
Implemented data quality frameworks using Great Expectations and custom validation techniques, ensuring data accuracy, completeness, and consistency across pipelines.
Strong experience in metadata management, lineage, and governance using Apache Atlas, Amundsen, AWS Glue Catalog, and cloud-native cataloging solutions.
Ensured data security and compliance using IAM, RBAC, encryption, and governance frameworks aligned with GDPR, HIPAA, and financial regulatory standards.
Delivered BI and reporting solutions using Power BI, Tableau, and Looker, enabling business users with real-time insights and operational dashboards.
Collaborated with business, analytics, and platform teams to translate requirements into scalable data solutions aligned with enterprise architecture standards.
Supported machine learning and advanced analytics initiatives by delivering curated, feature-ready datasets and optimized data pipelines.
Experienced in building fault-tolerant and highly available data systems with strong monitoring, alerting, and SLA management across distributed environments.

Overview

10+ years of professional experience

Work History

Sr. Software Engineer

Fidelity Investments

10.2023 - Current

Designed enterprise data architecture using Python, AWS S3, and Snowflake, enabling scalable data integration for financial analytics, fraud monitoring, and regulatory reporting. Ensured compliance with banking data governance standards and improved data accessibility across teams.
Developed high-volume ETL pipelines with Python, AWS Glue, and Spark, processing transactional datasets and customer records. Improved pipeline efficiency and reduced processing time through optimized job orchestration and parallel execution.
Built robust data ingestion frameworks using Python, REST APIs, and AWS Lambda, integrating structured and semi-structured data from core banking systems. Ensured high data accuracy and completeness across ingestion layers.
Modelled dimensional data structures in Snowflake using SQL, focusing on fact and dimension tables to support reporting, compliance, and risk analytics. Improved query performance and reporting consistency across business units.
Implemented complex data transformations using Spark, SQL, and AWS EMR, applying partitioning and incremental load strategies. Reduced data latency and improved processing efficiency for large-scale financial datasets.
Architected scalable data lake solutions using AWS S3, Snowflake, and Step Functions, supporting both batch and near real-time processing. Enabled unified access to structured and unstructured financial data.
Developed real-time streaming pipelines using Kafka, Flink, and Python, processing financial transactions for fraud detection. Reduced detection latency and improved alert accuracy for suspicious activities.
Established data quality validation frameworks using Python, Great Expectations, and SQL, ensuring consistency between upstream and downstream systems. Reduced data discrepancies and improved trust in reporting datasets.
Implemented metadata management and lineage tracking using AWS Glue Data Catalog, Amundsen, and CloudWatch, improving data discoverability and governance compliance. Supported audit readiness and traceability.
Automated infrastructure provisioning using Terraform, AWS IAM, and VPC, enabling consistent environment setup and secure deployments. Reduced manual intervention and improved infrastructure reliability.
Managed distributed data processing using Databricks, Spark, and AWS EMR, optimizing cluster configurations for cost efficiency. Improved workload stability under high-volume processing scenarios.
Optimized Snowflake workloads using SQL, clustering, and warehouse tuning strategies, reducing query execution time and compute costs. Improved performance of analytics and reporting queries.
Collaborated with business teams using Jira, Confluence, and SQL to gather requirements and deliver scalable data solutions. Translated regulatory needs into actionable data engineering implementations.
Implemented secure data access controls using AWS KMS, IAM, and encryption strategies, ensuring compliance with SOX and PCI DSS standards. Strengthened data protection across environments.
Delivered curated datasets using Python, Snowflake, and SQL to support machine learning and advanced analytics teams. Enabled faster model development and improved prediction accuracy.
Implemented logging and monitoring frameworks using CloudWatch, Python, and AWS Lambda, enabling proactive issue detection. Reduced incident resolution time and improved system observability.
Supported compliance reporting using SQL, Snowflake, and AWS S3, ensuring accurate financial reporting and audit readiness. Improved data traceability for regulatory submissions.
Environment: Python, AWS S3, AWS Glue, AWS Lambda, Step Functions, EMR, Snowflake, Apache Spark, Apache Kafka, Apache Flink, Terraform, Great Expectations, CloudWatch, AWS Glue Data Catalog, Amundsen, Databricks, Jira, Confluence

Senior Software Engineer

Accenture

03.2021 - 09.2023

Designed enterprise data pipelines using Python, Azure Data Factory, and Azure Data Lake Gen2, integrating data from SAP and Oracle systems. Ensured scalable ingestion architecture aligned with public-sector reporting and compliance requirements.
Developed distributed transformation frameworks using Spark, Databricks, and Python, standardizing and cleansing large datasets. Improved data quality and enabled consistent analytics for regulatory reporting.
Built analytical data models using Azure Synapse Analytics, T-SQL, and SQL, creating curated schemas for reporting. Enhanced query performance and supported downstream business intelligence use cases.
Implemented data profiling and validation processes using Python, Pandas, and SQL, identifying inconsistencies across datasets. Improved data accuracy and strengthened trust in enterprise reporting outputs.
Implemented CI/CD pipelines using Azure DevOps, Git, and YAML, automating deployment of data workflows and infrastructure. Reduced release cycle time and improved deployment consistency.
Developed real-time streaming pipelines using Kafka, Azure Stream Analytics, and Python, processing IoT data for traffic and water management. Enabled near real-time insights for public services.
Integrated dashboards using Power BI, Azure Synapse Analytics, and SQL, providing operational visibility into key performance metrics. Supported data-driven decisions across state departments.
Implemented secure data access using Azure Key Vault, RBAC, and encryption strategies, ensuring protection of sensitive datasets. Maintained compliance with state security policies.
Documented metadata and lineage using Apache Atlas, Azure Data Factory, and SQL, improving traceability of data transformations. Supported audit and governance requirements.
Ensured compliance with HIPAA and PII standards using Azure security controls, SQL, and data masking techniques. Strengthened data privacy across analytical datasets.
Mentored junior engineers in Python, Spark, and Databricks, guiding development of scalable data solutions. Improved team productivity and adherence to engineering standards.
Monitored pipelines using Azure Monitor, Log Analytics, and Python, identifying failures and performance bottlenecks. Improved system uptime and operational stability.
Optimized data storage using Parquet, Delta Lake, and Azure Data Lake Gen2, improving read/write performance. Reduced storage costs and improved processing efficiency.
Built reusable data components using Python, Spark, and Azure Data Factory, standardizing pipeline development practices. Accelerated delivery of new data integration workflows.
Environment: Python, Azure Data Factory, Azure Synapse Analytics, Azure Data Lake Gen2, Databricks, Apache Spark, Apache Airflow, Azure Kubernetes Service, Kafka, Azure Stream Analytics, Power BI, Azure DevOps, Azure Key Vault, RBAC, Apache Atlas, Azure SQL DB, Parquet, Delta Lake, Azure Monitor, Log Analytics

Software Engineer

Home Depot

10.2019 - 02.2021

Designed end-to-end data pipelines using Python, GCP BigQuery, and Cloud Storage, supporting payment analytics and regulatory reporting. Ensured scalable processing of high-volume financial transactions with strong data governance controls.
Developed batch processing workflows using Python, Dataflow, and BigQuery, transforming structured datasets into analytical models. Improved processing efficiency and enabled faster reporting for payment operations.
Built streaming data pipelines using Pub/Sub, Dataflow, and Apache Beam, processing real-time payment messages. Reduced latency in transaction processing and supported near real-time monitoring.
Implemented ETL frameworks using Python, Apache Beam, and JSON processing, handling semi-structured and structured data. Standardized ingestion patterns across multiple financial data formats.
Normalized financial message data using Python, BigQuery, and XML integration, transforming ISO 20022 and SWIFT formats into unified schemas. Improved consistency across payment processing datasets.
Designed analytical data models using BigQuery, SQL, and partitioning strategies, supporting large-scale reporting. Enhanced query performance and reduced data scan costs.
Orchestrated workflows using Cloud Composer, Apache Airflow, and Python, managing dependencies across batch and streaming jobs. Improved pipeline scheduling and operational reliability.
Implemented secure data storage using Cloud Storage, encryption, and IAM policies, ensuring compliance with financial data standards. Protected sensitive payment information across environments.
Built data observability and lineage tracking using Data Catalog, OpenLineage, and Python, improving data discoverability. Supported audit readiness and governance compliance.
Developed internal reusable components using Python, Apache Beam, and GitHub, standardizing ETL development. Accelerated pipeline delivery and reduced duplication across teams.
Created dashboards using Looker, BigQuery, and SQL, providing insights into payment processing metrics. Enabled business teams to monitor operational health and compliance indicators.
Implemented CI/CD pipelines using Cloud Build, GitHub Actions, and Terraform, automating deployment of data workflows. Improved release reliability and reduced manual errors.
Optimized query performance using BigQuery, clustering, and materialized views, reducing execution time and compute costs. Improved performance for large-scale analytical queries.
Supported production operations using Cloud Monitoring, logging, and Python, resolving pipeline issues proactively. Reduced downtime and improved system stability.
Migrated legacy ETL workloads using Python, Dataflow, and GCP services, improving scalability and maintainability. Enabled transition to modern cloud-native architecture.
Collaborated with cross-functional teams using SQL, Looker, and GCP services to deliver data solutions. Translated business and compliance requirements into scalable implementations.
Applied Agile practices using GitHub, sprint planning, and CI/CD workflows, ensuring continuous delivery of data pipelines. Improved collaboration and delivery timelines.
Environment: Python, GCP BigQuery, Dataflow, Cloud Composer, Pub/Sub, Cloud Storage, Data Catalog, Cloud Build, Cloud Functions, Cloud Run, Apache Beam, Looker, Terraform, GitHub Actions, OpenLineage, JSON, XML, Avro

Software Engineer

Microsoft

07.2017 - 07.2019

Designed, developed, and optimized enterprise ETL pipelines using PySpark, Python, and SQL to support large-scale banking data processing.
Engineered high-volume data ingestion workflows to migrate data from on-premises systems into AWS S3–based data lake architectures.
Designed and implemented AWS-based data lake and warehouse solutions using AWS Glue, Athena, and Redshift to enable analytics and reporting.
Built reusable data marts and curated datasets in collaboration with analysts and data science teams.
Migrated legacy ETL processes from Informatica to modern Python-based data pipelines, improving maintainability and scalability.
Implemented Delta Lake on Databricks to support ACID transactions, schema evolution, and dataset versioning.
Deployed and managed Apache Airflow for workflow orchestration, scheduling, and dependency management.
Implemented CI/CD pipelines for data workflows using Git, Jenkins, and Terraform to support automated deployments.
Developed data quality validation, reconciliation, and anomaly detection frameworks using Great Expectations and custom Python utilities.
Built and maintained real-time streaming pipelines using Kafka and AWS Kinesis to support fraud detection and near real-time analytics.
Enabled business intelligence and reporting by delivering analytics-ready datasets for Tableau and Power BI dashboards.
Implemented data governance and access control frameworks using Apache Atlas and AWS Lake Formation, supporting lineage and classification.
Established monitoring and alerting frameworks using CloudWatch and PagerDuty to ensure pipeline reliability and operational stability.
Collaborated closely with product, analytics, and platform teams to deliver scalable and secure data solutions.
Mentored junior data engineers and supported knowledge-sharing initiatives across the team.
Environment: Python, PySpark, SQL, AWS (S3, Glue, Athena, Redshift, Lake Formation, CloudWatch, Kinesis), Databricks, Delta Lake, Apache Airflow, Apache Atlas, Kafka, Jenkins, Terraform, Git, Great Expectations, Tableau, Power BI, Informatica

Software Engineer

Novartis

05.2015 - 06.2017

Designed and developed robust ETL pipelines using SSIS, Python, and SQL Server, processing 100M+ records daily.
Migrated multiple client workloads from on-prem SQL Server to Azure SQL Database, improving performance and scalability.
Automated reporting pipelines with Power BI Embedded and Excel VBA, reducing manual effort by 50%.
Built data ingestion tools for REST APIs and FTP sources using Python and Pandas.
Orchestrated ETL workflows with Azure Data Factory v1, enhancing pipeline visibility and control.
Managed raw and curated data storage using Azure Blob Storage and Data Lake Store for effective data lifecycle management.
Developed reusable Python libraries and implemented incremental loading via Change Data Capture (CDC) for optimized processing.
Built reconciliation tools and applied T-SQL for complex stored procedures, views, and indexing to ensure data integrity.
Conducted unit and integration testing and collaborated with QA for automated test data generation.
Translated business requirements into technical specifications and followed Agile methodology with JIRA for sprint delivery.
Documented ETL workflows, maintained job schedules, and conducted knowledge transfer sessions on cloud best practices.
Environment: SSIS, Python, SQL Server, Azure SQL Database, Azure Data Factory v1, Azure Blob Storage, Azure Data Lake Store, Power BI Embedded, Excel VBA, REST APIs, FTP, T-SQL, CDC, JIRA

Education

Bachelor of Science - Electrical, Electronics and Communications Engineering

Malla Reddy College of Engineering

Telangana, India

05.2001 -

Skills

Programming & Data Processing: Python, PySpark, SQL, T-SQL, Scala, Java, Pandas, NumPy
Big Data & Streaming: Apache Spark, Databricks, Apache Flink, Apache Beam, Kafka, Kinesis, Pub/Sub, Hadoop
Data Warehousing & Databases: Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse Analytics, SQL Server, Oracle, PostgreSQL
ETL/ELT & Orchestration: AWS Glue, Azure Data Factory, Apache Airflow, Cloud Composer, dbt, Matillion, Informatica IICS
Cloud Platforms: AWS (S3, Glue, Lambda, EMR, Redshift, Kinesis, Step Functions, IAM, KMS, CloudWatch); Azure (Data Factory, ADLS Gen2, Synapse, Databricks, AKS, Azure DevOps, Key Vault); GCP (BigQuery, Dataflow, Pub/Sub, Cloud Storage, Composer, Cloud Build)

DevOps & CI/CD: Jenkins, Azure DevOps, GitHub Actions, Terraform, Docker, Kubernetes
Data Modeling & Governance: Dimensional Modeling, Data Vault 2.0, Star/Snowflake Schema, Apache Atlas, Amundsen, Glue Catalog
Data Quality & Security: Great Expectations, Data Validation, IAM, RBAC, Encryption, GDPR, HIPAA
Visualization & Reporting: Power BI, Tableau, Looker
