SRUSHTI RAM

Dallas

Summary

Python-focused Data Engineer with experience building scalable backend services, secure cloud-native data platforms, and foundational data solutions across enterprise financial environments. Strong expertise in Python, AWS, SQL, API integrations, reusable service development, and interactive reporting, with experience supporting authentication and authorization workflows, secure access controls, and shared data assets across cross-functional agile teams. Proven ability to translate business requirements into scalable platform features, support customer-facing and internal applications, and build reliable data foundations for analytics, predictive modeling, and GenAI/LLM-driven solutions.

Work History

Data Engineer

Principal Financial Group (PFG)

Des Moines, IA

04.2025 - 04.2026

Architected an end-to-end S3-to-Snowflake ingestion platform (EFG File Reconciliation Pipeline) from scratch using AWS CDK (CloudFormation), CI/CD pipelines, SQLFluff quality gates, and multi-environment deployments; managed container images in Amazon ECR and ran serverless compute on AWS Fargate, reducing time-to-production for new data feeds from weeks to days.
Engineered scalable, event-driven Snowpipe pipelines to automate the ingestion of daily MFT transactional data across 50+ core business tables in Snowflake, interfacing with multiple data products to replace manual file reconciliation and save 15+ hours of analyst effort per week.
Built a scalable data lake on AWS S3 with date-partitioned Parquet storage, Glue orchestration, and SNS error alerting, reliably processing 100K+ MFT records daily.
Designed a partition-aware, cross-account backfill framework using Python (boto3) to recover days of production data gaps with idempotent load-status tracking.
Developed and automated interactive executive dashboards in AWS QuickSight and Power BI.
Authored Python scripts to automate ad-hoc manual tasks and deliver features against user-defined requirements and cross-system integrations, including backend data APIs supporting internal monitoring tools built with React, ES6, and Webpack.
Developed Python-based shared backend services and secure data access workflows for enterprise data products, supporting foundational platform capabilities and cross-team feature delivery in an agile environment.
Defined and implemented reusable Python-based backend components and foundational data workflows that improved secure access, consistency, and scalability across shared enterprise data products.
Engineered scalable data foundations for predictive modeling by integrating GenAI/LLM APIs via Python directly into the ETL pipeline.
Established IaC standards with AWS CDK, automated linting, and snapshot testing, and ensured unit and integration test coverage so that applications deployed reliably across multiple platforms with significantly fewer release defects.

Software Engineer - Data

Walmart (Sam's Club)

Bentonville, AR

01.2025 - 04.2025

Built a custom data quality framework using Python (PySpark) and Azure Functions, ensuring data integrity and reliability.
Collaborated with DevOps teams to streamline CI/CD pipelines for deploying and managing data engineering applications in containerized environments using Docker.
Designed and developed reusable PySpark libraries and utilities, leveraging ORM concepts to improve code maintainability for common data processing tasks.
Performed data extraction, aggregation, and analysis on HDFS using PySpark, and stored the required data in Hive and Blob Storage.
Migrated on-premises data warehouse processes to Databricks.
Developed tools to monitor and report on data quality, and successfully deployed and supported cross-platform applications for data profiling and validation.
Analyzed data to determine outliers and data accuracy issues, using Python libraries to make corrections.
Developed efficient PySpark scripts for data processing and transformation on Databricks.
Developed Python-based shared utilities and application support components for internal data workflows, improving maintainability and secure access across enterprise platforms.
Collaborated on identity-aware application patterns in Azure-based environments, with exposure to Microsoft Entra ID concepts supporting secure authentication, authorization, and controlled access to internal tools used by large user populations.

Data Engineer

Capital One

Plano, TX

12.2021 - 12.2024

Implemented a real-time data quality monitoring system using custom Python scripts and SQL stored procedures.
Developed a scalable data lake solution on AWS, using S3 for storage and Athena for querying, with supporting data cataloging.
Migrated relational databases from RDS to DynamoDB, redesigning data access patterns.
Created custom monitoring dashboards with CloudWatch and CloudTrail to track access patterns, throttling events, and secure authentication and authorization workflows.
Developed an automated financial data validation framework with complex business logic, reducing manual verification time by 60%.
Performed cohort analysis and customer segmentation using Python and SQL.
Integrated Google Analytics 4 (GA4) with digital platforms, collaborating closely with UI teams to ensure accurate event tracking and data capture across applications built with modern JavaScript.
Designed and implemented scalable data pipelines on AWS cloud infrastructure.
Supported backend data integrations for customer-facing auto financing platforms, helping establish reliable data and analytics foundations for inventory, application, and user journey insights.
Collaborated with product and UI teams on digital platform enhancements, including event tracking, customer behavior analysis, and support for location-aware feature requirements across web and mobile experiences.
Gained exposure to location-aware platform use cases and geospatial visualization tools such as Mapbox.

Data Engineer / ETL Consultant

Hanger Inc.

Dallas, TX

02.2021 - 12.2021

Developed serverless Python ETL pipelines on AWS Lambda ingesting HL7 messages and converting them into FHIR resources, powering prosthetics outcome dashboards with near-real-time updates for clinicians.
Implemented asynchronous extract loaders using aiohttp and asyncio, achieving sustained 50 ms median latency when harvesting clinic EMR data across 35 facilities during nightly windows.
Refactored procedural SQL scripts into modular SQLAlchemy repositories with dependency injection, increasing unit-test coverage from 18% to 70% and simplifying privacy impact assessments.
Exposed patient mobility aggregates through FastAPI endpoints secured behind OAuth2, enabling regional managers to compare clinic performance via real-time dashboards.
Modeled a star schema in Redshift encompassing appointment, device, and outcome facts plus slowly changing patient dimensions, supporting Tableau visualizations for evidence-based treatment planning.
Orchestrated incremental Airflow DAGs copying Parquet files from S3 into Redshift Spectrum, shaving nightly load duration from six hours to 90 minutes.
Authored dbt models computing 90-day post-fit functional improvement deltas, automating regulatory submission datasets and eliminating manual Excel pivot tables.
Applied Great Expectations validations on EMR extracts, quarantining records missing laterality codes and preserving metric accuracy during quarter-end reporting cycles.
Provisioned infrastructure via Terraform executed in AWS CodeBuild pipelines, delivering immutable builds and consistent tagging across dev, test, and production accounts.
Configured CloudWatch composite alarms with PagerDuty escalation policies, ensuring on-call clinicians received immediate notification of pipeline failures outside business hours.
Integrated AWS Secrets Manager rotation, eliminating hard-coded credentials in Lambda layers and passing yearly HIPAA penetration testing without remediation items.
Enforced network isolation with VPC endpoints and granular security groups between Lambda and Redshift, satisfying strict least-privilege audit requirements.
Visualized outcome clusters via OpenLayers heat maps layered over census data, guiding executive decisions on opening two new rehabilitation clinics.
Facilitated backlog refinement workshops with clinicians and engineers, removing ambiguity from HL7 field mappings and decreasing rework tickets across subsequent sprints.
Authored comprehensive data dictionaries and lineage diagrams in Confluence, reducing onboarding time for new analysts by one week.
Environment: Python, AWS Lambda, aiohttp, asyncio, SQLAlchemy, FastAPI, Redshift, Airflow, S3, Parquet, dbt, Great Expectations, Terraform, AWS CodeBuild, CloudWatch, PagerDuty, AWS Secrets Manager, OpenLayers, VPC, Tableau, HL7, FHIR

Data Engineer

Bank of America

Dallas, TX

07.2018 - 02.2021

Migrated core AML enrichment routines from PL/SQL to vectorized Python pandas modules, eliminating duplicate logic across five stored procedures and simplifying maintenance activities.
Deployed batch scoring engine on AWS EMR with Spark broadcast joins, processing daily sanctions list comparisons against 300 million transactions in under two hours.
Published reusable FastAPI microservice delivering suspicious activity findings to compliance dashboards, replacing ad-hoc CSV exchanges and providing real-time case triage.
Created internal PyPI wheel packages for common regex parsers, guaranteeing consistent narrative field extraction across all ETL pipelines and reducing code duplication.
Offloaded 24 TB Oracle tables to S3 Parquet partitions using AWS DMS, boosting investigative Athena query performance by 70%.
Converted 200+ cron ETLs into dependency-aware Airflow DAGs, adding retry logic that rescued fragile exchange rate feeds without operator intervention.
Built dbt models assigning composite risk scores, embedding column-level documentation and automating catalog updates consumed by compliance analysts.
Implemented Great Expectations threshold tests on risk factors, blocking production promotion when null rates exceeded 0.5%, thereby preserving data integrity.
Automated EMR cluster creation via CloudFormation templates and bootstrap scripts, enabling in-place blue-green upgrades with single-toggle cutovers.
Enforced CI for Jupyter notebooks using Jenkins jobs running pytest and nbconvert, catching runtime errors before analysts merged changes.
Integrated AWS Secrets Manager with Airflow, rotating credentials automatically and achieving SOX compliance without manual ticket workflows.
Tightened IAM roles by replacing static keys with STS tokens and mandatory MFA, clearing auditor action items.
Chaired weekly code reviews enforcing PEP8 conventions and encouraging idiomatic usage, fostering shared ownership and improving readability across fifteen engineers.
Authored onboarding runbooks describing AWS connectivity, local dev containers, and data governance patterns, reducing new hire setup time from two days to four hours.
Coordinated cutover weekend with fraud analytics, DBA, and network teams, executing migration steps smoothly and achieving zero downtime for payment processing.
Environment: Python, pandas, AWS EMR, Spark, Py4J, FastAPI, Oracle, S3, Parquet, Athena, Airflow, dbt, Great Expectations, CloudFormation, Jenkins, pytest, nbconvert, AWS Secrets Manager, IAM, STS

ETL/SQL Developer

HSBC GLT

Hyderabad, IN

05.2016 - 11.2016

Authored Python audit scripts comparing record counts and CRC32 checksums between staging and production tables, eliminating manual reconciliation steps previously consuming two analyst hours daily.
Re-engineered SSIS packages with range-based partitioning and parallel dataflows, reducing average execution duration for 20-million-row loads from 75 minutes to 45 minutes.
Consolidated currency conversion logic into parameterized T-SQL stored procedures, ensuring consistent exchange rates across six downstream reporting cubes.
Created Python CLI utility validating inbound fixed-width files, generating standardized error reports and returning non-zero exit codes to halt flawed loads.
Converted nightly 50 GB flat-file loads to SQL Server bulk inserts with table partitioning, trimming the ingestion window and freeing maintenance buffers.
Introduced incremental CDC pipeline using datetime watermark columns and MERGE statements, avoiding full reloads for high-volume transaction facts.
Built interactive Power BI dashboard presenting row counts, null ratios, and load durations, empowering stakeholders to monitor data freshness autonomously.
Automated statistics maintenance and index rebuilds via Ola Hallengren scripts scheduled through SQL Agent, reducing query timeouts reported by analysts.
Established CI/CD with Azure DevOps pipelines executing dtexec, SQLCMD, and unit tests, providing consistent promotion paths across environments.
Adopted Git branching workflow for SSIS projects, introducing pull requests and automated code reviews that improved code quality.
Implemented row-level security via inline table-valued functions, ensuring regional analysts accessed only authorized customer data sets.
Secured connection strings with DPAPI-encrypted configuration files, removing plaintext credentials from shared network locations and satisfying internal audit recommendations.
Authored detailed runbooks describing ETL schedules, alert channels, and rollback steps, accelerating incident resolution during overnight failures.
Facilitated weekly knowledge-share sessions covering set-based query tuning strategies, elevating junior developers’ proficiency and reducing cursor usage.
Collaborated with BI team validating semantic layer joins and responding to ad-hoc data questions, cutting backlog of unresolved tickets by half.
Environment: Python, SQL Server, SSIS, Git, Azure DevOps, Power BI, Ola Hallengren, DPAPI, SQL Agent, SSMS, T-SQL, CDC, Bulk Insert, Partitioning, CRC32

Education

Master of Science - Management and Information Systems

University of Illinois Springfield

Springfield, IL

Skills

Programming Languages: Python (Pandas, NumPy, Scikit-Learn, API integrations, GenAI/LLM integrations), Scala, SQL (relational databases), PL/SQL, HiveQL
Cloud & Data Engineering: AWS cloud technologies (S3, Lambda, CloudFormation, Batch, Fargate, ECR, IAM, EMR, Athena, Glue, CDK, Step Functions), Azure, Databricks, Snowflake (Snowpipe), ETL/ELT, Data Modeling
Security & Platform Concepts: AWS IAM, authentication and authorization workflows, secure access controls, backend APIs, reusable Python libraries
Reporting & BI Tools: AWS QuickSight, Power BI, Tableau, Grafana, SSRS, dbt
Big Data & NoSQL: Apache Spark, Hadoop, DynamoDB, MongoDB, Cassandra, HBase, HDFS
Visualization & Platform Exposure: Interactive reporting, location-aware data visualization, and familiarity with geospatial visualization tools such as Mapbox
CI/CD & Tools: Git/GitHub, SQLFluff, PyCharm, VS Code, Jupyter, CI/CD pipelines, containerized environments, Docker
AI / Analytics Exposure: Data foundations for predictive modeling, GenAI/LLM API integrations, and analytics-oriented automation

Certification

Azure Data Engineer - Associate
AWS Developer - Associate
