
Swathi V

Data Engineer
Dallas, TX

Summary

Data Engineer with 5 years of experience specializing in building scalable ETL pipelines, data processing, and cloud data solutions using AWS, Azure, Python, PySpark, SQL, and Databricks. Proficient in containerization, CI/CD, business intelligence, and data warehousing with tools such as Tableau, Power BI, and Kubernetes.

Overview

  • 6 years of professional experience
  • 1 Certification

Work History

Data Engineer

Comerica Bank
Frisco, TX
10.2023 - Current
  • Automated file processing and ingestion using Informatica IICS, orchestrating data movement from AWS S3 into Snowflake with dynamic parameter handling and error control.
  • Developed Python scripts for pre-ingestion file validation (e.g., row count, schema consistency, delimiter checks), integrated with IICS workflows to ensure data readiness (a sketch of this validation pattern follows this list).
  • Used Python for metadata extraction, logging, and automated email notifications for missing or malformed files before ETL jobs were triggered.
  • Leveraged Python within Snowflake to perform scalable data transformations directly in the warehouse, including filtering, joins, aggregations, and enrichment.
  • Built Python-based User Defined Functions (UDFs) in Snowflake to implement complex business logic not supported by standard SQL, enhancing transformation flexibility.
  • Designed reusable Python modules for standardized validation checks (e.g., null handling, pattern validation, field-level rules) applied across multiple datasets.
  • Wrote and scheduled Python jobs and IICS Command Tasks to trigger downstream workflows based on S3 file arrival events.
  • Implemented logging and exception-handling mechanisms in Python to track ETL success/failure, ensuring observability and reducing troubleshooting time.
  • Utilized Python scripts to generate control reports (file stats, load summaries, DQ violations) and upload them back to S3 or distribute to business users.
  • Developed parameterized and modular IICS mappings and taskflows for scalable data ingestion into Snowflake's stage, curated, and integration layers.
  • Applied complex SQL and Python logic in Snowflake for data cleansing, standardization, and formatting to align with downstream reporting and compliance needs.
  • Automated data quality checks using Python and Snowflake SQL for primary key validation, duplicate detection, range checks, and referential integrity.
  • Used Bitbucket to version control Python scripts, SQL code, and IICS artifacts; collaborated on code reviews, release management, and multi-environment deployments.
  • Integrated Python test scripts into the QA process for pipeline verification, schema drift detection, and regression testing of critical financial datasets.
  • Environment: PySpark, Informatica IICS, AWS S3, Snowflake, MySQL, AWS CloudWatch, Shell Scripting, Python, SQL, Bitbucket, CI/CD Pipelines
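
Illustrative sketch of the pre-ingestion validation pattern referenced above, assuming a local staging copy of an inbound file; the expected columns, delimiter, minimum row count, and file path are hypothetical placeholders rather than actual Comerica artifacts.

import csv
import logging
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pre_ingestion_validation")

# Hypothetical expectations for an inbound file; real values would come from configuration.
EXPECTED_DELIMITER = "|"
EXPECTED_COLUMNS = ["account_id", "txn_date", "amount", "currency"]
MIN_ROW_COUNT = 1


def validate_file(path: str) -> dict:
    """Run basic pre-ingestion checks: delimiter, header/schema consistency, and row count."""
    issues = []
    file_path = Path(path)

    with file_path.open(newline="", encoding="utf-8") as fh:
        sample = fh.read(4096)
        fh.seek(0)

        # Delimiter check: sniff a sample and compare against the expected delimiter.
        try:
            dialect = csv.Sniffer().sniff(sample, delimiters=",|;\t")
            if dialect.delimiter != EXPECTED_DELIMITER:
                issues.append(f"unexpected delimiter {dialect.delimiter!r}")
        except csv.Error:
            issues.append("could not detect a delimiter")

        reader = csv.reader(fh, delimiter=EXPECTED_DELIMITER)
        header = next(reader, [])

        # Schema consistency check: header must match the expected column list.
        if [c.strip().lower() for c in header] != EXPECTED_COLUMNS:
            issues.append(f"header mismatch: {header}")

        # Row count check: require at least MIN_ROW_COUNT data rows.
        row_count = sum(1 for _ in reader)
        if row_count < MIN_ROW_COUNT:
            issues.append(f"row count {row_count} below minimum {MIN_ROW_COUNT}")

    result = {"file": file_path.name, "rows": row_count, "issues": issues, "ok": not issues}
    log.info("validation result: %s", result)
    return result


if __name__ == "__main__":
    # Hypothetical local copy of a file staged from S3 by the orchestration layer.
    print(validate_file("landing/transactions_20240101.psv"))

A check like this would run ahead of the IICS taskflow, which would proceed only when the result reports no issues.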

Data Engineer

Albertsons
Plano, TX
02.2023 - 09.2023
  • Designed and developed scalable database systems based on e-commerce business requirements analysis, implementing Snowflake and PostgreSQL architectures with dimensional modeling for customer behavior analytics, inventory management, and personalization engines.
  • Developed comprehensive Python tools for automated data processing from AWS S3 buckets, Kafka streams, and various file formats (CSV, JSON, Parquet), enabling real-time ingestion and batch processing of customer interaction data for behavioral analytics.
  • Built advanced PySpark applications on Databricks and EMR for large-scale data cleaning and transformation, implementing complex business logic for customer segmentation, session analysis, and conversion funnel optimization using Spark DataFrames and custom UDFs.
  • Performed extensive ETL validation using Python and SQL frameworks to conduct quantitative analysis of clickstream data, cart events, and product interactions, ensuring data accuracy between source systems and target analytics platforms.
  • Created interactive analytical presentations using Python-based reporting tools and SQL dashboards in Snowflake and PostgreSQL, delivering behavioral KPIs, conversion metrics, and A/B testing results to marketing and customer experience teams.
  • Automated end-to-end data workflows using Apache Airflow DAGs with Python orchestration, implementing SLA monitoring, retry mechanisms, and automated alerting for production reliability in high-throughput e-commerce environments (a DAG sketch follows this list).
  • Developed real-time monitoring solutions using Prometheus and Grafana with Python metrics collection, presenting data lag analysis, job performance statistics, and cluster utilization reports for proactive incident detection and system optimization.
  • Implemented comprehensive data validation frameworks using Python and SQL testing suites for schema validation, data lineage tracking, and quality assurance of machine learning feature stores and personalization algorithms.
  • Environment: Python, SQL, PySpark, Apache Kafka, Databricks, AWS EMR, AWS S3, Apache Airflow, Snowflake, PostgreSQL, Docker, Kubernetes (EKS), Helm, GitHub Actions, Prometheus, Grafana
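
Illustrative sketch of the Airflow orchestration pattern referenced above, assuming Airflow 2.x; the DAG id, schedule, task bodies, and alert address are hypothetical placeholders.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical defaults; real DAGs carried environment-specific owners, emails, and SLAs.
default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "email": ["data-alerts@example.com"],
    "email_on_failure": True,
    "sla": timedelta(hours=1),
}


def extract_clickstream(**context):
    # Placeholder extract step; the real task pulled clickstream batches from Kafka/S3.
    print("extracting clickstream batch for", context["ds"])


def load_to_snowflake(**context):
    # Placeholder load step; the real task wrote curated data into Snowflake.
    print("loading curated batch for", context["ds"])


with DAG(
    dag_id="clickstream_daily",        # hypothetical DAG id
    start_date=datetime(2023, 2, 1),
    schedule_interval="0 6 * * *",     # daily at 06:00
    catchup=False,
    default_args=default_args,
    tags=["ecommerce", "example"],
) as dag:
    extract = PythonOperator(task_id="extract_clickstream", python_callable=extract_clickstream)
    load = PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)

    extract >> load

Wiring retries, retry delay, SLA, and failure emails through default_args lets every task in the DAG inherit the same reliability settings.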

Jr Data Engineer

Indian Servers
Hyderabad
04.2019 - 04.2021
  • Designed and developed basic database systems based on retail business requirements analysis, creating PostgreSQL schemas with staging, fact, and dimension tables optimized for daily sales reporting and multi-location consolidation.
  • Developed Python automation tools for reading and cleaning daily sales data from local file systems and AWS S3 buckets, handling dynamic file naming patterns and various CSV formats from 100+ retail stores across India.
  • Built foundational PySpark pipelines for data transformation and aggregation, implementing basic cleaning logic, type casting, and data standardization to ensure consistent reporting across multiple retail locations (a pipeline sketch follows this list).
  • Performed ETL job validation using SQL queries and Python scripts to conduct quantitative analysis of sales data accuracy, comparing source file counts with target database records and identifying data discrepancies for business reporting.
  • Created analytical reports and visualizations using SQL summary queries and Python reporting scripts, supporting BI teams in preparing daily sales dashboards and performance metrics using Tableau and Power BI.
  • Automated data pipeline processes using Apache Airflow with Python task orchestration, implementing email notifications and automated alerts for early identification of pipeline failures and data delays.
  • Developed data monitoring solutions using Python logging frameworks and SQL validation queries, creating detailed audit trails and data lineage documentation for troubleshooting and quality assurance.
  • Implemented basic data validation procedures using Python and SQL testing scripts for record count verification, data type validation, and business rule compliance to ensure accurate retail analytics and reporting consistency.
  • Environment: Python, SQL, PySpark, PostgreSQL, AWS S3, Apache Airflow, Linux, Shell Scripting, Tableau, Power BI
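
Illustrative sketch of the foundational PySpark cleaning and aggregation pattern referenced above; the S3 paths, column names, and date format are hypothetical placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DecimalType, IntegerType

spark = SparkSession.builder.appName("daily_sales_cleaning").getOrCreate()

# Hypothetical landing path and column names; real files arrived per store with dynamic names.
raw = spark.read.option("header", True).csv("s3a://retail-landing/daily_sales/*.csv")

cleaned = (
    raw
    # Standardize identifier columns.
    .withColumn("store_id", F.upper(F.trim(F.col("store_id"))))
    .withColumn("product_code", F.upper(F.trim(F.col("product_code"))))
    # Cast numeric measures to proper types.
    .withColumn("quantity", F.col("quantity").cast(IntegerType()))
    .withColumn("sale_amount", F.col("sale_amount").cast(DecimalType(12, 2)))
    # Normalize dates arriving in a single assumed format.
    .withColumn("sale_date", F.to_date(F.col("sale_date"), "dd-MM-yyyy"))
    # Drop rows missing mandatory keys and exact duplicate rows.
    .dropna(subset=["store_id", "product_code", "sale_date"])
    .dropDuplicates()
)

# Daily aggregation for multi-location consolidation.
daily_summary = cleaned.groupBy("sale_date", "store_id").agg(
    F.sum("sale_amount").alias("total_sales"),
    F.sum("quantity").alias("units_sold"),
)

daily_summary.write.mode("overwrite").partitionBy("sale_date").parquet(
    "s3a://retail-curated/daily_sales_summary/"
)

Cleaning and type casting before the aggregation step keeps the daily summary consistent across the many per-store feeds described above.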

Education

Master of Science - Computer Science

University of North Texas
12.2022

Bachelor of Technology - Computer Science

Stanley College of Engineering and Technology for Women
09.2020

Skills

  • Python
  • SQL
  • Shell scripting
  • JavaScript
  • PySpark
  • Pandas
  • Apache Kafka
  • Apache Airflow
  • Data Transformation
  • ETL processes
  • AWS
  • Azure
  • Snowflake
  • PostgreSQL
  • MySQL
  • Oracle DB
  • MongoDB
  • Cosmos DB
  • AWS EMR
  • Data Lake
  • Data Warehouse
  • Azure SQL Database
  • Docker
  • Kubernetes
  • Helm
  • Terraform
  • Git
  • GitHub
  • Jenkins
  • GitHub Actions
  • CI/CD Pipelines
  • Tableau
  • Power BI
  • Prometheus
  • Grafana

Certification

  • Database Administration, Microsoft Technology Associate
  • Python Level Programming, Microsoft Technology Associate
  • Salesforce Trailhead, Association for Computing Machinery
