Data Engineer with 6 years of experience specializing in building scalable ETL pipelines, data processing, and cloud data solutions using AWS, Azure, Python, PySpark, SQL, and Databricks. Proficient in containerization, CI/CD, business intelligence, and data warehousing with tools such as Kubernetes, Tableau, and Power BI.
Overview
6 years of professional experience
3 Certifications
Work History
Data Engineer
Comerica Bank
Frisco, TX
10.2023 - Current
Automated file processing and ingestion using Informatica IICS, orchestrating data movement from AWS S3 into Snowflake with dynamic parameter handling and error control.
Developed Python scripts for pre-ingestion file validation (e.g., row count, schema consistency, delimiter checks) and integrated them with IICS workflows to ensure data readiness (sketched below).
Used Python for metadata extraction, logging, and automated email notifications for missing or malformed files before ETL jobs were triggered.
Leveraged Python within Snowflake to perform scalable data transformations, including filtering, joins, aggregations, and enrichment, without moving data out of the platform.
Built Python-based User Defined Functions (UDFs) in Snowflake to implement complex business logic not supported by standard SQL, enhancing transformation flexibility (sketched below).
Designed reusable Python modules for standardized validation checks (e.g., null handling, pattern validation, field-level rules) applied across multiple datasets (sketched below).
Wrote and scheduled Python jobs and IICS Command Tasks to trigger downstream workflows based on S3 file arrival events.
Implemented logging and exception-handling mechanisms in Python to track ETL success/failure, ensuring observability and reducing troubleshooting time.
Utilized Python scripts to generate control reports (file stats, load summaries, DQ violations) and upload them back to S3 or distribute to business users.
Developed parameterized and modular IICS mappings and taskflows for scalable data ingestion into Snowflake's stage, curated, and integration layers.
Applied complex SQL and Python logic in Snowflake for data cleansing, standardization, and formatting to align with downstream reporting and compliance needs.
Automated data quality checks using Python and Snowflake SQL for primary key validation, duplicate detection, range checks, and referential integrity (sketched below).
Used Bitbucket to version control Python scripts, SQL code, and IICS artifacts; collaborated on code reviews, release management, and multi-environment deployments.
Integrated Python test scripts into the QA process for pipeline verification, schema drift detection, and regression testing of critical financial datasets.
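A minimal sketch of the pre-ingestion file validation described above, assuming a pipe-delimited CSV landing in S3; the bucket, key, expected column list, and delimiter are illustrative placeholders rather than actual values:

```python
# Hedged sketch: pre-ingestion checks for a pipe-delimited file in S3.
# Bucket/key/schema/delimiter below are placeholders, not real values.
import csv
import io

import boto3

EXPECTED_COLUMNS = ["account_id", "txn_date", "amount"]  # assumed schema
DELIMITER = "|"                                          # assumed delimiter

def validate_s3_file(bucket: str, key: str) -> dict:
    """Return delimiter/schema/row-count checks for one landed file."""
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    reader = csv.reader(io.StringIO(body.decode("utf-8")), delimiter=DELIMITER)
    header = next(reader, [])
    row_count = sum(1 for _ in reader)
    return {
        "schema_ok": header == EXPECTED_COLUMNS,
        "row_count": row_count,
        "non_empty": row_count > 0,
    }
```

An IICS taskflow or command task could call this check and halt ingestion when any flag comes back false.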
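The Snowflake Python UDF work could look roughly like the following Snowpark registration; the risk_tier function, its tiering rule, and the connection parameters are invented for illustration:

```python
# Hedged sketch of a Snowpark Python UDF registration; the tiering rule
# and connection parameters are invented for illustration.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import udf
from snowflake.snowpark.types import FloatType, StringType

session = Session.builder.configs(
    {"account": "...", "user": "...", "password": "..."}  # placeholders
).create()

@udf(name="risk_tier", input_types=[FloatType()], return_type=StringType(),
     replace=True, session=session)
def risk_tier(balance: float) -> str:
    # Example rule standing in for logic that plain SQL couldn't express.
    if balance is None:
        return "UNKNOWN"
    return "HIGH" if balance > 100_000 else "STANDARD"

# After registration the UDF is callable from SQL:
#   SELECT risk_tier(balance) FROM accounts;
```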
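The reusable validation modules might be organized along these lines; the field names and regex rules are hypothetical examples, not the production rule set:

```python
# Hedged sketch of a reusable validation module; field names and regex
# rules are hypothetical, not the production rule set.
import re

PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "routing_number": re.compile(r"^\d{9}$"),
}

def validate_record(record: dict, required=("account_id",)) -> list:
    """Return human-readable violations for one record."""
    errors = [f"{f} is null/empty" for f in required if not record.get(f)]
    errors += [
        f"{f} fails pattern check"
        for f, rx in PATTERNS.items()
        if record.get(f) and not rx.match(str(record[f]))
    ]
    return errors

# validate_record({"account_id": "", "email": "bad@"})
# -> ["account_id is null/empty", "email fails pattern check"]
```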
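A sketch of the automated duplicate detection check, issued from Python through the Snowflake connector; the curated.transactions table, txn_id key, and connection values are assumptions:

```python
# Hedged sketch: duplicate-key detection in Snowflake from Python.
# Table, key column, and connection values are assumptions.
import snowflake.connector

conn = snowflake.connector.connect(account="...", user="...", password="...")
cur = conn.cursor()
cur.execute("""
    SELECT txn_id, COUNT(*) AS dup_count   -- txn_id: assumed primary key
    FROM curated.transactions              -- hypothetical table
    GROUP BY txn_id
    HAVING COUNT(*) > 1
""")
duplicates = cur.fetchall()
cur.close()
conn.close()
if duplicates:
    raise ValueError(f"DQ failure: {len(duplicates)} duplicated keys")
```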
Data Engineer
Albertsons
02.2023 - 09.2023
Designed and developed scalable database systems based on e-commerce business requirements analysis, implementing Snowflake and PostgreSQL architectures with dimensional modeling for customer behavior analytics, inventory management, and personalization engines.
Developed comprehensive Python tools for automated data processing from AWS S3 buckets, Kafka streams, and various file formats (CSV, JSON, Parquet), enabling real-time ingestion and batch processing of customer interaction data for behavioral analytics.
Built advanced PySpark applications on Databricks and EMR for large-scale data cleaning and transformation, implementing complex business logic for customer segmentation, session analysis, and conversion funnel optimization using Spark DataFrames and custom UDFs (session logic sketched below).
Performed extensive ETL validation using Python and SQL frameworks to conduct quantitative analysis of clickstream data, cart events, and product interactions, ensuring data accuracy between source systems and target analytics platforms.
Created interactive analytical presentations using Python-based reporting tools and SQL dashboards in Snowflake and PostgreSQL, delivering behavioral KPIs, conversion metrics, and A/B testing results to marketing and customer experience teams.
Automated end-to-end data workflows using Apache Airflow DAGs with Python orchestration, implementing SLA monitoring, retry mechanisms, and automated alerting for production reliability in high-throughput e-commerce environments (DAG pattern sketched below).
Developed real-time monitoring solutions using Prometheus and Grafana with Python metrics collection, presenting data lag analysis, job performance statistics, and cluster utilization reports for proactive incident detection and system optimization (metrics sketch below).
Implemented comprehensive data validation frameworks using Python and SQL testing suites for schema validation, data lineage tracking, and quality assurance of machine learning feature stores and personalization algorithms.
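The session analysis mentioned above is commonly implemented with a window function over clickstream timestamps; this sketch assumes user_id and event_ts columns, an example S3 path, and a 30-minute inactivity threshold:

```python
# Hedged sketch of clickstream sessionization; column names, the S3 path,
# and the 30-minute inactivity threshold are assumptions.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sessionize").getOrCreate()
events = spark.read.parquet("s3://example-bucket/clickstream/")

w = Window.partitionBy("user_id").orderBy("event_ts")
sessions = (
    events
    .withColumn("prev_ts", F.lag("event_ts").over(w))
    # Flag a new session when the gap to the previous event exceeds 30 min.
    .withColumn(
        "new_session",
        ((F.col("event_ts").cast("long") - F.col("prev_ts").cast("long")) > 1800)
        .cast("int"),
    )
    .fillna({"new_session": 1})  # first event per user starts a session
    # Running sum of the flags yields a per-user session counter.
    .withColumn("session_id", F.sum("new_session").over(w))
)
```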
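The orchestration bullet could translate into an Airflow DAG roughly like this; the DAG id, schedule, SLA, and alert address are example values:

```python
# Hedged sketch of the Airflow pattern: retries, SLA, and email alerting.
# DAG id, schedule, SLA, and the alert address are example values.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "email": ["data-oncall@example.com"],  # hypothetical on-call address
    "email_on_failure": True,
    "sla": timedelta(hours=1),             # breaches surface as SLA misses
}

def load_clickstream(**context):
    ...  # extract/transform/load step would go here

with DAG(
    dag_id="clickstream_daily",
    start_date=datetime(2023, 3, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    PythonOperator(task_id="load_clickstream", python_callable=load_clickstream)
```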
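A sketch of the Prometheus metrics collection referenced above, using the prometheus_client library; the metric names, port, and lag computation are illustrative:

```python
# Hedged sketch of Python metrics exported for Prometheus/Grafana; metric
# names, port, and the lag computation are illustrative.
import time

from prometheus_client import Counter, Gauge, start_http_server

DATA_LAG = Gauge("pipeline_data_lag_seconds", "Seconds target trails source")
ROWS_LOADED = Counter("pipeline_rows_loaded_total", "Rows loaded to target")

def record_batch(rows: int, source_max_event_ts: float) -> None:
    """Call after each load with the newest source event timestamp."""
    ROWS_LOADED.inc(rows)
    DATA_LAG.set(time.time() - source_max_event_ts)

if __name__ == "__main__":
    start_http_server(8000)  # endpoint Prometheus scrapes; Grafana graphs it
    while True:
        time.sleep(30)       # a real job would call record_batch() per load
```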
Jr Data Engineer
Indian Servers
04.2019 - 04.2021
Designed and developed basic database systems based on retail business requirements analysis, creating PostgreSQL schemas with staging, fact, and dimension tables optimized for daily sales reporting and multi-location consolidation.
Developed Python automation tools for reading and cleaning daily sales data from local file systems and AWS S3 buckets, handling dynamic file naming patterns and various CSV formats from 100+ retail stores across India (sketched below).
Built foundational PySpark pipelines for data transformation and aggregation, implementing basic cleaning logic, type casting, and data standardization to ensure consistent reporting across multiple retail locations.
Performed ETL job validation using SQL queries and Python scripts to conduct quantitative analysis of sales data accuracy, comparing source file counts with target database records and identifying data discrepancies for business reporting (sketched below).
Created analytical reports and visualizations using SQL summary queries and Python reporting scripts, supporting BI teams in preparing daily sales dashboards and performance metrics using Tableau and Power BI.
Automated data pipeline processes using Apache Airflow with Python task orchestration, implementing email notifications and automated alerts for early identification of pipeline failures and data delays.
Developed data monitoring solutions using Python logging frameworks and SQL validation queries, creating detailed audit trails and data lineage documentation for troubleshooting and quality assurance.
Implemented basic data validation procedures using Python and SQL testing scripts for record count verification, data type validation, and business rule compliance to ensure accurate retail analytics and reporting consistency.
Environment: Python, SQL, PySpark, PostgreSQL, AWS S3, Apache Airflow, Linux, Shell Scripting, Tableau, Power BI
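The dynamic file-name handling might be sketched as follows; the bucket layout and the sales_<store>_<YYYYMMDD>.csv pattern are assumptions, and reading s3:// paths with pandas additionally requires the s3fs package:

```python
# Hedged sketch of dynamic file-name handling for daily store files in S3.
# Bucket layout and the sales_<store>_<YYYYMMDD>.csv pattern are assumptions;
# reading s3:// paths with pandas also requires the s3fs package.
import re

import boto3
import pandas as pd

PATTERN = re.compile(r"sales_(?P<store>\w+)_(?P<date>\d{8})\.csv$")

def load_daily_sales(bucket: str, prefix: str, date: str) -> pd.DataFrame:
    s3 = boto3.client("s3")
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    frames = []
    for obj in listing.get("Contents", []):
        m = PATTERN.search(obj["Key"])
        if m and m.group("date") == date:
            df = pd.read_csv(f"s3://{bucket}/{obj['Key']}")
            df["store_id"] = m.group("store")  # tag rows with source store
            frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no sales files found for {date}")
    return pd.concat(frames, ignore_index=True)
```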
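The source-vs-target reconciliation could look like this against PostgreSQL; the DSN, the staging.daily_sales table, and the column names are placeholders:

```python
# Hedged sketch of source-vs-target reconciliation against PostgreSQL;
# the DSN, staging.daily_sales table, and column names are placeholders.
import psycopg2

def validate_load(source_row_count: int, load_date: str) -> bool:
    conn = psycopg2.connect("dbname=retail user=etl")  # hypothetical DSN
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT COUNT(*) FROM staging.daily_sales WHERE sale_date = %s",
            (load_date,),
        )
        target_count = cur.fetchone()[0]
    if target_count != source_row_count:
        print(f"mismatch: source={source_row_count} target={target_count}")
        return False
    return True
```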
Education
Master of Science - Computer Science
University of North Texas
12.2022
Bachelor of Technology - Computer Science
Stanley College of Engineering and Technology for Women
09.2020
Skills
Python
SQL
Shell Scripting
JavaScript
PySpark
Pandas
Apache Kafka
Apache Airflow
Data Transformation
ETL processes
AWS
Azure
Snowflake
PostgreSQL
MySQL
Oracle DB
MongoDB
Cosmos DB
AWS EMR
Data Lake
Data Warehouse
Azure SQL Database
Docker
Kubernetes
Helm
Terraform
Git
GitHub
Jenkins
GitHub Actions
CI/CD Pipelines
Tableau
Power BI
Prometheus
Grafana
Certifications
Database Administration, Microsoft Technology Associate
Python Programming, Microsoft Technology Associate
Salesforce Trailhead, Association for Computing Machinery