John Peterson

Santa Clara, CA

Summary

Accomplished Senior Data Engineer with 13 years of experience designing and optimizing cloud data platforms in healthcare, FinTech, and e-commerce. Specializes in Snowflake data warehousing, ELT architecture, and performance tuning for analytical workloads. Developed ACID-compliant financial pipelines and HIPAA-compliant healthcare data solutions using Snowflake, AWS, and Python, while also leveraging Azure and GCP for hybrid-cloud environments.

Overview

13 years of professional experience

Work History

Senior Data Engineer

Cardinal Health
Remote
02.2021 - Current
  • Architected and enhanced Snowflake-based healthcare and supply chain data platforms supporting clinical, operational, logistics, and reporting workloads, integrating raw data from APIs, flat files, EDI-style interfaces, cloud object storage, and downstream enterprise systems.
  • Designed ELT pipelines using Snowflake, AWS S3, AWS Glue, Python, and SQL to ingest and transform high-volume healthcare datasets into governed analytical models optimized for regulatory reporting, operational analytics, and executive dashboards.
  • Built a medallion-style lakehouse pattern for clinical data domains, organizing raw HL7/FHIR and operational feeds into bronze, silver, and gold layers, enhancing traceability and data quality for analytics teams.
  • Developed standardization and transformation pipelines for healthcare interoperability data, including HL7-to-FHIR ingestion and normalization, enabling downstream consumers to utilize modern, analytics-friendly structures.
  • Implemented PHI de-identification and masking logic using hashing, field suppression, role-aware access patterns, and generalized attribute transformations to support HIPAA-aligned analytics without exposing sensitive patient identity data.
  • Partnered with business and data stakeholders to improve data provenance and audit trails, capturing source-to-target lineage, transformation history, and operational checkpoints so regulated datasets could be traced back to original systems during audit and compliance reviews.
  • Built automated healthcare data standardization flows supporting clinical terminology normalization and reference mapping aligned with ICD-10, SNOMED CT, and LOINC reporting needs, improving consistency of downstream measures and reporting logic.
  • Designed Snowflake models and transformation logic supporting quality measure aggregation and population-level reporting use cases, enabling business teams to evaluate screening, adherence, utilization, and operational health metrics more efficiently.
  • Tuned Snowflake workloads through warehouse right-sizing, clustering strategy improvements, query rewrites, pruning-aware design, and workload separation, improving performance for large-scale joins, curated marts, and dashboard refresh jobs.
  • Established reusable ingestion and transformation patterns for semi-structured data using JSON, VARIANT columns, Snowflake tasks, streams, and metadata-driven orchestration, reducing manual development effort for new data domains.
  • Improved production reliability by implementing pipeline observability, load validation, data reconciliation checks, and failure alerting, increasing trust in regulated datasets for analysts and operations teams.
  • Supported secure enterprise analytics by applying role-based data access, controlled sharing models, and audit-conscious warehouse design, balancing performance with privacy and compliance requirements.

Senior Data Engineer

Fiserv
Remote
03.2016 - 01.2021
  • Designed and scaled Snowflake-centric financial data platforms supporting transaction analytics, reconciliation, risk reporting, audit workflows, and downstream operational data products in a high-security FinTech environment.
  • Engineered ACID-aware ingestion and transformation patterns for financial datasets, ensuring transaction completeness, deterministic processing, rollback-safe logic, and reconciliation-friendly lineage across sensitive money movement workflows.
  • Built robust ELT pipelines with Snowflake, Python, SQL, dbt, Kafka, and Spark to ingest high-volume transaction, settlement, account, and customer activity data from batch and event-driven sources.
  • Developed real-time and near-real-time streaming ingestion patterns for financial event processing using Kafka and Spark Structured Streaming, enabling faster fraud analysis, event enrichment, and operational monitoring of payment activity.
  • Supported real-time fraud detection scoring workflows by preparing low-latency transaction features and trusted event streams for analytics and downstream decisioning systems.
  • Implemented tokenization-based data handling patterns aligned with PCI DSS, replacing or masking sensitive payment card data before it landed in accessible analytical environments.
  • Built ledger reconciliation pipelines comparing internal transaction records with external processor, settlement, and downstream reporting feeds to identify mismatches, timing gaps, and duplicate events.
  • Designed transformation layers to support KYC / customer due diligence and risk-oriented analytics, integrating customer profile, transaction, and reference data into curated datasets used for compliance and operational review.
  • Contributed to AML-focused data engineering patterns, including entity relationship preparation and graph-friendly data outputs that could support suspicious activity analysis and multi-hop transaction investigation.
  • Developed parsing and normalization logic for XML-based financial messages and institution-to-institution data exchange, aligned with ISO 20022-style message processing.
  • Designed Snowflake dimensional and analytical models for transaction volumes, settlement status, dispute analysis, customer behavior, and operational KPIs, improving reporting performance and streamlining transformation logic in BI layers.
  • Tuned Snowflake performance using warehouse isolation, clustering, query profile analysis, caching-aware SQL design, pruning optimization, and staged transformation decomposition, improving throughput and lowering compute waste.
  • Applied dbt-based transformation patterns to improve modularity, testing, documentation, and version control discipline for SQL models used in heavily audited financial reporting pipelines.
  • Partnered with platform and security teams to strengthen RBAC, encryption-aware design, secrets discipline, and controlled data access, improving compliance posture while maintaining usability for engineering and analytics teams.
  • Integrated Snowflake with broader cloud services and adjacent enterprise tooling, including AWS S3, Glue, Lambda, and selective Azure/Fabric interoperability, to support hybrid reporting and partner-facing analytics workflows.

Data Engineer

Chewy
Onsite
05.2013 - 02.2016
  • Built and maintained large-scale analytics and reporting pipelines for e-commerce, integrating clickstream, order, product, customer, fulfillment, and inventory data into analytical models that enabled data-driven decision-making.
  • Developed customer 360-style data models by merging customer profile, browsing, cart, order, and engagement data across multiple digital touchpoints to improve marketing, personalization, and retention analytics.
  • Engineered clickstream sessionization pipelines that transformed raw web and app events into structured user sessions, enabling funnel analytics, navigation-path analysis, and conversion measurement.
  • Built ingestion and transformation flows supporting near-real-time inventory synchronization, improving alignment between warehouse stock availability and customer-facing inventory signals.
  • Integrated product, competitor, and demand-related data into curated datasets supporting dynamic pricing and merchandising analytics, facilitating more informed pricing strategies and operational optimizations.
  • Developed foundational datasets for recommendation and product affinity analytics, incorporating customer-product interaction features and market-basket relationships to enhance personalization and cross-sell strategies.
  • Built pipelines for A/B testing and experimentation analytics, ensuring test/control traffic could be measured cleanly and tied to conversion, basket size, and behavioral metrics.
  • Implemented SQL and modeling improvements to optimize report performance, reduce duplicate business logic, and improve trust in KPI reporting used by business and operations stakeholders.
  • Worked with cloud and warehouse technologies that laid the groundwork for later Snowflake-centric patterns, including structured data modeling, secure handling of customer data, and scalable analytics engineering practices.

Education

Bachelor of Science - Computer Science

Florida Institute of Technology
Melbourne, FL
05.2013

Skills

  • Snowflake architecture
  • Virtual warehouses
  • Multi-cluster warehouses
  • Micro-partitioning
  • Clustering keys
  • Automatic clustering
  • Time Travel
  • Zero-Copy Cloning
  • Secure data sharing
  • Snowpipe
  • Snowpark
  • Tasks
  • Streams
  • Materialized views
  • External tables
  • Semi-structured data
  • VARIANT
  • Query profile analysis
  • Warehouse sizing
  • Workload isolation
  • Resource monitors
  • Result caching
  • Metadata-driven ELT
  • Data retention
  • Fail-safe
  • Access control
  • Masking policies
  • Row access policies
  • ETL
  • ELT
  • Batch pipelines
  • Streaming pipelines
  • CDC
  • Incremental data loads
  • Full refresh
  • Backfills
  • Schema evolution
  • Data contracts
  • Data alignment
  • Orchestration
  • DAG design
  • Failure recovery
  • Idempotent pipelines
  • Data standardization
  • Ingestion framework design
  • Metadata management
  • Lineage
  • Observability
  • SLA monitoring
  • Data quality validation
  • Reference data management
  • Advanced SQL
  • SQL for Snowflake
  • Python
  • PySpark
  • Spark SQL
  • Scala
  • Shell scripting
  • dbt
  • Stored procedures
  • UDFs
  • Query tuning
  • Window functions
  • MERGE/UPSERT patterns
  • SCD Type 1/2
  • Partition-aware transformations
  • Modular pipeline design
  • AWS S3
  • AWS Glue
  • Lambda
  • Kinesis
  • EC2
  • IAM
  • CloudWatch
  • Redshift
  • RDS
  • Aurora
  • Azure Data Factory
  • Azure Synapse
  • Microsoft Fabric
  • Fabric Pipelines
  • Fabric Lakehouse
  • Power BI
  • BigQuery
  • Pub/Sub
  • Kafka
  • Kafka Connect
  • Spark Structured Streaming
  • Event-driven architecture
  • Real-time processing
  • Event ingestion
  • Stream enrichment
  • Data delivery guarantees
  • Event replay
  • Stream observability
  • HIPAA
  • PCI-DSS tokenization
  • RBAC
  • Encryption at rest
  • Encryption in transit
  • Audit trails
  • Lineage for audits
  • Data provenance
  • Secret management
  • Compliance data access
  • Dimensional modeling
  • Star schema
  • Snowflake schema
  • Conformed dimensions
  • Semantic layer support
  • Customer 360
  • Financial ledger modeling
  • Healthcare analytics
  • Quality aggregation
  • KPI modeling
  • Report optimization
  • Git
  • CI/CD
  • Jenkins
  • GitHub Actions
  • Terraform
  • CloudFormation
  • Docker
  • Monitoring
