Experienced Software Engineer with over 8 years of hands-on expertise in Scala, Apache Spark, and Databricks, specializing in large-scale data engineering.
Designed and deployed real-time and batch ETL pipelines across finance, aviation, and cybersecurity domains.
Proficient in Spark Structured Streaming, Delta Lake, and Spark SQL for building fault-tolerant and high-throughput workflows.
Developed and operationalized machine learning pipelines using Spark MLlib, MLflow, and feature engineering strategies.
Integrated GenAI use cases with Spark-parsed inputs and RAG pipelines for financial document summarization and querying.
Strong background in AWS cloud-native architectures, leveraging Lambda, EventBridge, and Kinesis for event-driven design.
Implemented behavioral authentication models at BlackBerry using weight-based scoring and Squeezer algorithms in Spark.
Optimized Spark jobs using advanced partitioning, caching, and tuning techniques, reducing runtimes by up to 60%.
Skilled in developing secure backend APIs using Java, Spring Boot, and GraphQL to support analytical and ML applications.
Recognized guest speaker and mentor at UT Dallas, guiding students on real-world data science applications in enterprise systems.
Overview
9 years of professional experience
Work History
Software Engineer – Data Science & Engineering
PNC Bank
Dallas, TX
12.2024 - Current
Developed scalable ETL and ML feature pipelines using Scala, Spark, and Databricks to support fraud detection and credit modeling systems.
Built Spark-based RAG data preprocessing pipelines to support retrieval-grounded GPT workflows, enriching financial document search capabilities.
Partnered with ML and AI teams to deploy Spark MLlib models for transaction classification, integrating outputs with downstream Delta Lake stores.
Tuned Spark workloads to run 10x faster by optimizing joins, partitioning strategies, and shuffle behavior.
Used MLflow to track experiments and register models, automating integration into real-time scoring APIs.
Developed and orchestrated pipelines with Databricks Jobs and Airflow, ensuring SLA-bound data delivery and visibility.
Created validation layers using Spark UDFs for cleansing and deduplicating multi-source financial data.
Automated GenAI chatbot integration for customer support scenarios by exposing GPT responses generated from Spark-parsed documents.
Deployed prompt-testing frameworks to measure accuracy, drift, and hallucination rates of GPT responses tied to Spark-processed datasets.
Built notebooks for ad hoc analytics and troubleshooting of model drift, using Spark SQL on Delta tables.
Software Engineer – Flight Planning
Southwest Airlines
Dallas, TX
02.2022 - 11.2024
Implemented event-driven systems for flight planning using AWS Lambda, EventBridge, SQS, and SNS to handle real-time flight operation workflows.
Processed and reacted to Deferred Maintenance Incident (DMI) events to adjust flight planning logic based on aircraft airworthiness constraints.
Developed functionality to automatically generate and submit flight plans to the FAA, ensuring alignment with compliance and routing protocols.
Integrated internal services with FlightKeys, enabling optimal route calculation and aircraft performance-based planning.
Built and enhanced backend microservices using Java and Spring Boot, supporting key services in flight planning pipelines.
Created and consumed REST and GraphQL APIs to enable secure and real-time communication between planning modules and operational systems.
Used Kafka and IBM MQ for asynchronous message passing and orchestration between microservices tied to aircraft events and route generation.
Contributed to infrastructure automation using CloudFormation, ensuring consistent provisioning and deployment across environments.
Developed interactive front-end tools using Angular, TypeScript, and RxJS to visualize and edit flight data and FAA submission status.
Enabled auditing and traceability of flight plans by integrating event metadata into storage and logging layers.
Participated in agile ceremonies, backlog grooming, and cross-team planning sessions to iterate on flight dispatch features and system improvements.
Performed functional testing and debugging of end-to-end flight planning scenarios, including edge cases like reroutes due to unresolved DMIs.
Software Engineer – Data Science
JPMorgan Chase & Co.
Plano, TX
02.2019 - 10.2021
Designed and built real-time and batch ETL pipelines using Apache Spark (Scala) to process billions of payment records daily across merchant services.
Engineered feature extraction and transformation pipelines within Databricks for fraud detection and transaction scoring models.
Integrated Delta Lake for unified batch and streaming workflows, enabling auditability, rollback, and ACID transactions for compliance datasets.
Partnered with data scientists to implement Spark MLlib pipelines for user behavior clustering and risk scoring, with experiments tracked in MLflow.
Created Spark-based processing layers for financial documents used in GenAI summarization experiments with OpenAI models.
Reduced pipeline runtime by over 60% through memory tuning, job parallelization, and caching of intermediate transformations.
Developed schema validation and outlier detection modules using Spark UDFs to ensure data quality before model scoring.
Built modular components for ingestion from Kafka, transformation in Spark, and output to Cassandra and S3-based Delta Lakes.
Authored Databricks notebooks and job templates for reusable, scalable ETL workflows with integrated lineage tracking.
Introduced RAG pipeline experiments that combined Spark-parsed summaries with vector stores to enable natural language access to high-volume payment logs.
Data Engineer / Scientist
BlackBerry
Irving, TX
05.2016 - 12.2018
Developed real-time behavioral authentication algorithms using Scala and Apache Spark to detect anomalous user access patterns across mobile devices.
Implemented a Squeezer algorithm-based authentication model leveraging user interaction vectors (typing speed, app usage rhythm, device tilt) to generate challenge scores.
Built weight-based challenge-response models in Spark MLlib to calculate probabilistic identity scores from biometric and device-based signals.
Designed Spark pipelines to process terabytes of sensor and event data for model training, feature selection, and scoring in Databricks.
Tuned model features using correlation filtering, PCA, and Spark UDFs for high-dimensional behavioral vectors.
Integrated anomaly scores with BlackBerry's security engine to trigger adaptive authentication or escalation workflows.
Created an end-to-end Databricks MLflow pipeline to track authentication model versions, accuracy trends, and feature evolution.
Conducted statistical validation and model drift analysis using Spark SQL and Databricks visualizations.
Collaborated with mobile app teams to embed lightweight scoring agents and feedback loops for continuous model updates.
Reduced false positives by 40% over traditional rules-based systems by introducing streaming model inputs and feedback loops.
Education
Bachelor of Science - Computer Science
The University of Texas at Dallas
Richardson, TX
12.2018
Skills
Category: Technologies / Tools / Concepts
Big Data & ETL: Apache Spark, Databricks, Delta Lake, Spark SQL, Apache Kafka, Airflow, Hive, Cassandra
Data Engineering: ETL Pipelines, Data Ingestion, Data Transformation, Feature Engineering, Streaming & Batch Jobs
2018 Fall Dean's List - Erik Jonsson School of Engineering and Computer Science Jul 2018
For the fall 2017 semester, 1,579 undergraduate students made the dean's list at The University of Texas at Dallas. The dean's list is published by the University's Office of Undergraduate Education at the conclusion of each fall and spring semester. It contains the names of students who completed at least 12 credit hours during the semester with a grade-point average among the top 10 percent of all students within their respective schools.
Guest Speaker – Machine Learning Special Topics in Computer Science
Erik Jonsson School of Engineering and Computer Science, UT Dallas
June 2018
Invited by faculty to guest lecture and mentor students on real-world machine learning applications as part of a special topics course.
Shared hands-on experience as a Data Scientist at BlackBerry, covering behavioral authentication models, Spark-based pipelines, and ML deployment workflows.
Assisted the professor in guiding students through industry use cases, technical challenges, and career pathways in data science and AI.
Actively mentored students during Q&A and follow-up sessions, focusing on applied Scala, Spark, and Databricks workflows in cybersecurity contexts.
International Trade Specialist at PNC Bank, Pittsburgh National Corporation Bank