GenAI Application: Built an internal multi-agent Retrieval-Augmented Generation (RAG) LLM application for ERP, supporting chatbot, test case generation, and code generation, using LangChain and LangGraph
Refined pre- and post-processing and prompts based on user query patterns
Full Stack: Developed the frontend with React (JavaScript) and a Slackbot, integrated with a FastAPI backend; deployed the end-to-end solution on Kubernetes with Argo Workflows, BentoML, and Yatai, delivering a production-ready application that generated 18k Q&A entries for an industry knowledge platform
Data Engineer Intern
Amazon
05.2024 - 07.2024
Data Governance Tools: Developed a scalable, user-friendly data quality library in Scala with customizable accuracy, completeness, and consistency rules to track 120+ feature datasets from Andes, S3, and DynamoDB, supporting discovery, tagging, and classification of anomalous data; used Sandbox and JUnit for QA
Software Engineer
PingAn Technology
08.2022 - 08.2023
Data Asset Catalog: Successfully integrated Hive/Spark data sources into a meta store orchestration system within a Postgres database, significantly improving data consistency, governance, and accessibility across the internal platform
Automated Data Modelling Framework: Developed a RESTful API-based automated SQL generator, streamlining the maintenance of ETL pipelines
Transformed over 300 standardized models into Spark format and contributed to the development, testing, and documentation of 170+ extensive Java UDF APIs to enhance pipeline efficiency
Optimized 70+ UDF APIs with multi-threading to increase concurrency, achieving a 55% reduction in run time and a 72% reduction in resource usage across 200k testing nodes
Task Performance Monitoring Framework: Designed an operation workbench for monitoring 3k+ jobs with timeout and status checking features; developed RESTful APIs for task management in Java using SpringBoot, JPA, and Quartz; and built visualizations with Grafana and Prometheus
Distributed System Workload Optimization: Optimized Kafka and Flink by tuning RocksDB cache settings and task parallelism and by implementing a custom RateLimiter to stabilize pipelines
Tuned Spark internals, resolving data skew with hash-based partitioning and optimizing key operators, improving performance and resource usage by 30%
Deployment & DevOps: Managed SDK and HTTP API compatibility for over 150k nodes, developing scripts for product lifecycle management on Linux
Software Engineer Intern
Alibaba Group - Fliggy
05.2021 - 07.2021
Contributed to developing a Multi-Touch Attribution (MTA) analysis framework for advertising using Java, SQL, and Spark, enabling precise measurement of ad effectiveness across channels
Processed 10M+ user interactions by classifying events into ad exposures, clicks, and conversions, linking them via session-based and ID-based association rules in data lakes
Implemented attribution models (last-touch, first-touch, linear) to distribute conversion credit, and optimized aggregation pipelines using Spark, SQL, and Kafka to support real-time ad performance tracking
Education
Master of Science - Computer Science
Northeastern University
Seattle, WA
05.2025
Master of Science - Information Management Systems
New York University
12.2021
Skills
Java, Scala, Python, SQL, JavaScript, Bash
Airflow, Hadoop, Spark, Flink, Kafka
MySQL, Postgres, Redis
Docker, Kubernetes, Git, Linux
Timeline
Software Engineer Intern
Nvidia
09.2024 - 12.2024
Data Engineer Intern
Amazon
05.2024 - 07.2024
Software Engineer
PingAn Technology
08.2022 - 08.2023
Software Engineer Intern
Alibaba Group - Fliggy
05.2021 - 07.2021
Master of Science - Computer Science
Northeastern University
Master of Science - Information Management Systems