Acted as the DRI for more than 30 datasets tracking user behavior, content engagement, and personalization metrics for Apple TV+, Fitness+, and Edu. As DRI, led the development of critical projects, including the MLS launch, Fitness+ Redesign, and Edu Redesign, providing actionable insights for stakeholders.
Designed and developed multiple offline data pipelines, utilizing Spark on a scalable data platform to optimize data flow, processing, and analysis. This led to significant improvements in performance, reliability, and insight generation.
Collaborated on the design and development of core modules of a video engagement streaming pipeline built on Apache Flink, an ongoing project that will improve real-time analytics capabilities and optimize resource usage.
Developed stream ingestion pipelines for Apple Edu, managing schema evolution and data rotation on top of Kafka, ensuring seamless integration with evolving data requirements.
Developed UC/HMS Utils APIs for simplified access to UC/HMS DDL functions, including database and table creation, partition management, and schema configuration.
Acted as the DRI for cross-cluster sync of UC/HMS tables, ensuring data consistency across clusters for multiple teams.
Led the development of an E2E testing tool for Fitness+ pipelines, preparing environments via Kubernetes and Sandbox containers. Automated test data generation via Kafka, with storage in HDFS and Cassandra.
software engineer intern
Salesforce
Bellevue
09.2020 - 12.2020
Developed a Vault plugin that generates different types of secret keys for the cloud infrastructure.
Developed a Slack bot to automatically reply to customer questions.
software engineer intern
Apple
Cupertino
06.2020 - 08.2020
Designed and developed an automated scripting tool to run Spark jobs and parse each job's runtime resource information from the cluster.
Built a machine learning model to estimate and update Spark job memory usage during runtime. This model served as the POC for the RAS service, which helps the team save over 50% in memory.
software engineer
BOSCH
Suzhou
02.2017 - 12.2017
Developed and optimized algorithms in the radar subsystem for Level 2 and Level 3 autonomous driving functions to analyze the vehicle's surrounding environment. Mainly used sensor fusion and redundancy checks in the software logic to improve object detection accuracy and fulfill functional safety requirements.
Implemented data analysis tools in Python to analyze real-world endurance run data (>10k hours of video data per project), improving the efficiency of labeling specific objects in the radar software subsystem.
Education
M.S. - Computer Science
Northeastern University
Seattle, WA
05.2021
M.S. - Automotive Engineering
University of Bath
Bath, United Kingdom
12.2016
B.S. - Automotive Engineering
Wuhan University of Technology
Wuhan, China
06.2015
Skills
Programming languages: Scala, Java, Python, C/C++
Big data technologies: Hadoop, Spark, Flink, Kafka