YI LI

Shanghai, China

Summary

5 years of experience in the data field, with a strong foundation in statistics, machine learning, and data analysis. Proven expertise in designing, developing, and optimizing data warehouse solutions. Skilled in ETL processes, database management, and data modeling. Proficient in Python and SQL. Adept at data visualization with Tableau and Power BI. Seeking an opportunity to apply analytical skills and contribute to data-driven decision-making in a dynamic organization.

Overview

6
years of professional experience

Work History

Business Systems Analyst

PayPal - BEYOND TECHNOLOGY
Shanghai
05.2024 - Current
  • Work on PayPal CIPDS (Compliance Insights Platform and Data Solution), designing and optimizing compliance data platforms that support the data integration, analysis, and visualization needs of global compliance operations.
  • Collaborate with Compliance stakeholders to understand their data needs and objectives.
  • Conduct gap analyses to identify and address inconsistencies or deficiencies in current data systems.
  • Analyze and design data integration processes to ensure seamless data flow between front-end tools and back-end databases.
  • Develop data mapping and transformation rules for data migration and integration projects.
  • Design and document data models, including data schemas, relational databases, and data warehouses, for the Compliance platform.
  • Monitor and improve data quality, ensuring accuracy, consistency, and completeness.
  • Coordinate with cross-functional teams, including data product owners, data engineers, and business teams, to ensure project success.
  • Work with developers to implement and test data systems and solutions.

Data Engineer

Small Business Administration, SBA
Washington, DC
10.2019 - 01.2023
  • Supported SBA's Micro-loan project, Customer Service Hub, and Certify System.
  • Designed and developed scalable data solutions to support enterprise-level business analytics and reporting needs.
  • Built robust ETL pipelines to ingest billions of business and customer records from external source systems into Azure SQL Database.
  • Developed and optimized stored procedures and user-defined functions to clean, transform, and normalize large datasets; conducted exploratory data analysis (EDA) to identify trends and data quality issues.
  • Automated repetitive data processing tasks using Python, significantly reducing manual effort and increasing operational efficiency.
  • Designed an interactive dashboard in Power BI to visualize the distribution of a $300 million COVID-19 relief plan, including calculations for payments to lenders, which reduced labor costs by 50%.
  • Built machine learning models in Python to predict loan default rates. Gathered, cleaned, and engineered features from various sources, including borrower profiles, loan terms, and historical default data.

Education

Bachelor of Science - Data Science

George Washington University
Washington, DC
05.2019

Bachelor of Science - Applied Mathematics and Statistics

Stony Brook University
Stony Brook, NY
05.2017

Skills

  • Programming: Python, SQL, R, Power Query, DAX
  • Platforms: Azure, AWS, Google Cloud
  • Machine Learning: algorithms, parameter tuning, exploratory data analysis
  • Visualization: Power BI, Tableau

Academic Projects

Movie Recommendation Engine Development in Apache Spark

  • Built an ETL pipeline to analyze a movie rating dataset and conducted online analytical processing (OLAP) with Spark SQL.
  • Implemented the Alternating Least Squares (ALS) model to provide personalized movie recommendations and developed user-based approaches to handle cold-start problems.
  • Tuned model hyperparameters with Spark ML cross-validation tools and monitored data processing via the Spark UI on AWS.

Petfinder.my Animal Adoption Speed Prediction

  • Developed algorithms in Python to predict animal adoption speed from labeled data
  • Performed Exploratory Data Analysis and visualization in Python
  • Trained machine learning models including XGBoost, LightGBM, and a back-propagation (BP) neural network; the BP network achieved the best test Kappa of 0.6466
  • Published the project on github.io (https://liyi61.github.io/PetFinder.my-Adoption-Speed-Prediction)

Timeline

Business Systems Analyst

PayPal - BEYOND TECHNOLOGY
05.2024 - Current

Data Engineer

Small Business Administration, SBA
10.2019 - 01.2023

Bachelor of Science - Data Science

George Washington University

Bachelor of Science - Applied Mathematics and Statistics

Stony Brook University