Collaborate closely with business and engineering teams on data architecture projects for global companies in industries including medicine, entertainment, and sports.
Focus on automation and engineering productivity in data processing, data quality, data modeling, data analysis, and metadata management on systems including data lakes, data warehouses, data marts, OLTP, OLAP, and BCG DDP.
Migrate legacy data transformations onto modern, cloud-based platforms, including refactoring and optimizing SQL and Spark scripts.
Built various crawlers that continuously collect data from 20 websites.
Techs: Python, Linux, RabbitMQ, AWS
Education
Nike Inc
Text-to-SQL Service
2024/10 - Current
Nike Inc
Business-oriented Low-code ETL Framework
2022/07 - Current
Nike Inc
Tableau Dev Standards
2023/07 - 2024/05
Personal Trade Bot
2024/06 - 2024/07
Certification
01/21, AWS Solutions Architect - Professional
Enterprise Products
03/22 - present, ETL Pipeline Generator, Nike Inc, Shenzhen, China
Designed and implemented a config-based ETL pipeline generator that reduces ETL maintenance effort and improves model explainability for bot detection domain experts.
Separated business logic from ETL implementation, reducing maintained code by 90%.
Developed a DevOps automation tool that transpiles and deploys the code to Azure Databricks within 3 seconds.
Techs: Python, PySpark, Snowflake, Hive, Airflow, Docker, Event Driven Design
10/20 - 10/20, Data Collaboration Framework, Boston Consulting Group, New York, NY
Independently designed and implemented an abstract, multi-cloud-provider framework for developing complex ETL, data quality testing, and data analysis, which resolved a critical engineering collaboration issue with the client team.
The framework runs on top of a data lake and data warehouse.
Techs: Azure Databricks, ADLS, PySpark, Python, Domain Driven Design (DDD), Event Driven Design, Docker
04/20 - 07/20, EMR-based ETL Framework, Boston Consulting Group, New York, NY
Implemented several features to enhance a Spark-based ETL framework used by the whole DE department.
Resolved the slow Spark-to-Redshift write issue, which enabled the DE department to work with Redshift.
Techs: Presto, Snowflake, MySQL, SQL Server, PostgreSQL, AWS EMR, Redshift, CloudWatch, CloudFormation, DynamoDB, RDS, S3, Python, Terraform, PySpark, Docker
10/19 - 12/19, Data Architecture Migration, Boston Consulting Group, New York, NY
Led the migration of 30+ individual AWS Glue scripts to a modular, organized PySpark code base, which enabled the client to run a major marketing campaign based on customer identity data merged from 3 data systems.
Refactoring the code and completing the migration reduced the client's AWS costs by $26k per year.
Techs: Presto, SQL, AWS EMR, S3, Redshift, Glue, Athena, DynamoDB, RDS, CloudWatch, Python, Docker
2020 - 2020, CRM System Enhancement, Boston Consulting Group, New York, NY
Extended a product-based CRM system to support selling business services.
Worked directly with the non-technical client team to discuss technical requirements.
Techs: SQL Server, JavaScript, Docker
10/17 - 08/18, Framework for developing distributed data pipelines, Myers Media Group LLC, San Diego, CA
Led the implementation of an event-driven, abstract framework for developing distributed data crawlers and ETL pipelines, which runs on a data lake and data warehouse.
Prevented 99% of data duplication.
Techs: Python, Django, PostgreSQL, AWS SQS, Elastic Beanstalk, ECS, Athena, Glue, SNS, JavaScript, Event Driven Design, Docker