Erik Gafni

Machine Learning
San Francisco, CA

Summary

  • I care deeply about AI and making sure it impacts the world in as positive a way as possible.
  • I have 14 years of research engineering experience with Python, and I love solving meaningful problems that have a profound impact on people. I have written foundational production machine learning systems that have helped millions of people, contributed to a company's IPO, and led to patents and publications.
  • In a past life I played chess (44th at nationals in 5th grade).

Overview

14 years of professional experience
7 years of post-secondary education

Work History

Co-founder, Head of AI

Eventum.ai
07.2021 - Current

Recruited and led a team of ~7 machine learning research engineers building various machine learning models and software in LLMs (Nouns, largest DAO and finance), computer vision (DeepCell $70M+), generative vision (founders of MySpace), reinforcement learning and time-series (finance), audio/voice (Sanas.ai $50M+), MLOps, Cloud Infra, etc.

I am very hands-on with code.

Machine Learning Advisor

PlaiDay
11.2022 - Current

Helping the founders of MySpace with generative AI

Co-Founder, Principal Machine Learning Scientist

Ravel Biotechnology
10.2018 - 03.2021

• Applied computer vision, NLP, and audio neural network architectures to different types of high throughput genetic sequencing data

• Wrote R&D and production software infrastructure for MLOps, cloud, and data pre-processing
• Created an early-detection cancer-screening test
• Research results led to $9.5M in funding
• I led a collaboration with the University of Missouri which resulted in a Nature publication.

Machine Learning Research Consultant

Self-Employed
03.2018 - 10.2018

Machine learning algorithm development for Invitae's cell-free DNA non-invasive prenatal screening test, which was deployed to millions of patients.

Senior Computational Biologist - Machine Learning

Freenome
10.2017 - 10.2018

• Machine learning for the early detection of colorectal cancer from multiple analytes in the blood.
• ~30th employee at a startup that went on to raise around $1B in VC funding.

Senior Bioinformatics Engineer

Invitae
06.2013 - 10.2017

• Developed and trained probabilistic, machine learning, and bioinformatics methods and production software to process petabytes of clinical genomic data into predictions used for clinical genetic testing reports. Invitae was one of the first biotech diagnostic companies to deploy a machine learning algorithm into production.
• ~30th employee at a startup that IPOed a few years later ($6.5B+)
• Led development of Invitae’s clinical production variant calling pipeline.
• Author of 2 of Invitae's foundational patents.
• Author of Cosmos, the open source workflow manager that Invitae's pipelines are written in. The library is used by various genomics groups around the world for distributed scientific computing.

Senior Research Associate

Harvard Medical School
09.2010 - 05.2013

• Applied Bioinformatics and Machine Learning methods to NGS genomic data and autism data
• Developed generative probabilistic graphical models (PGMs) for clinical trial simulations
• Worked with one of the first groups (headed by Dr. Tim Yu of Boston Children's Hospital) ever to perform clinical exome sequencing and interpretation to end a patient's diagnostic odyssey.
• Wrote an open source bioinformatics workflow manager still used by academic and commercial labs around the world.

Education

Bachelor of Health Science - Physiology

University of Arizona
01.2003 - 05.2008

Bachelor of Science - Computer Science

University of Arizona
01.2009 - 05.2010

Publications


  • Evaluation of cfDNA as an early detection assay for dense tissue breast cancer. Nature · May 19, 2022. Third Author.
  • Abstract 2105: Cell-free DNA fragments inform epigenomic mechanisms for early detection of breast cancer (Using Deep Learning). Apr 10, 2021. Cancer Research. First Author.
  • Abstract 4349: Predicting gene expression from plasma cell-free DNA using both the fragment length and fragment position (Using Deep Learning). Jul 1, 2019 AACR. Second Author
  • Machine learning enables detection of early-stage colorectal cancer by whole-genome sequencing of plasma cell-free DNA. Jan 1, 2019 BMC Cancer. Middle Author.
  • Early Stage Colorectal Cancer Detection Using Artificial Intelligence and Whole-Genome Sequencing of Cell-Free DNA in a Retrospective Cohort of 1,040 Patients. American Journal of Gastroenterology. Middle author.
  • Quantitative Determination of SMN2 Copy Number using Next Generation Sequencing and Correlation to Disease Severity (S5.002). Neurology · Jan 1, 2018. Middle author.
  • COSMOS: cloud enabled NGS analysis. Jan 28, 2015. BMC. Middle Author
  • COSMOS: Python library for massively parallel workflows. Bioinformatics · Jun 30, 2014. First Author.
  • TRANSCRIPTIONAL SUBCLASSES FROM PRIMARY HUMAN GLIOBLASTOMA MULTIFORME CELL LINES DEMONSTRATE PROGNOSTIC VALUE. Neuro-Oncology · Jan 1, 2012. Middle Author.
  • Biomedical Cloud Computing With Amazon Web Services. PLOS Computational Biology · Aug 25, 2011. Middle Author.


Patents

  • Systems and Processes of Identifying Genetic Variations. US USSN 15/711,760 · Filed Sep 21, 2017. This is Invitae's variant calling pipeline, for which I was first author.
  • Methods, Systems and Processes of Identifying Genetic Variation in Highly Similar Genes. US 20160300014 · Issued Oct 13, 2016. This allowed Invitae to call variants in highly paralogous genes such as PMS2. Second of two authors.

Open Source Projects

  • COSMOS2 - A scientific workflow management system for distributed computing. Still used by Invitae to process millions of clinical samples per year, as well as many other companies and academic groups. https://github.com/Mizzou-CBMI/COSMOS2.
  • Implementations of Reinforcement Learning algorithms. https://github.com/egafni/ReinforcementLearning
