Committed to improving clinical outcomes for cancer patients by developing computational tools and methods for genomic and pathological data.
1. Developed an accelerated somatic mutation calling tool for WES and WGS data.
2. Developed a biomarker, TmS, by integrating DNA-seq and RNA-seq data from the same tumor; it is associated with patient survival in 14 cancer types.
3. Developed a Snakemake-based somatic mutation calling pipeline that speeds up cancer genetic data analysis.
4. Participated in multiple projects covering various cancer types, including young-onset colorectal cancer and metastatic esophageal and gastric cancers, and discovered novel biomarkers characterizing these cancers under different conditions.
Led the algorithm development team for natural language processing. We used deep convolutional networks and recurrent networks (e.g. LSTM) to classify documents, extract abstracts from documents, and identify entities (including company names, person names and addresses).
Genetic variant detection from next-generation sequencing data
Bulk RNA sequencing and single-cell RNA sequencing data analysis
High-performance computing
Statistical modelling & deep learning modelling
Histopathological data analysis
Protein structure prediction & protein-protein interaction prediction
Proficient in Python, PyTorch, TensorFlow and Keras, MATLAB, C/C++, Java, R (including ggplot), MySQL and Linux shell scripting.
1. Ji, S., Zhu, T., Sethia, A. & Wang, W. Accelerated somatic mutation calling for whole-genome and whole-exome sequencing data from heterogenous tumor samples. bioRxiv. 2023 (under review).
2. Ji, S., Zhu, T., Sethia, A., Montierth, M. D. & Wang, W. Abstract 2070: Accelerated somatic mutation calling tool for whole-genome and whole-exome sequencing data from heterogenous tumor samples. Cancer Res. 83, 2070 (2023).
3. Jiang, Y., Yu, K., Montierth, M. D., Ji, S., Shin, S. J., Guo, S., et al. Abstract 4272: Pan-cancer analysis of intra-tumor heterogeneity in 9,116 cancers using a novel regularized likelihood model. Cancer Res. 83, 4272 (2023).
4. Guo, S., Cheng, X., Koval, A., Ji, S., Liang, Q., Li, Y., et al. Abstract 4273: Integration with benchmark data of paired bulk and single-cell RNA sequencing data substantially improves the accuracy of bulk tissue deconvolution. Cancer Res. 83, 4273 (2023).
5. Cao, S.*, Wang, J. R.*, Ji, S.*, Yang, P., Dai, Y., Guo, S., et al. Estimation of tumor cell total mRNA expression in 15 cancer types predicts disease progression. Nat. Biotechnol. 1–10 (2022).
*These authors contributed equally.
6. Ji, S., Montierth, M. D. & Wang, W. MuSE: A Novel Approach to Mutation Calling with Sample-Specific Error Modeling. Variant Calling. 21–27 (Springer, 2022).
7. Jiang, Y., Yu, K., Ji, S., et al. CliP: subclonal architecture reconstruction of cancer cells in DNA sequencing data using a penalized likelihood model. bioRxiv. 2021.
8. Maniakas, A., Henderson, Y.C., Hei, H., Peng, S., Chen, Y., Jiang, Y., Ji, S., et al. Novel Anaplastic Thyroid Cancer PDXs and Cell Lines: Expanding Preclinical Models of Genetic Diversity. The Journal of Clinical Endocrinology & Metabolism. 2021; 106(11): e4652–e4665. doi: 10.1210/clinem/dgab453.
9. Rajasekar, K.V., Ji, S., Coulthard, R.J., et al. Structure of SPH (self-incompatibility protein homologue) proteins: A widespread family of small, highly stable, secreted proteins. Biochemical Journal. 2019; 476(5): 809–826. doi: 10.1042/BCJ20180828.
10. Ji, S., Oruç, T., Mead, L., et al. DeepCDpred: Inter-residue distance and contact prediction for improved prediction of protein structure. PLoS One. 2019;14(1):302. doi:10.1371/journal.pone.0205214
11. Huang, X., Sun, L., Ji, S., et al. Kissing and nanotunneling mediate intermitochondrial communication in the heart. Proc Natl Acad Sci U S A. 2013;110(8):2846-2851.
12. Ji, S., Ye, C., Li, F., et al. Automatic segmentation of white matter hyperintensities by an extended FitzHugh & Nagumo reaction diffusion model. J Magn Reson Imaging. 2013;37(2):343-350.
13. Li, B., Dong, L., Chen, B., Ji, S., et al. Turbo fast three-dimensional carotid artery black-blood MRI by combining three-dimensional MERGE sequence with compressed sensing. Magn Reson Med. 2013;70(5):1347-1352.
1. Frontiers in Oncology (2022 - Present)
2. Frontiers in Genetics (2020 - Present)
3. Frontiers in Surgery (2023 - Present)
4. The 14th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB) (2023)
5. Human Gene (2023 - Present)
4/2015 - 7/2015 Supervised master's student Jane Lin on her thesis for three months. University of Birmingham, UK.
4/2016 - 7/2016 Supervised master's student Simon Lim on algorithm development for protein domain identification for three months. University of Birmingham, UK.
10/2016 - 10/2017 Supervised PhD student Tugce Oruc on algorithm development for deep learning-based amino acid contact and distance prediction. University of Birmingham, UK.
10/2019 Tutorial: How to use cellxgene for single-cell RNA-seq data visualization (https://github.com/chanzuckerberg/cellxgene). MD Anderson Cancer Center, Houston, Texas.
7/2017 Presentation: Deep learning-based amino acid contact prediction. Biosciences Graduate Research School symposium. Birmingham, UK.
10/2016 Tutorial: How to use Rosetta for protein structure simulation. University of Birmingham, UK.
10/2016 Tutorial: How to use the Markov Random Field model for amino acid contact prediction. University of Birmingham, UK.
4/2023 AACR 2023, Orlando, USA. Poster
3/2023 NCI SSACB (Spring School on Algorithmic Cancer Biology), Bethesda, USA.
7/2022 CGSI (Computational Genomics Summer Institute), Los Angeles, USA. Poster
5/2022 RECOMB (Research in Computational Molecular Biology), San Diego, USA. Poster
7/2016 ISCB (International Society for Computational Biology), Dublin, Ireland. Poster
5/2011 ISMRM (International Society for Magnetic Resonance in Medicine), Montreal, Canada. Poster