Emergence of pathway-level composite biomarkers from converging gene set signals of heterogeneous transcriptomic responses

Samir Rachid Zaim, Qike Li, and A. Grant Schissler

Center for Biomedical Informatics & Biostatistics, Dept of Medicine, Graduate Interdisciplinary Program in Statistics,

The University of Arizona, 1657 E. Helen Street, Tucson, AZ, 85721, USA

Email: samirrachidzaim@email.arizona.edu, qikeli@email.arizona.edu, grant.schissler@gmail.com

 

Yves A. Lussier

Center for Biomedical Informatics & Biostatistics, Dept of Medicine, Cancer Center, BIO5 Institute,      

The University of Arizona, 1657 E. Helen Street, Tucson, AZ, 85721, USA

Email: yves@email.arizona.edu

Recent precision medicine initiatives have led to the expectation of improved clinical decision-making anchored in genomic data science. However, over the last decade, only a handful of new single-gene product biomarkers have been translated to clinical practice (FDA approved), despite considerable discovery efforts and the plethora of transcriptomes available in the Gene Expression Omnibus. With this modest outcome of current approaches in mind, we developed a pilot simulation study to demonstrate the untapped benefits of developing disease detection methods for cases where the true signal lies at the pathway level, even if the pathway’s gene expression alterations are heterogeneous across patients. In other words, we relaxed the cross-patient homogeneity assumption from the transcript level (cohort assumptions of deregulated gene expression) to the pathway level (assumptions of deregulated pathway expression). Furthermore, we expanded previous single-subject (SS) methods into cohort analyses to illustrate the benefit of accounting for an individual’s variability in cohort scenarios. We compared SS and cohort-based (CB) techniques under 54 distinct scenarios, each with 1,000 simulations, to demonstrate that a pathway-level signal emerges through the summative effect of its altered gene expression, which is heterogeneous across patients. Studied variables include pathway gene set size, the fraction of expressed genes that are responsive within the gene set, the fraction of responsive genes that are up- vs. down-regulated, and cohort size. We demonstrated that our SS approach was uniquely suited to detect signals in heterogeneous populations in which individuals have varying levels of baseline risk that are simultaneously confounded by patient-specific “genome-by-environment” interactions (G×E). The area under the precision-recall curve of the SS approach far surpassed that of the CB approach (1st quartile, median, 3rd quartile: SS = 0.94, 0.96, 0.99; CB = 0.50, 0.52, 0.65).
We conclude that single-subject pathway detection methods are uniquely suited to consistently detecting pathway dysregulation because they incorporate each patient’s individual variability.
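
The idea of a pathway-level signal emerging from heterogeneous gene-level alterations can be illustrated with a minimal toy sketch. It is written in Python for brevity and is not the simulation in the supplementary R scripts, nor any of the SS methods evaluated here. The function names, the fixed effect size, the noise level, and the threshold-based score are all illustrative assumptions. Each simulated patient alters a different random subset of the gene set, so any single gene is responsive in only a fraction of patients, yet every patient's pathway-level score separates from that of an unaltered pathway.

```python
import random

def simulate_log_fold_changes(n_patients, set_size, frac_responsive, frac_up,
                              effect=2.0, noise_sd=0.1, rng=None):
    """Toy model (assumed, not the paper's): per-patient log fold changes
    for one gene set, with a patient-specific subset of responsive genes."""
    rng = rng or random.Random(0)
    n_resp = round(set_size * frac_responsive)
    logfc, responsive = [], []
    for _ in range(n_patients):
        # Each patient alters a different random subset of the gene set.
        hit = set(rng.sample(range(set_size), n_resp))
        row = []
        for g in range(set_size):
            shift = 0.0
            if g in hit:
                # Responsive genes go up with probability frac_up, else down.
                shift = effect if rng.random() < frac_up else -effect
            row.append(shift + rng.gauss(0.0, noise_sd))
        logfc.append(row)
        responsive.append(hit)
    return logfc, responsive

def pathway_score(row, threshold=1.0):
    """Hypothetical single-subject score: fraction of genes in the set
    whose absolute log fold change exceeds the threshold."""
    return sum(abs(x) > threshold for x in row) / len(row)

rng = random.Random(42)
set_size, n_patients = 40, 20
altered, hits = simulate_log_fold_changes(n_patients, set_size, 0.25, 0.5, rng=rng)
null, _ = simulate_log_fold_changes(n_patients, set_size, 0.0, 0.5, rng=rng)

# Gene-level view: share of patients in which each gene is responsive.
gene_freq = [sum(g in h for h in hits) / n_patients for g in range(set_size)]
mean_gene_freq = sum(gene_freq) / set_size

# Pathway-level view: one score per patient, altered vs. unaltered pathway.
altered_scores = [pathway_score(r) for r in altered]
null_scores = [pathway_score(r) for r in null]

print("mean share of patients per responsive gene:", mean_gene_freq)
print("lowest altered-pathway score:", min(altered_scores))
print("highest null-pathway score:", max(null_scores))
```

Because the score uses absolute log fold changes, it is indifferent to whether a responsive gene is up- or down-regulated. A cohort-level mean of signed fold changes for a single gene would instead be diluted by both the patients in whom that gene is unresponsive and the mix of directions.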

http://www.lussiergroup.org/publications/PathwayMarker/

Keywords: pathway, gene set, biomarkers, single-subject, cohort, precision medicine, kMEn, n-of-1

 

Supplements

README
evaluation.R
simulation.R
SimulateRNASeqV4_SRZ.R