Emergence of pathway-level composite biomarkers from converging gene set signals of heterogeneous transcriptomic responses
Samir Rachid Zaim, Qike Li, and A. Grant Schissler
Center for Biomedical Informatics & Biostatistics, Dept of Medicine, Graduate Interdisciplinary Program in Statistics,
The University of Arizona, 1657 E. Helen Street, Tucson, AZ, 85721, USA
Email: samirrachidzaim@email.arizona.edu, qikeli@email.arizona.edu, grant.schissler@gmail.com
Yves A. Lussier
Center for Biomedical Informatics & Biostatistics, Dept of Medicine, Cancer Center, BIO5 Institute,
The University of Arizona, 1657 E. Helen Street, Tucson, AZ, 85721, USA
Email: yves@email.arizona.edu
Recent precision medicine initiatives have led to the expectation of improved clinical decision-making anchored in genomic data science. However, over the last decade, only a handful of new single-gene product biomarkers have been translated to clinical practice (FDA approved), in spite of considerable discovery efforts and a plethora of transcriptomes available in the Gene Expression Omnibus. With this modest outcome of current approaches in mind, we developed a pilot simulation study to demonstrate the untapped benefits of developing disease detection methods for cases where the true signal lies at the pathway level, even when the pathway’s gene expression alterations are heterogeneous across patients. In other words, we relaxed the cross-patient homogeneity assumption from the transcript level (cohort assumptions of deregulated gene expression) to the pathway level (assumptions of deregulated pathway expression). Furthermore, we extended previous single-subject (SS) methods to cohort analyses to illustrate the benefit of accounting for an individual’s variability in cohort scenarios. We compared SS and cohort-based (CB) techniques under 54 distinct scenarios, each with 1,000 simulations, to demonstrate that a pathway-level signal emerges through the summative effect of its altered gene expression, which is heterogeneous across patients. The variables studied were pathway gene set size, the fraction of expressed genes within the gene set that are responsive, the fraction of responsive expressed genes that are up- vs down-regulated, and cohort size. We demonstrated that our SS approach was uniquely suited to detect signals in heterogeneous populations in which individuals have varying levels of baseline risk that are simultaneously confounded by patient-specific “genome-by-environment” interactions (G×E). The area under the precision-recall curve of the SS approach far surpassed that of the CB approach (1st quartile, median, 3rd quartile: SS = 0.94, 0.96, 0.99; CB = 0.50, 0.52, 0.65). We conclude that single-subject pathway detection methods are uniquely suited for consistently detecting pathway dysregulation because they include a patient’s individual variability.
http://www.lussiergroup.org/publications/PathwayMarker/
Keywords: pathway, gene set, biomarkers, single-subject, cohort, precision medicine, kMEn, n-of-1