Ignatova V., Krasnitz A.
Knowledge-based derivation of markers in cancer
Presenter: Ignatova V.
Motivation and Aim: Development of clinically relevant molecular markers is a central goal of cancer biology. In combination with traditional clinical metrics, such markers may lead to a more meaningful classification of the disease and to a more accurate prognosis, and may help optimize treatment. Unsupervised learning from large sets of mRNA expression profiles, with individual genes as features, has traditionally been used for marker derivation. However, signals from individual genes are considerably contaminated by noise, which degrades the value of the result [1]. Since genes in a pathway function in a coordinated fashion, we expected that using knowledge-based (canonical) pathways as features would strengthen the association with some clinical parameters of cancer compared with individual genes.
Methods and Algorithms: The main results were obtained using the Reactome database (http://www.reactome.org/). The analysis was performed on ovarian cancer, specifically its main variety, ovarian serous cystadenocarcinoma. Gene expression profiles were taken from The Cancer Genome Atlas. We tested the original data and three types of preprocessing: standardization, median polish, and standardization after median polish. We applied two approaches: principal component analysis and gene set enrichment analysis. Data analysis was performed using R [2] and BlueHelix (HPCC).
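The preprocessing variants named above can be illustrated with a minimal sketch. It is written in Python with NumPy for illustration only; the original analysis was carried out in R, and the function names, the synthetic matrix, and the choice to standardize each gene across samples are assumptions, not details taken from the study.

```python
import numpy as np

def standardize(x):
    """Center each gene (row) to mean 0 and scale it to unit standard deviation."""
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, ddof=1, keepdims=True)
    return (x - mu) / sd

def median_polish(x, n_iter=10, tol=1e-8):
    """Tukey's median polish: alternately subtract row and column medians
    and return the residual matrix."""
    resid = np.array(x, dtype=float)
    for _ in range(n_iter):
        row_med = np.median(resid, axis=1, keepdims=True)
        resid -= row_med
        col_med = np.median(resid, axis=0, keepdims=True)
        resid -= col_med
        # Stop once a full sweep no longer changes the residuals.
        if np.abs(row_med).max() < tol and np.abs(col_med).max() < tol:
            break
    return resid

# Hypothetical genes x samples expression matrix (synthetic, for illustration).
rng = np.random.default_rng(0)
expr = rng.normal(size=(40, 15))

standardized = standardize(expr)
polished = median_polish(expr)

# The four inputs compared in the text, under the assumptions stated above.
variants = {
    "original": expr,
    "standardization": standardized,
    "median polish": polished,
    "standardization after median polish": standardize(polished),
}
```

In R, the built-in medpolish and scale functions cover the same two steps.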
Results: First, we examined which of the pathways are manifest in the ovarian cancer data. To this end, we computed the nR parameter of correlation between genes within each pathway. The result was compared with the distribution of nR in 500 randomly chosen gene sets of the same size, and the empirical p-value was found to be below a threshold for 169 of the 430 pathways in the Reactome database. To assess the phenotypic and clinical relevance of the pathways, we examined their association with a number of clinical parameters. We began by choosing the first principal component as the main quantitative characteristic of each pathway. The best result was obtained for “Age at diagnosis”: we computed its correlation with the first principal component of each pathway and found that 60 of the 430 Reactome pathways are correlated with “Age at diagnosis” at the p=0.001 level of significance. Our next approach was gene set enrichment analysis. The most significant p-values were found for “Platinum status”. In this case we identified a small set of pathways showing strong correlation between genes from similar pathways, compared with random sets of genes. Based on the significant p-values, we identified two fundamentally different groups of pathways associated with this parameter of ovarian serous cystadenocarcinoma. The first group consists of translation-associated pathways. The second group consists of pathways involved in lipid metabolism. Each of the two groups contains a common set of highly differentially expressed genes.
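The random-set comparison and the first-principal-component summary described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the abstract does not define nR, so the mean absolute pairwise Pearson correlation stands in for it; the data are synthetic; and all function and variable names are hypothetical. Python with NumPy is used here, whereas the original analysis was done in R.

```python
import numpy as np

def mean_abs_correlation(expr, idx):
    """Mean absolute pairwise Pearson correlation among the selected genes (rows).
    Assumed stand-in for the nR parameter, which the abstract does not define."""
    corr = np.corrcoef(expr[idx])
    upper = corr[np.triu_indices(len(idx), k=1)]
    return np.abs(upper).mean()

def empirical_p_value(expr, pathway_idx, n_random=500, seed=0):
    """Compare a pathway's statistic with random gene sets of the same size."""
    rng = np.random.default_rng(seed)
    observed = mean_abs_correlation(expr, pathway_idx)
    k = len(pathway_idx)
    null = np.array([
        mean_abs_correlation(expr, rng.choice(expr.shape[0], size=k, replace=False))
        for _ in range(n_random)
    ])
    # Add-one correction keeps the estimate away from exactly zero.
    return (1 + np.sum(null >= observed)) / (1 + n_random)

def first_pc_scores(expr, pathway_idx):
    """Per-sample scores on the first principal component of a pathway's genes."""
    sub = expr[pathway_idx]
    centered = sub - sub.mean(axis=1, keepdims=True)
    # SVD of the genes x samples matrix; s[0] * vt[0] gives the sample scores.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return s[0] * vt[0]

# Synthetic example: 200 genes, 60 samples; the first 10 genes share a latent
# factor that also drives a hypothetical clinical variable ("age").
rng = np.random.default_rng(1)
n_genes, n_samples = 200, 60
expr = rng.normal(size=(n_genes, n_samples))
latent = rng.normal(size=n_samples)
pathway = np.arange(10)
expr[pathway] = latent + 0.5 * rng.normal(size=(10, n_samples))
age = 60 + 10 * latent + 5 * rng.normal(size=n_samples)

p_value = empirical_p_value(expr, pathway)
scores = first_pc_scores(expr, pathway)
r = np.corrcoef(scores, age)[0, 1]  # the sign of a principal component is arbitrary
```

With 500 random sets the smallest attainable empirical p-value under this correction is 1/501, so the number of random draws bounds the resolution of the test.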
Conclusion: This approach successfully identifies a small set of canonical pathways correlated with clinical parameters and can be used to derive molecular subtypes of the disease.
References:
1. R. Verhaak et al. (2011) Integrated genomic analyses of ovarian carcinoma. Nature, 474(7353):609-615.
2. R Development Core Team (2011). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.
The authors acknowledge Saint-Petersburg State University for a research grant and the Federal Grant-in-Aid Program «Human Capital for Science and Education in Innovative Russia» (Governmental Contract No. P1067).