Summary
Here, we developed a web server called the Tumor online Prognostic analyses Platform (ToPP), which collects multi-omics data (mutation, CNA, gene fusion, mRNA, miRNA, lncRNA, protein and methylation) and clinical data from 56 types of tumor datasets from The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC) projects.
ToPP provides a user-friendly web interface that lowers the barrier to entry, as well as customizable high-resolution charts. In addition to conventional univariate and multivariate analysis, it provides subgroup survival analysis, prognostic modeling and pan-cancer survival analysis, and allows users to upload their own data for prognostic analysis.
ToPP aims to provide more convenient and reliable prognostic analysis services for tumor researchers, with no login required.
Browser compatibility
OS | Version | Chrome | Firefox | Microsoft Edge | Safari |
Linux | CentOS 7 | 87.0.4280.88 | 60.3 | n/a | n/a |
MacOS | Mojave 10.14.6 | 87.0.4280.67 | 72.0.2 | n/a | 14.0.1 |
Windows | 10 | 86.0.4240.198 | 82.0.3 | 44.18362.449.0 | n/a |
Contact us
If you have any questions about ToPP, please contact us: ouyangjian12@163.com
Update news:
- Version 1.2 (2020-9-30): Collected multi-omics data and clinical data of 41 tumors in ICGC.
- Version 1.1 (2020-5-1): Collected the MS proteomics data for Breast invasive carcinoma (BRCA), Colon adenocarcinoma (COAD), Ovarian serous cystadenocarcinoma (OV) and Rectum adenocarcinoma (READ) from CPTAC.
- Version 1.0 (2019-12-31): Constructed the basic analysis tools of the web server and added the multi-omics data and clinical data of 33 tumors in TCGA.
Univariate analysis
Univariate analysis is performed with the log-rank test. Kaplan-Meier (KM) survival curves [1] are drawn for the resulting groups, and the hazard ratio with its 95% confidence interval is included in the survival plot.
Because researchers may also be interested in differential expression between tumor tissue and paracancerous tissue, ToPP also provides a boxplot for the two groups and performs a Wilcoxon test to check whether the difference between them is significant.
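The two statistics named above can be illustrated with a minimal sketch in plain Python. This is not ToPP's implementation; the function names `kaplan_meier` and `logrank_p` are hypothetical, inputs are assumed to be Python lists of times and 0/1 event flags, and the p value uses the chi-square approximation with one degree of freedom.

```python
import math

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value)."""
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value
```

Calling `logrank_p` on the high and low groups of a gene gives the p value that would be printed on a survival plot of this kind, while `kaplan_meier` gives the step heights of each curve.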
Parameters
Data selection
- Select Dataset: Select the dataset. There are 33 datasets in total; see the Supplemental table for more information.
- Gene: Input a gene symbol, gene ID or ensembl_gene_id; fuzzy query is supported.
- Data type: Select one type of omics data. Each dataset selected above may contain 8 types of omics data (Gene expression, lncRNA expression, miRNA expression, Somatic mutation, Protein expression, Methylation, Copy number variation, Gene fusion).
- Survival type: Five survival types are available: OS (overall survival), PFI (progression-free interval), DSS (disease-specific survival), DFI (disease-free interval) and RFS (relapse-free survival).
- Cutoff: For continuous variables such as gene or protein expression, samples are divided into two groups by the median, a quantile or the ‘best-cut’ value. To find the best cut, the log-rank p value is calculated with each sample's value as a candidate cutoff, keeping at least 10% of samples in each group; the cutoff with the lowest p value is the ‘best-cut’. Categorical variables such as mutation, CNA or gene fusion are divided into groups by their categories.
- Conditions: Subgroup screening based on omics data, such as Somatic mutation, Methylation and Copy number variation.
- cliConditions: Subgroup screening based on clinical data, such as gender, race and tumor stage, or disease-specific clinical features such as HER2+ in breast cancer.
- Differential expression: Whether to perform differential expression analysis between cancer tissue and adjacent (paracancerous) tissue.
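The ‘best-cut’ search described under Cutoff can be sketched as follows. This is an illustration under stated assumptions, not ToPP's code: the names `logrank_p` and `best_cut` are hypothetical, samples with values above the candidate cutoff form the "high" group, and the p value comes from the one-degree-of-freedom chi-square approximation.

```python
import math

def logrank_p(high, low):
    """Two-group log-rank p value; each group is a list of (time, event)."""
    pooled = high + low
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({t for t, e in pooled if e == 1}):
        n_high = sum(1 for x, _ in high if x >= t)
        n = sum(1 for x, _ in pooled if x >= t)
        d_high = sum(1 for x, e in high if x == t and e == 1)
        d = sum(1 for x, e in pooled if x == t and e == 1)
        o_minus_e += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    return math.erfc(math.sqrt(o_minus_e ** 2 / variance / 2.0))

def best_cut(values, times, events, min_fraction=0.10):
    """Try every observed value as a cutoff; return (cutoff, p_value)."""
    min_size = max(1, math.ceil(min_fraction * len(values)))
    best = (None, 1.0)
    for cut in sorted(set(values)):
        high = [(t, e) for v, t, e in zip(values, times, events) if v > cut]
        low = [(t, e) for v, t, e in zip(values, times, events) if v <= cut]
        # Keep at least min_fraction of the samples in each group.
        if len(high) < min_size or len(low) < min_size:
            continue
        p = logrank_p(high, low)
        if p < best[1]:
            best = (cut, p)
    return best
```

Because the cutoff is chosen to minimize the p value, the reported p value is optimistic; the median or a quantile cutoff avoids that selection effect.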
Draw designs
- Risk.table: Draw the risk table under the K-M curve.
- Unit Time: Choose the unit of time for survival: years, months or days.
- Median.line: Mark the median survival time on the survival curve.
- 95% Conf.int: Show the 95% confidence interval of the K-M curve.
- Hazards Ratio: Show the hazard ratio between the two groups.
- Group color: Select a different color for each of the two groups.
Result
Typical univariate analysis. For subgroup analysis, please use the Conditions or cliConditions selector.
KM curves: include the p value of the log-rank test (p<0.05 is considered a significant difference between the two groups), the hazard ratio, the risk table, a description of the results, etc.
Boxplot: The Wilcoxon test is used to detect whether expression differs significantly between the two groups: *** for p<0.001, ** for p<0.01, * for p<0.05, and NA for no significant difference between the two groups. All resulting images can be downloaded in .pdf, .png and .tiff formats at 300 DPI.
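The star annotation on the boxplot can be sketched as below. The thresholds come from the text above; the rank-sum p value uses a normal approximation without continuity or tie-variance correction, which is an assumption made for brevity and may differ from the exact test a statistics package would run on small samples. The names `rank_sum_p` and `stars` are hypothetical.

```python
import math

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p value (normal approximation)."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        average_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))

def stars(p):
    """Map a p value to the annotation used on the boxplot."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "NA"
```

For example, `stars(rank_sum_p(tumor_values, normal_values))` would give the label for one gene, where `tumor_values` and `normal_values` are placeholder lists of expression values.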
Multivariate analyses
Multivariate analysis is performed with Cox regression (the Cox proportional hazards model) [2]. In multivariate analysis, ToPP reports the HR and log-rank test p value of each gene in the multivariate model, and also calculates the HR and log-rank test p value of each gene as a separate prognostic factor, so that users can determine whether a gene is an independent prognostic factor.
Parameters
- Genes: Input a list of genes, using gene symbols only.
- Select Dataset: Select the dataset. There are 33 datasets in total; see the attached table for more information.
- Data type: Select one type of omics data. Each dataset selected above may contain 8 types of omics data (Gene expression, lncRNA expression, miRNA expression, Somatic mutation, Protein expression, Methylation, Copy number variation, Gene fusion).
- Survival type: Five survival types are available: OS (overall survival), PFI (progression-free interval), DSS (disease-specific survival), DFI (disease-free interval) and RFS (relapse-free survival).
- Cutoff: For continuous variables such as gene or protein expression, samples are divided into two groups by the median, a quantile or the ‘best-cut’ value. To find the best cut, the log-rank p value is calculated with each sample's value as a candidate cutoff, keeping at least 10% of samples in each group; the cutoff with the lowest p value is the ‘best-cut’. Categorical variables such as mutation, CNA or gene fusion are divided into groups by their categories.
- Conditions: Subgroup screening based on omics data, such as Somatic mutation, Methylation and Copy number variation.
- cliConditions: Subgroup screening based on clinical data, such as gender, race and tumor stage, or disease-specific clinical features such as HER2+ in breast cancer.
- Differential expression: Whether to perform differential expression analysis between cancer tissue and adjacent (paracancerous) tissue.
- Draw designs: See the description of these parameters in the Univariate analysis section.
Result
Typical multivariate analyses. For subgroup analysis, please use the Conditions or cliConditions selector.
KM curves: include the p value of the log-rank test (p<0.05 is considered a significant difference between the two groups), the hazard ratio, the risk table, a description of the results, etc., and also show the formula used to calculate the risk score. All resulting images can be downloaded in .pdf, .png and .tiff formats at 300 DPI.
Result table: contains the HR of each gene in univariate and multivariate analyses, the p values of the log-rank test, and the coefficients in multivariate analysis.
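The risk score mentioned above is not spelled out on this page. A common convention, assumed here, is the linear predictor of the Cox model: the sum of each gene's coefficient times its value, with patients then split at the median score. The sketch below illustrates that convention only; the names `risk_scores` and `split_by_median` are hypothetical.

```python
from statistics import median

def risk_scores(expression, coefficients):
    """expression: {sample: {gene: value}}; coefficients: {gene: Cox coefficient}.

    Returns {sample: sum of coefficient * value over the model's genes}.
    """
    return {
        sample: sum(coefficients[gene] * values[gene] for gene in coefficients)
        for sample, values in expression.items()
    }

def split_by_median(scores):
    """Label each sample 'high' (score above the median) or 'low'."""
    cut = median(scores.values())
    return {s: ("high" if v > cut else "low") for s, v in scores.items()}
```

With coefficients taken from the result table, the two functions together reproduce the kind of high-risk versus low-risk grouping on which a KM curve for a multi-gene model is usually drawn.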
Prognostic model
When constructing a prognostic model, we first fit a naive Cox model that includes all covariates. With real-world data, redundant or irrelevant variables should be removed from the model; this reduces variance at the expense of some bias and makes the model more robust. Interactions between covariates and time-dependent covariates should also be considered in the modeling process [3]. For variable selection we use stepwise selection, which is a mix of forward and backward selection: starting from either an empty model or a full model, predictors are added or removed according to some criterion. We use the Akaike information criterion (AIC), defined as AIC = 2k − 2max(log-likelihood), where k is the number of parameters in the model. We then run model diagnostics, which mainly cover three aspects: 1) testing the proportional hazards (PH) assumption, 2) examining influential observations (or outliers), and 3) detecting nonlinearity in the relationship between the log hazard and the covariates. Residual methods are used to check these model assumptions. The common residuals for the Cox model include: 1) Schoenfeld residuals, to check the proportional hazards assumption; 2) martingale residuals, to assess nonlinearity; and 3) deviance residuals (a symmetric transformation of the martingale residuals), to examine influential observations [4]. Covariates can be transformed, for example with nonlinear transformations, if necessary. Finally, we use the concordance index (C-index) [5] to evaluate the performance of the model.
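The AIC-driven stepwise search can be sketched generically. In the sketch below, `log_likelihood` is a hypothetical callback that returns the maximized log-likelihood of a model fitted on a given set of variables (in practice it would wrap a Cox fit). The starting points (empty model for "forward", full model for "backward" and "both") and the reading of "none" as "keep every candidate" are assumptions, not confirmed ToPP behavior.

```python
def aic(n_params, max_log_likelihood):
    """AIC = 2k - 2 * max(log-likelihood)."""
    return 2 * n_params - 2 * max_log_likelihood

def stepwise_select(candidates, log_likelihood, direction="both"):
    """Greedy AIC search over subsets of candidate variables."""
    candidates = set(candidates)
    if direction == "none":
        return candidates  # assumption: no selection is performed
    current = set() if direction == "forward" else set(candidates)
    current_aic = aic(len(current), log_likelihood(frozenset(current)))
    while True:
        moves = []
        if direction in ("forward", "both"):
            moves += [current | {v} for v in candidates - current]
        if direction in ("backward", "both"):
            moves += [current - {v} for v in current]
        scored = [
            (aic(len(m), log_likelihood(frozenset(m))), sorted(m))
            for m in moves
        ]
        if not scored:
            return current
        best_aic, best_vars = min(scored)
        # Stop when no single addition or removal lowers the AIC.
        if best_aic >= current_aic:
            return current
        current, current_aic = set(best_vars), best_aic
```

Each extra variable costs 2 AIC units, so a variable is only kept if it raises the log-likelihood by more than 1.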
Parameters
- Basic parameters: Same as described in Multivariate Analyses.
- Direction: The mode of stepwise search; one of "none", "both", "backward" or "forward", with a default of "none".
- coxdiagnostics type: The type of residuals shown on the Y axis of the diagnostic plot, as in residuals.coxph: a character string indicating the type of residual desired. Possible values are "martingale", "deviance", "score", "schoenfeld", "dfbeta", "dfbetas" and "scaledsch".
- Validation Dataset: Select a validation set for model testing.
- formula: The variables contained in the model and their corresponding conversion formulas.
- variable transformation: Non-linear transformation of variables if necessary.
Result
The three-year and five-year calibration curves for the prognostic model; the closer a curve is to the diagonal, the more accurate the model.
Nomogram for the prognostic model, including the weight of each feature and the three-year and five-year survival rates.
PH assumption test (right), and diagnostic graphs showing the goodness of fit of the Cox proportional hazards model (left).
Graphs of each continuous explanatory variable against the martingale residuals of the null Cox proportional hazards model, for each term on the right side of the formula. These may help in choosing the proper functional form of a continuous variable in the Cox model (coxph). The lines fitted with the lowess function should be linear to satisfy the Cox proportional hazards model assumptions.
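The concordance index used to evaluate the prognostic model can be illustrated with a small sketch of Harrell's C. This is a plain pairwise implementation for illustration, not the routine ToPP calls; the name `concordance_index` is hypothetical, and ties in risk are counted as half-concordant.

```python
def concordance_index(times, events, risk):
    """Harrell's C: share of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when sample i had the event and its
    time is strictly shorter than the time of sample j.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored sample cannot be the earlier member
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

A value of 0.5 corresponds to random ordering and 1.0 to a model whose risk scores rank every comparable pair correctly.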
Pan-cancer
Pan-cancer analysis is designed to investigate the prognostic effects of factors across a variety of tumors. All data come from the Pan-Cancer Atlas [6], with unified normalization and standardization. Users can select all cancer types or choose a pan-cancer subset as defined by the Pan-Cancer Atlas project, such as urologic (bladder urothelial carcinoma [BLCA], prostate adenocarcinoma [PRAD], testicular germ cell tumors [TGCT], kidney renal clear cell carcinoma [KIRC], kidney chromophobe [KICH], and kidney renal papillary cell carcinoma [KIRP]). Users can then filter subsets by the conditions described in the univariate module. In addition, ToPP provides somatic alterations (copy-number alterations, mutations, fusions or epigenetic silencing) in ten canonical pathways (cell cycle, Hippo, Myc, Notch, Nrf2, PI-3-Kinase/Akt, RTK-RAS, TGFb signaling, p53 and b-catenin/Wnt) [7] as pathway screening conditions.
Parameters
- Select Dataset: Select the dataset. There are 33 datasets in total; see the attached table for more information. Users can select multiple datasets.
- Gene: Input a gene symbol, gene ID or ensembl_gene_id; fuzzy query is supported.
- Data type: Select one type of omics data. Each dataset selected above may contain 8 types of omics data (Gene expression, lncRNA expression, miRNA expression, Somatic mutation, Protein expression, Methylation, Copy number variation, Gene fusion).
- Survival type: Five survival types are available: OS (overall survival), PFI (progression-free interval), DSS (disease-specific survival), DFI (disease-free interval) and RFS (relapse-free survival).
- Cutoff: For continuous variables such as gene or protein expression, samples are divided into two groups by the median, a quantile or the ‘best-cut’ value. To find the best cut, the log-rank p value is calculated with each sample's value as a candidate cutoff, keeping at least 10% of samples in each group; the cutoff with the lowest p value is the ‘best-cut’. Categorical variables such as mutation, CNA or gene fusion are divided into groups by their categories.
- geneConditions: Subgroup screening based on omics data, such as Somatic mutation, Methylation, Copy number variation and the 10 canonical signaling pathways.
- cliConditions: Subgroup screening based on clinical data, such as gender, race and tumor stage, or disease-specific clinical features such as HER2+ in breast cancer.
- Draw designs: See the description of these parameters in the Univariate analysis section.
Result
Typical pan-cancer analysis. For subgroup analysis, please use the Conditions or cliConditions selector.
K-M curves: include the p value of the log-rank test (p<0.05 is considered a significant difference between the two groups), the hazard ratio, the risk table, a description of the results, etc.
Forest plot: Meta-analysis of multiple tumors and pan-cancer, including the log-rank p value and HR (95% CI).
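This page does not state which pooling method sits behind the forest plot. As one possible reading, the sketch below uses a fixed-effect, inverse-variance meta-analysis on the log hazard ratio, recovering each standard error from the reported 95% confidence interval. The method choice and the name `pool_hazard_ratios` are assumptions for illustration only.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def pool_hazard_ratios(studies):
    """studies: list of (hr, ci_lower, ci_upper), one entry per tumor type.

    Returns the pooled (hr, ci_lower, ci_upper) under a fixed-effect model.
    """
    weight_sum = 0.0
    weighted_log_hr = 0.0
    for hr, lower, upper in studies:
        # The CI spans 2 * Z95 standard errors on the log scale.
        se = (math.log(upper) - math.log(lower)) / (2 * Z95)
        weight = 1.0 / se ** 2
        weight_sum += weight
        weighted_log_hr += weight * math.log(hr)
    pooled = weighted_log_hr / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    return (
        math.exp(pooled),
        math.exp(pooled - Z95 * pooled_se),
        math.exp(pooled + Z95 * pooled_se),
    )
```

A random-effects model would widen the pooled interval when hazard ratios differ strongly between tumor types; the fixed-effect form is shown only because it is the shortest to write down.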
Combination Analysis
Combination analysis examines the combined effect of two factors on prognosis. The two factors can be the same level of data, such as the expression of two genes, or different levels of data, such as the expression of one gene and the mutation status of another. All patients are divided into 4 groups according to the thresholds of the two factors (for example: high expression + mutation, high expression + wild type, low expression + mutation, and low expression + wild type), and the log-rank test is performed to test whether survival differs significantly between groups. In this way, researchers can assess the synthetic lethality effects of two genes.
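The four-group split can be sketched as follows for the expression plus mutation example. The median cutoff, the input layout and the name `combine_groups` are assumptions for illustration.

```python
from statistics import median

def combine_groups(expression, mutated):
    """expression: {sample: value}; mutated: {sample: bool}.

    Returns {group label: [samples]}, splitting expression at the median.
    """
    cut = median(expression.values())
    groups = {}
    for sample, value in expression.items():
        level = "high expression" if value > cut else "low expression"
        status = "mutation" if mutated[sample] else "wild type"
        groups.setdefault(f"{level} + {status}", []).append(sample)
    return groups
```

Each of the resulting groups can then be drawn as its own K-M curve and compared with the log-rank test.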
Parameters
Data selection
- Dataset: Select the dataset. There are 33 datasets in total; see the attached table for more information.
- Survival type: Five survival types are available: OS (overall survival), PFI (progression-free interval), DSS (disease-specific survival), DFI (disease-free interval) and RFS (relapse-free survival).
- Gene1: Input a gene symbol, gene ID or ensembl_gene_id; fuzzy query is supported.
- Data type1: Select one type of omics data. Each dataset selected above may contain 8 types of omics data (Gene expression, lncRNA expression, miRNA expression, Somatic mutation, Protein expression, Methylation, Copy number variation, Gene fusion).
- Gene2: Input another gene symbol, gene ID or ensembl_gene_id; fuzzy query is supported.
- Data type2: Select another type of omics data. Each dataset selected above may contain 8 types of omics data (Gene expression, lncRNA expression, miRNA expression, Somatic mutation, Protein expression, Methylation, Copy number variation, Gene fusion).
- Cutoff: For continuous variables such as gene or protein expression, samples are divided into two groups by the median, a quantile or the ‘best-cut’ value. To find the best cut, the log-rank p value is calculated with each sample's value as a candidate cutoff, keeping at least 10% of samples in each group; the cutoff with the lowest p value is the ‘best-cut’. Categorical variables such as mutation, CNA or gene fusion are divided into groups by their categories.
- Draw designs: See the description of these parameters in the Univariate analysis section.
Result
Combination analysis for multiple genes with multiple omics.
K-M curves: include the p value of the log-rank test (p<0.05 is considered a significant difference between the two groups), the hazard ratio, the risk table, a description of the results, etc.
Upload Your Data
The data upload module allows users to upload their own data, with survival time and status, to run all kinds of analysis in ToPP. Users should also set the permission for their data and provide an email address for contact if necessary.
Parameters
- Upload seq data:
Data format requirements (example data can be downloaded here):
Only TXT files with tab-separated (\t) columns are accepted;
Column names should be the survival state, survival time, gene symbol or clinical feature; row names should be sample IDs;
The column name of a survival state should be one of "OS, PFI, DSS, DFI, RFS"; the corresponding survival time column should be named one of 'OS.time, PFI.time, DSS.time, DFI.time, RFS.time';
The survival state should be 1 or 0. 1: the event occurred, such as death, recurrence or metastasis; 0: the event did not occur, such as alive or no metastasis;
The survival time should be in days;
Column names should include "sample" and "group". In the group column, tumor samples must carry the same name as the file; for example, if your file is LIHC.txt, then the tumor patients in the group column must be 'LIHC'.
sample | group | cancer.type | gender | OS | OS.time | DSS | DSS.time | TP53 | EGFR | ERBB2 | EZH2 |
TCGA.2V.A95S.01 | TCGA-LIHC | LIHC | MALE | 0 | NA | 0 | NA | 0 | 1.2 | 9.08 | 6.2 |
TCGA.2Y.A9GS.01 | TCGA-LIHC | LIHC | MALE | 1 | 724 | 1 | 724 | 0 | 2.57 | 8.03 | 2.5 |
TCGA.2Y.A9GT.01 | TCGA-LIHC | LIHC | MALE | 1 | 1624 | 1 | 1624 | 0 | 0 | 8.57 | 1.53 |
TCGA.2Y.A9GU.01 | TCGA-LIHC | LIHC | FEMALE | 0 | 1939 | 0 | 1939 | 2.77 | 1.42 | 8.82 | 0.56 |
TCGA.2Y.A9GV.01 | TCGA-LIHC | LIHC | FEMALE | 1 | 2532 | 1 | 2532 | 5.99 | 0.88 | 7.6 | 9.31 |
TCGA.2Y.A9GW.01 | TCGA-LIHC | LIHC | MALE | 1 | 1271 | 1 | 1271 | 0 | 0.82 | 8.33 | 8.95 |
TCGA.2Y.A9GY.01 | TCGA-LIHC | LIHC | FEMALE | 1 | 757 | 1 | 757 | 0 | 0 | 8.4 | 8.59 |
TCGA.2Y.A9GZ.01 | TCGA-LIHC | LIHC | FEMALE | 1 | 848 | 1 | 848 | 7.41 | 6.1 | 8.74 | 9.22 |
TCGA.2Y.A9H0.11 | normal | LIHC | MALE | 0 | 3675 | 0 | 3675 | 0 | 0 | 7.65 | 8.86 |
- Select data type: Select one type of omics data. Each dataset selected above may contain 8 types of omics data (Gene expression, lncRNA expression, miRNA expression, Somatic mutation, Protein expression, Methylation, Copy number variation, Gene fusion).
- Cancer type: Select the cancer type for your study.
- Permission: Private means that the data can only be accessed from the IP address of the computer that submitted it. Public means that everyone can access the data. If someone uses data that you have made public, we hope that they will cite your reference.
- Reference: Input the reference documents corresponding to the dataset.
- Study design: Give a brief description of the study, especially information about sample collection and study design.
- Email: Input an email address at which we can contact you.
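Before uploading, a file can be checked against the format requirements listed above with a short script. The sketch below is not part of ToPP; the function names are hypothetical, and accepting NA as a survival time is an assumption based on the example rows.

```python
import csv
import io

SURVIVAL_STATES = ("OS", "PFI", "DSS", "DFI", "RFS")

def is_valid_time(value):
    """Accept NA (assumption) or a non-negative number of days."""
    if value == "NA":
        return True
    try:
        return float(value) >= 0
    except ValueError:
        return False

def check_upload(text):
    """Return a list of problems found in tab-separated upload text."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if not rows:
        return ["file is empty"]
    header = rows[0]
    problems = []
    for required in ("sample", "group"):
        if required not in header:
            problems.append(f"missing column: {required}")
    states = [s for s in SURVIVAL_STATES if s in header]
    if not states:
        problems.append("no survival state column (OS, PFI, DSS, DFI or RFS)")
    # Only states with a matching time column are checked row by row.
    paired = []
    for state in states:
        if state + ".time" in header:
            paired.append(state)
        else:
            problems.append(f"missing column: {state}.time")
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
            continue
        record = dict(zip(header, row))
        for state in paired:
            if record[state] not in ("0", "1"):
                problems.append(f"line {line_no}: {state} must be 0 or 1")
            if not is_valid_time(record[state + ".time"]):
                problems.append(
                    f"line {line_no}: {state}.time must be NA or "
                    "a non-negative number of days"
                )
    return problems
```

An empty list means no problem was found by these checks, for example `check_upload(open("LIHC.txt").read())` for a file named as in the requirement above.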
Result
Supplemental table
Supplemental table of all cancer types and data types.
ID | Project | Chinese name | Mutation (Genome) | CNA (Genome) | Fusion (Genome) | mRNA (Transcriptome) | miRNA (Transcriptome) | LncRNA (Transcriptome) | RPPA (Proteome) | MS (Proteome) | Methylation (Epigenome) | Phenotype (Clinical data) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
TCGA-ACC | Adrenocortical carcinoma | 肾上腺皮质癌 | √ | √ | √ | √ | √ | NA | √ | NA | √ | √ |
TCGA-BLCA | Bladder Urothelial Carcinoma | 膀胱尿路上皮癌 | √ | √ | √ | √ | √ | √ | √ | NA | √ | √ |
TCGA-BRCA | Breast invasive carcinoma | 乳腺浸润癌 | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ |
TCGA-CESC | Cervical squamous cell carcinoma and endocervical adenocarcinoma | 宫颈鳞癌和腺癌 | √ | √ | √ | √ | √ | √ | √ | NA | √ | √ |
TCGA-CHOL | Cholangiocarcinoma | 胆管癌 | √ | √ | √ | √ | √ | NA | √ | NA | √ | √ |
TCGA-COAD | Colon adenocarcinoma | 结肠癌 | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ |
TCGA-DLBC | Lymphoid Neoplasm Diffuse Large B-cell Lymphoma | 弥漫性大B细胞淋巴瘤 | √ | √ | √ | √ | √ | NA | √ | NA | √ | √ |
TCGA-ESCA | Esophageal carcinoma | 食管癌 | √ | √ | √ | √ | √ | NA | √ | NA | √ | √ |
TCGA-GBM | Glioblastoma multiforme | 多形成性胶质细胞瘤 | √ | √ | √ | √ | √ | √ | √ | NA | √ | √ |
TCGA-HNSC | Head and Neck squamous cell carcinoma | 头颈鳞状细胞癌 | √ | √ | √ | √ | √ | √ | √ | NA | √ | √ |
TCGA-KICH | Kidney Chromophobe | 肾嫌色细胞癌 | √ | √ | √ | √ | √ | √ | √ | NA | √ | √ |
TCGA-KIRC | Kidney renal clear cell carcinoma | 肾透明细胞癌 | √ | √ | √ | √ | √ | √ | √ | NA | √ | √ |
TCGA-KIRP | Kidney renal papillary cell carcinoma | 肾乳头状细胞癌 | √ | √ | √ | √ | √ | √ | √ | NA | √ | √ |
TCGA-LAML | Acute Myeloid Leukemia | 急性髓细胞样白血病 | √ | √ | √ | √ | √ | NA | NA | NA | √ | √ |
TCGA-LGG | Brain Lower Grade Glioma | 脑低级别胶质瘤 | √ | √ | √ | √ | √ | √ | √ | NA | √ | √ |
TCGA-LIHC | Liver hepatocellular carcinoma | 肝细胞肝癌 | √ | √ | √ | √ | √ | √ | √ | NA | √ | √ |
TCGA-LUAD | Lung adenocarcinoma | 肺腺癌 | √ | √ | √ | √ | √ | √ | √ | NA | √ | √ |
TCGA-LUSC | Lung squamous cell carcinoma | 肺鳞癌 | √ | √ | √ | √ | √ | √ | √ | NA | √ | √ |
TCGA-MESO | Mesothelioma | 间皮瘤 | √ | √ | √ | √ | √ | NA | √ | NA | √ | √ |
TCGA-OV | Ovarian serous cystadenocarcinoma | 卵巢浆液性囊腺癌 | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ |
TCGA-PAAD | Pancreatic adenocarcinoma | 胰腺癌 | √ | √ | √ | √ | √ | NA | √ | NA | √ | √ |
TCGA-PCPG | Pheochromocytoma and Paraganglioma | 嗜铬细胞瘤和副神经节瘤 | √ | √ | √ | √ | √ | NA | √ | NA | √ | √ |
TCGA-PRAD | Prostate adenocarcinoma | 前列腺癌 | √ | √ | √ | √ | √ | √ | √ | NA | √ | √ |
TCGA-READ | Rectum adenocarcinoma | 直肠腺癌 | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ |
TCGA-SARC | Sarcoma | 肉瘤 | √ | √ | √ | √ | √ | NA | √ | NA | √ | √ |
TCGA-SKCM | Skin Cutaneous Melanoma | 皮肤黑色素瘤 | √ | √ | √ | √ | √ | √ | √ | NA | √ | √ |
TCGA-STAD | Stomach adenocarcinoma | 胃癌 | √ | √ | √ | √ | √ | √ | √ | NA | √ | √ |
TCGA-TGCT | Testicular Germ Cell Tumors | 睾丸癌 | √ | √ | √ | √ | √ | NA | √ | NA | √ | √ |
TCGA-THCA | Thyroid carcinoma | 甲状腺癌 | √ | √ | √ | √ | √ | √ | √ | NA | √ | √ |
TCGA-THYM | Thymoma | 胸腺癌 | √ | √ | √ | √ | √ | NA | √ | NA | √ | √ |
TCGA-UCEC | Uterine Corpus Endometrial Carcinoma | 子宫内膜癌 | √ | √ | √ | √ | √ | √ | √ | NA | √ | √ |
TCGA-UCS | Uterine Carcinosarcoma | 子宫肉瘤 | √ | √ | √ | √ | √ | NA | √ | NA | √ | √ |
TCGA-UVM | Uveal Melanoma | 葡萄膜黑色素瘤 | √ | √ | √ | √ | √ | NA | √ | NA | √ | √ |
Reference
[1]. Kleinbaum D G, Klein M. Kaplan-Meier survival curves and the log-rank test[M]//Survival analysis. Springer, New York, NY, 2012: 55-96.
[2]. Christensen E. Multivariate survival analysis using Cox's regression model[J]. Hepatology, 1987, 7(6): 1346-1358.
[3]. Sveinbjornsson G, Kim J, Wang Y. The Cox model in R. April 18, 2011.
[4]. Xue Y, Schifano E D. Diagnostics for the Cox model[J]. Communications for Statistical Applications and Methods, 2017, 24(6): 583-604.
[5]. Harrell Jr F E, Lee K L, Mark D B. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors[J]. Statistics in medicine, 1996, 15(4): 361-387.
[6]. Hoadley K A, Yau C, Hinoue T, et al. Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer[J]. Cell, 2018, 173(2): 291-304. e6.
[7]. Sanchez-Vega F, Mina M, Armenia J, et al. Oncogenic signaling pathways in the cancer genome atlas[J]. Cell, 2018, 173(2): 321-337. e10.