Report

Teaser, summary, work performed and final results

Periodic Reporting for period 1 - REACTOMEgsa (Extending the REACTOME Pathway Database for multi-omics biomedical data analysis)

Summary

The increasing availability of high-throughput 'omics technologies creates unprecedented opportunities for precision medicine and biomedical research. Increasingly available approaches such as transcriptome sequencing (RNA-seq), mass spectrometry (MS)-based shotgun proteomics, and microarray studies enable us to characterise genome- and proteome-wide expression changes. All of these techniques share a common challenge: to derive biologically meaningful knowledge from long lists of regulated genes or proteins.

Pathway analysis techniques have emerged as a solution to this problem. They use existing biological knowledge for data reduction. Instead of working with a list of single proteins or genes, researchers can work at the biologically more relevant pathway level, which allows a more intuitive interpretation of the data. Linking genes through pathways additionally increases the power of the statistical analysis. While single genes or proteins may only show small, non-significant changes, synchronous changes within a pathway may reveal a biologically important finding.
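The gain in statistical power can be illustrated with a toy calculation. The sketch below is only an illustration under stated assumptions: the 20-gene pathway and its z-scores are hypothetical, and Stouffer's method (combining independent z-scores) is used here for simplicity, not because it is one of the algorithms evaluated in this project. It also assumes the genes are independent, which real pathway analysis methods cannot take for granted.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value of a standard-normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


def stouffer_z(z_scores: list[float]) -> float:
    """Combine independent z-scores into a single pathway-level z-score."""
    return sum(z_scores) / math.sqrt(len(z_scores))


# Hypothetical pathway: 20 genes, each shifted by only one standard error
# in the same direction (a small, synchronous change).
gene_z = [1.0] * 20

gene_p = [two_sided_p(z) for z in gene_z]
pathway_z = stouffer_z(gene_z)
pathway_p = two_sided_p(pathway_z)

print(f"smallest per-gene p-value: {min(gene_p):.3f}")
print(f"pathway-level z-score:     {pathway_z:.2f}")
print(f"pathway-level p-value:     {pathway_p:.2e}")
```

In this toy setting no single gene passes a 0.05 significance threshold, while the combined pathway-level test does so by a wide margin.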

Reactome is a free, open-access, open-source, open-data, curated and peer-reviewed knowledge base of biomolecular pathways. Its powerful web interface has made Reactome one of the most popular resources for pathway information, with 16,450 users per month from February 2016 to February 2017 (a 22% increase from the previous year). Stringent manual curation by PhD-level scientists with backgrounds in cell and molecular biology, together with constant peer review through close cooperation with independent investigators within the community, provides highly reliable pathway data for biomedical research.

In this project, I developed a novel pathway analysis system, "ReactomeGSA", for the existing Reactome pathway resource. ReactomeGSA can perform comparative pathway analyses across species and 'omics technologies. It is therefore now possible, for example, to easily compare a mouse-based proteomics experiment with the data from a clinical human study. This enables researchers to quickly see whether studies led to comparable results. It is also possible to immediately see whether the results obtained from animal or cell-line based experiments are consistent with matching human data. Previously, such comparisons required in-depth bioinformatics knowledge and were thus not available to most researchers. The new analysis system allows researchers to perform these previously complex analyses within minutes.

Work performed

The first phase of this project focused on the question of which existing pathway analysis algorithms are applicable to the different types of 'omics data. While ample data existed for microarray and transcriptomics data, evidence for the different types of quantitative proteomics data was still scarce. Therefore, I set up targeted experiments to test how different mathematical models perform on the different types of quantitative proteomics data. This led to two publications that improved existing workflows for label-free and label-based quantitative proteomics data.

For label-free quantitative proteomics data, we found that spectrum clustering can greatly improve the quantitation of low-abundance proteins (Griss, Stanek et al., JPR 2019). The workflow created for this approach was integrated in ProteomeDiscoverer, one of the most widely used analysis systems for proteomics data, and is freely available at http://ms.imp.ac.at/index.php?action=spectra-cluster. Additional workflows to perform such analyses using only open-source software were made available as Nextflow workflows through our newly created, dedicated GitHub repository https://github.com/bigbio/nf-workflows.

For label-based quantitative proteomics data, we found that no existing workflow contained all relevant steps of the data analysis. This forced many researchers to develop their own scripts to use the output of one pipeline in a second one. We therefore created a complete workflow that performs all analyses from the raw peaklist data up to the differential expression analysis of the observed proteins (Griss, Vinterhalter, and Schwämmle, JPR 2019). This workflow is shipped as a Docker container to ensure the full reproducibility of the performed analysis. The complete software is again freely available and open-source at https://protprotocols.github.io. The results of these first two projects enabled me to identify pathway analysis algorithms that are suited for different types of 'omics data.

To validate the biological use of these identified pathway analysis algorithms, I performed benchmark experiments on a multi-omics dataset studying the effect of melanoma cells on B cells. This led to the surprising discovery that B cells play a crucial role in the inflammatory tumour microenvironment (Griss et al., Nat Comm 2019). We found a specific B cell subtype that is induced by melanoma cells and is responsible for recruiting T cells to the tumour. Moreover, this subtype predicts the response of patients to immunotherapy and enhances the activation of T cells through immunotherapy in vitro. Therefore, B cells may be a novel target to improve the efficacy of immunotherapies.

The strong clinical findings derived from this dataset greatly supported the validity of the chosen pathway analysis algorithms. Based on these results, it became clear that the new analysis system should support multiple algorithms and be easily extensible to quickly test newly developed approaches. This led to the development of "ReactomeGSA": a web-based pathway analysis system that supports different pathway analysis algorithms, multiple 'omics data types, as well as the simultaneous analysis of data from different species. The complete ReactomeGSA system, including the respective R package, the code for the analysis system itself, and its web-based implementation, is available at https://github.com/reactome. The R package is available through Bioconductor and therefore visible to thousands of bioinformaticians focusing on the analysis of different types of 'omics data. The web-based analysis service is integrated in Reactome's existing web application, which has more than 70,000 users per month.

Final results

The availability of public data is continuously increasing now that most funding agencies and scientific journals require it. This growth gives researchers a great opportunity to re-use existing data to increase the power of their own research. The ReactomeGSA system aims to simplify exactly this task. ReactomeGSA is integrated in one of the most popular public pathway resources. There, it now enables any researcher to quickly compare quantitative 'omics datasets at the pathway level, across multiple 'omics platforms and species. The existing high visibility of Reactome will ensure that this new resource is immediately visible to thousands of researchers.

Our finding that B cells play a crucial role in the inflammatory tumour microenvironment opens up a new target for immunotherapies. This finding may therefore have a profound impact on future cancer therapies.

Website & more info

More info: https://reactome.github.io/ReactomeGSA.