Report

Teaser, summary, work performed and final results

Periodic Reporting for period 1 - PhenoMeNal (PhenoMeNal: A comprehensive and standardised e-infrastructure for analysing medical metabolic phenotype data)

Summary

Metabolic phenotypes are influenced by intrinsic and environmental factors that determine health status and disease risks of an individual or group. Measuring and modelling the metabolites in an individual provides insights into disease factors and etiology that can be used for personalised medicine. The analysis, however, is extremely demanding and subject to statistical and computational challenges. The PhenoMeNal (Phenome and Metabolome aNalysis) project addresses these challenges by providing a comprehensive and standardised e-infrastructure that supports data processing and analysis pipelines for the massive amounts of medical molecular phenotype data generated by metabolomics applications. As such, the PhenoMeNal infrastructure provides services to the European biomedical community, enabling computation and analysis to improve the overall understanding of the causes and mechanisms underlying health, healthy ageing and disease.
The PhenoMeNal infrastructure can also be easily reused in other domains, requiring only the implementation of container images and wrappers for the tools of the desired domain (proteomics, genomics, astronomy, etc.), according to our guidelines. It covers all layers of the data analysis workflow, from the earliest point of data acquisition to the generation of scientific knowledge. With standardised and well-tested workflows, accessible through the PhenoMeNal Virtual Research Environment (VRE) portal, PhenoMeNal aims to enable frictionless data access for scientists with appropriate credentials by providing Findable, Accessible, Interoperable and Reusable (FAIR) datasets.
Patient and research subject data are very sensitive, and it is of paramount importance to establish a robust governance framework for overall information management, including sensitive data. The PhenoMeNal e-infrastructure also ensures that all data collected and held within the project comply with local laws, regulations and ethics.
The overall objectives of the project are:
1. To use existing open-source community standards, integrate tools, resources and methods for the management, dissemination and computational analysis of very large datasets of human metabolic phenotyping and genomic data into a secure and sustainable e-infrastructure;
2. To operate and consolidate the PhenoMeNal e-infrastructure based on existing internal and external HPC (high-performance computing), cloud, and grid resources, including the EGI and the EGI Federated Cloud, and to extend it to world-wide computational infrastructures;
3. To improve and scale-up tools used within the infrastructure to cope with very large datasets;
4. To establish technology for a water-tight audit trail for the processing of human metabolic phenotyping data from the raw data acquisition all the way to the generation of high-level biomedical insights (such as a medical diagnosis);
5. To establish privacy-protection methods that allow working with highly sensitive molecular phenotype data;
6. To foster the worldwide adoption of PhenoMeNal through a wide range of outreach, dissemination, networking and training activities;
7. To develop a model to ensure sustainability of the PhenoMeNal network.
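
As an illustration of what an audit trail of the kind named in objective 4 could look like, the following is a minimal sketch and not PhenoMeNal's actual mechanism. It assumes a hash-chained log in which each processing step stores the hash of the previous step, so that any later modification of an earlier record becomes detectable. The function names `record_step` and `verify`, the record fields and the file names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    """SHA-256 over a canonical (sorted-key) JSON encoding of the record."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def record_step(trail, tool, inputs, outputs):
    """Append one processing step, chained to the hash of the previous entry."""
    prev_hash = trail[-1]["hash"] if trail else GENESIS
    body = {"tool": tool, "inputs": inputs, "outputs": outputs, "prev": prev_hash}
    trail.append({**body, "hash": _digest(body)})
    return trail


def verify(trail):
    """Return True only if every entry is intact and correctly chained."""
    prev_hash = GENESIS
    for entry in trail:
        body = {key: entry[key] for key in ("tool", "inputs", "outputs", "prev")}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical two-step pipeline: raw data conversion, then normalisation.
trail = []
record_step(trail, "convert", ["raw.mzML"], ["study.isa"])
record_step(trail, "normalise", ["study.isa"], ["normalised.tsv"])
```

Calling `verify(trail)` on the untouched log returns `True`; editing any field of an earlier entry makes it return `False`. A production system would additionally need signed entries and tamper-resistant storage, which this sketch does not attempt.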

Work performed

1. Establishment of VRE portal based on our iterative UX design process
2. First PhenoMeNal release (codename: Alanine) with two example use cases demonstrating the two initial services, Galaxy (standardised workflows) and Jupyter (exploratory data analysis)
3. Industry-grade orchestration and deployment
4. VRE deployment on Amazon and Google Cloud as well as on public and private OpenStack clusters
5. PhenoMeNal Service catalogue “App Library” with 36 metabolomics tools
6. Various data management software tools developed, including the ISA API (http://www.github.com/phnmnl/isa-api) and the mzml2isa and nmrml2isa tools, all of which are containerised for use in cloud infrastructure via Docker and Galaxy
7. Established a public Galaxy instance
8. Comprehensive documentation and tutorials for both developers and end users available via website and portal
9. Help desk for users
10. First working versions of various analysis pipelines: Sacurine data analysis use-case, Fluxomics use-case and OpenMS Uppsala use-case
11. Cloud-ready VMIs for the Galaxy and Jupyter web applications
12. Established process for checking ELSI compliance for datasets
13. Extensive outreach in workshops, hackathons, video tutorials
14. Project progress in line with KPIs
15. A local installation of the PhenoMeNal compute environment at Imperial College London for processing sensitive data

Final results

1. Provision of user-friendly methods and tools for data analysis that can give novel biological and clinical insights, providing impact in the clinical and public health arena.
2. Usage of our cloud infrastructure in other domains requires only the implementation of simple wrappers for the respective tools.
3. Contribution to standardisation of Metabolomics computing in Europe and beyond.
4. Launched Metabolomics FAIR data node in GoFAIR.
5. Model system for ELIXIR cloud strategy – ELIXIR provides an excellent platform to enable interaction with other OMICS areas.
6. On track to become the official Metabolomics representative for EOSC and a use case in ELIXIR.

Website & more info

More info: http://phenomenal-h2020.eu/home/.