Periodic Reporting for period 2 - MuG (Multi-Scale Complex Genomics)

Summary

Every cell in our body contains two metres of DNA that hold our genetic code. Although the DNA is the same in every cell, different genes are active and inactive in different types of cells (e.g. heart cells versus brain cells), giving them their specific characteristics. Recently, scientists have discovered that one of the ways that cells control which genes are switched on, and which are not, is by rearranging the way the DNA is folded up inside the nucleus. However, we still understand very little about how and why cells do this, and how it may contribute to certain diseases such as cancer, or to the way our cells change as we get older.

The study of how cells package their DNA is called ‘3D/4D genomics’: 3D because it is about the three-dimensional shape of the DNA, 4D because we have to add the dimension of time, since the way the DNA is folded in a cell can change from day to day. One of the big problems in this area is that it is a ‘multiscale’ type of science. To understand it properly, one needs to look both at the behaviour of whole cells and at that of individual molecules in the cell (which are a million times smaller).
This is a young science, expanding rapidly. New groups are constantly discovering new ways to study it at the large, medium, or small scale. Enormous amounts of very complicated data are being generated, and we now urgently need a good way to help scientists bring it all together and make sense of it by seeing how it fits into the ‘big picture’. Computer simulations are a powerful tool to help us do this. They allow us to turn complicated experimental data into pictures of how DNA is packed and folded, from the molecular scale all the way up to the cellular scale. But until now the ways in which these visualisations are produced have not been standardized, adapted to all the new sorts of data that are becoming available, or made simple for non-specialists to use.
The MuG Virtual Research Environment (VRE) is now a reality and is available to the scientific community. It is a sort of specialised web browser where scientists can:
* Upload, share, find and check all types of 3D/4D genomics data generated by experimentalists anywhere in the world
* Perform data analysis and integration tasks, some of which need a lot of computer power
* Perform computer simulations that turn this data into visualisations of how the DNA is packed into a cell, and how this can change
* See how all this relates to how cells change their behaviour, and so affects growth, development, disease and ageing

This has only been possible through a tight collaboration of a unique multidisciplinary team of experts in experimental 3D/4D genomics, in molecular studies of DNA, and in computer and data science.

Work performed

The MuG VRE was released to the public in November 2017. The key feature of the platform is a central workspace that allows the user to access 3D/4D genomics data and analysis and visualisation tools, and to perform analyses across different levels of resolution. The VRE web portal backend (vre.multiscalegenomics.eu) provides access to the virtualized platform, channeling the analysis or simulation operations to the appropriate infrastructure, managing the execution and returning the results to the workspace. During 2018, stability, performance, user experience and documentation were improved (WP5), and the tool offer was revised (WP3, WP6) to ensure it is up to date with the community's needs. WP4 has ensured the storage and processing of many different types of genomic data in such a way that users can get the information they need, when they need it, without ever having to think about where it is stored. Data analysis workflows have been developed (WP4, 5, 6) that span the wide range of experiments and data types generated by both the consortium and the wider 3D genomics community.

The MuG website currently has an average traffic of 170 new users/month, and the MuG VRE has over 150 registered users actively using VRE tools. A functional version of the multi-resolution genome browser TADkit has been installed and running since 2017 (WP3). In 2018, efforts focused on enhancing the browser to fulfill the requirements of the MuG pilot projects, which act as lead users representing the needs of the 3D/4D genomics community.

To facilitate the sustainability of the MuG VRE and its capacity to keep up to date with the community's demands, MuG has developed a tool-wrapping API that facilitates the integration of tools by third-party tool developers. The pilot projects (WP7) have contributed to defining the VRE tool offer and have successfully tested the tools integrated in the VRE. As leaders in the field, the pilot projects have had a key role in end-user engagement, providing real use cases for MuG training activities and acting as VRE lead users. Datasets generated by the pilot projects will be made available to the community for re-use following publication.

MuG is already having an impact on the scientific community, with 31 published papers, including a position paper in Nature Genetics on 3D/4D data and processing standards, co-authored with leading research groups worldwide. The MuG team has also been actively involved in the organization of scientific gatherings. Training has been a key tool for MuG to engage with its end-users and is identified as a key service for sustainability.

Final results

MuG tackles the needs of the emerging 3D/4D genomics community by putting into place community-tailored computational tools and infrastructure that allow the analysis and interpretation of the genome from sequence annotation to 3D folding, from atomic level resolution to entire chromatin representation.

MuG brings advanced and powerful computing closer to this new community, becoming the natural interface between experimental biologists doing research with chromatin (DNA in a cell), physicists developing methods to simulate it, and computer scientists aiming to improve analysis and simulation tools, and how data is stored, integrated and shared. MuG is positioning itself as a reference to structure the community, define standards in software and data and provide a sustainable, computationally powerful infrastructure that will reduce the gap between experimental scientists and the high performance computing (HPC) world.

Through MuG, biologists, methods developers and computational scientists join forces to find solutions for a field that is expected to make a huge impact on the bio-world, from basic cell biology to personalized medicine. As an EU-funded infrastructure, the main mid-term objective set for the MuG VRE is to speed up research in the emerging field of 3D/4D genomics, contributing to making Europe a preferred place for scientists to conduct research and innovation. To this end, any benefits are set to be reinvested in further development.

Genomics is also an attractive field for industry. MuG can contribute to processing output data from high-throughput sequencing equipment, which makes it of interest to sequencing instrument vendors. The pharma industry is another potential long-term beneficiary of the public information made available through MuG, which may contain clues on the use of DNA-interacting proteins as potential drug targets. According to a recent report by the European Federation of Pharmaceutical Industries and Associations (EFPIA), the research-based pharma industry invested €31.5 billion in R&D in Europe alone in 2015 and employed 725,000 people directly.

Website & more info

More info: http://www.multiscalegenomics.eu.