Periodic Reporting for period 2 - SIFRm (Semantic Indexing of French Biomedical Data Resources - mobility)

Summary

The volume of data in biomedicine is constantly increasing. Despite the wide adoption of English in science, a significant quantity of these data is in French. Biomedical data integration and semantic interoperability are necessary to enable new scientific discoveries that could be made by merging the different available data. A key aspect of addressing these issues is the use of terminologies and ontologies as a common denominator to structure biomedical data and make them interoperable. Researchers have called for automated annotation methods and for leveraging natural language processing tools in the curation process. Still, even if the issue is currently being addressed for English, French is not in the same situation: there is little readily available ("off-the-shelf") technology that allows ontologies to be used uniformly in various annotation and curation pipelines with minimal effort.

The Semantic Indexing of French Biomedical Data Resources project (SIFR/SIFRm, www.lirmm.fr/sifr) investigates the scientific and technical challenges in building ontology-based services that leverage biomedical ontologies and terminologies for indexing, mining and retrieval of biomedical data. Our main goal is to enable straightforward use of ontologies, freeing health researchers from knowledge engineering issues so that they can concentrate on the biological and medical challenges.

Within SIFR, we built an ontology-based indexing workflow (the SIFR Annotator) similar to what exists for English resources but specialized for other EU languages, starting with French. This service is available within a portal of ~30 French biomedical ontologies/terminologies that reuses the NCBO BioPortal technology developed at Stanford University. The SIFR BioPortal was released in June 2015 (http://bioportal.lirmm.fr) and has been actively used and improved since then. Recently, the SIFR Annotator has been enriched to process clinical data and contextualize the annotations (negation, temporality, experiencer). We now offer, for both English and French, a unique open ontology-based annotation service that recognizes ontology concepts and contextualizes them, allowing people who are not natural language processing experts to annotate and contextualize medical conditions in clinical notes.
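To make the idea of recognizing concepts and contextualizing them more concrete, the following is a toy sketch in Python. It is not the SIFR Annotator and does not use its API: the three-term dictionary, the EX: concept identifiers, the list of negation cues and the function name annotate are all hypothetical, chosen only for illustration. It covers negation only (not temporality or experiencer), using a simple rule that flags a mention as negated when a cue word appears just before it in the same sentence.

```python
import re

# Hypothetical mini-dictionary mapping French terms to made-up concept IDs.
DICTIONARY = {
    "fièvre": "EX:0001",
    "diabète": "EX:0002",
    "toux": "EX:0003",
}

# A few single-word French negation cues, chosen for illustration only.
NEGATION_CUES = {"pas", "sans", "aucun", "aucune", "ni"}


def annotate(text, window=3):
    """Return one annotation per dictionary term found in the text.

    A mention is flagged as negated when one of the `window` words that
    immediately precede it, within the same sentence, is a negation cue.
    """
    annotations = []
    # Split into rough sentences so a cue cannot leak across sentences.
    for sentence in re.split(r"[.!?;]", text.lower()):
        for term, concept_id in DICTIONARY.items():
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, sentence):
                # Words located just before the mention.
                preceding = sentence[:match.start()].split()[-window:]
                annotations.append({
                    "term": term,
                    "concept": concept_id,
                    "negated": any(w in NEGATION_CUES for w in preceding),
                })
    return annotations
```

For the input "Le patient présente une toux mais pas de fièvre. Diabète connu." this sketch returns three annotations, with only "fièvre" flagged as negated. A production service would rely on full ontologies and a much richer set of context rules than this three-word window.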

In addition, we are abstracting and generalizing our results to agronomy by offering a repository for agronomic ontologies called AgroPortal. The AgroPortal project is a community effort started by the Montpellier scientific community (LIRMM, IRD, CIRAD, INRA, Bioversity International) to build an ontology repository for agronomy and related domains (food, plant sciences and biodiversity). Our goal is to encourage the adoption of metadata and semantics to facilitate open science. By enabling straightforward use of ontologies, we expect data managers and researchers to focus on their tasks, without requiring them to deal with the complex engineering work needed for ontology management.

SIFR/SIFRm (2013-2019) is a collaborative action between LIRMM and BMIR, previously funded by the French ANR Young Researcher program and currently by the EU H2020 Marie Sklodowska-Curie Program (2016-2019). Dr. Clement Jonquet, SIFR's principal investigator, is an assistant professor at University of Montpellier & LIRMM, and was previously a visiting scholar at Stanford BMIR, within Prof. Mark Musen's team.

Work performed

• We developed a French biomedical ontology portal prototype including the SIFR/French Annotator (http://bioportal.lirmm.fr/annotator), a service that, for a given piece of text, returns the biomedical ontology concepts directly mentioned in the text or obtained by semantic expansion.

• We deployed, customized and maintain an ontology repository for French biomedical ontologies/terminologies, the SIFR BioPortal (http://bioportal.lirmm.fr), which hosts 30 terminologies and ontologies and offers multiple ontology-related services to the community.

• We developed a proxy web service for the NCBO Annotator (http://bioportal.lirmm.fr/ncbo_annotatorplus) that gives access, for English data, to new features that have been investigated and implemented within SIFR. These features are now also included in the original NCBO BioPortal.

• We worked on automatic detection of emotion in public health forums using text mining techniques and built a patient vocabulary from public patient-written resources (http://bioportal.lirmm.fr/ontologies/MUEVO).

• We develop, enhance and maintain the AgroPortal platform prototype (http://agroportal.lirmm.fr), whose goal is to offer a reference ontology repository for the agronomic/plant domain. This is a major outcome of SIFRm and has now become an independent project.

• We produced 24 open access scientific publications or communications (with explicit acknowledgement of SIFRm), including: 8 international articles in journals such as Bioinformatics (Oxford), Web Semantics (Elsevier), Data Semantics (Springer), Biomedical Semantics (BMC); 1 dissemination journal; 7 international conferences or workshops; 3 national conferences or workshops.

Final results

SIFR/SIFRm enabled the emergence of new research domains and applications at LIRMM and materialized an important international collaboration with Stanford BMIR. SIFR offered the French biomedical community (e.g., clinicians, health professionals, researchers) highly valuable ontology-based indexing services that will enhance their data production and consumption workflows.

In collaboration with the ANR PractiKPharma project, we are investigating the challenges of processing clinical text data and semantically annotating Electronic Health Records of the G. Pompidou Hospital to extract pharmacogenomics data. In addition, the results of the project are not limited to French (they also cover English and Spanish), and we are transferring our results to the agronomic domain in the context of the AgroPortal project (http://agroportal.lirmm.fr).

AgroPortal is a core component of D2KAB (www.d2kab.org), a new ANR-funded project started in mid-2019. This project gathers 10 French partners for 4 years and aims to create a framework to turn agronomy and biodiversity data into knowledge – semantically described, interoperable, actionable, open – and to investigate the scientific methods and tools to exploit this knowledge for applications in agriculture and biodiversity sciences.

Most of SIFRm’s journal publications are gold open access and the project developments are all open source: https://github.com/sifrproject and https://github.com/agroportal

Website & more info

More info: http://www.lirmm.fr/sifr.