Report

Teaser, summary, work performed and final results

Periodic Reporting for period 1 - SLIPO (Scalable Linking and Integration of Big POI data)

Summary

Locations that exhibit a certain interest or serve a certain purpose are commonly referred to as Points of Interest (POIs). The concept of a POI is quite broad, encompassing anything from a shop, restaurant or museum to an ATM or bus stop. POI data are the cornerstone of any application, service, and product even remotely related to our physical surroundings. The creation, updating, and provision of POI datasets constitute a multi-billion, cross-domain and cross-border industry, with a value chain that natively incorporates most sectors of our economy, from mobility and tourism to logistics and manufacturing. Advances in the timely and accurate provision of POIs result in significant direct and indirect gains throughout our economy: productivity gains, optimized value chains, better matching of consumers with goods and service providers, and new value-added products are just a few examples. POI data are truly one of the foundations and value multipliers of our Digital Economy.

The value and impact of POIs are reflected in the complex, expensive, and labor-intensive effort required to produce and maintain them, which inherently involves stakeholders and users throughout their value chain. Production starts with field work and continues with constant monitoring of the data's evolution and accuracy, integration of user-feedback mechanisms for reporting errors, quality assurance of new data, and roll-out across a plethora of services and products. In the POI market, the competitive advantages of data providers are clear and measurable: the greater the size, timeliness, richness, and accuracy of the data, the better. The value chain of POI data has changed rapidly: new data sources of ever greater volume and heterogeneity introduce opportunities for growth but also complexity, intensifying the challenges of quality-assured integration, enrichment, and sharing of POI data.

POI data are by nature semantically diverse and spatiotemporally evolving, representing different entities and associations depending on their geographical, temporal, and thematic context. Because they are used in various domains and contexts, POI data are typically scattered across diverse, heterogeneous sources, from which bits and pieces of information need to be combined and assembled to increase their value. However, this is hindered by the lack of common identifiers and data-sharing formats. Even the means by which we typically identify and share POIs are inherently ambiguous. As a result, the integration of POI data remains labor-intensive and scales only for domain-specific or small-scale efforts, leading to loss of information and thus lost value.

SLIPO's objective is to deliver the missing technologies for addressing the data integration challenges of POI data in terms of coverage, timeliness, accuracy, and richness. In SLIPO, we argue that Linked Data technologies can address the limitations, gaps, and challenges of the current landscape in integrating, enriching, and sharing POI data. Our goal is to transfer the research output of our work in the GeoKnow project to the specific challenge of POI data, introducing validated and cost-effective innovations across its value chain.

Work performed

SLIPO has completed its first period with the successful public release of the integrated SLIPO system, a cloud-based application for integrating Big POI data. The SLIPO system enables users who are not experts in linked data technologies to import, interlink, fuse, and enrich heterogeneous proprietary and open POI data, regardless of their original format, schema, or identifiers.

The SLIPO system integrates the leading open-source Linked Data applications for geospatial data (TripleGeo, LIMES, FAGI, DEER, and SANSA). Each one is responsible for a specific step of the data integration lifecycle and has been extended to address the requirements of world-scale POI integration. Our work on each of these applications has surpassed our expectations, already delivering scalability and performance improvements of orders of magnitude since the start of the project.

The first step of the SLIPO system is to transform any type of input POI data into an RDF representation, and vice versa. This allows us to harness the power of linked data for POI integration, while still being able to export the data back to their original format. This task is handled by TripleGeo, the most feature-complete, fastest, and most extensible software for transforming data to and from RDF. We have increased the performance of TripleGeo by an order of magnitude, so that world-scale POI datasets can be processed in minutes.

LIMES, the state-of-the-art software for interlinking, comes next. LIMES handles the heterogeneity and ambiguity of POIs by taking advantage of any information available in the original data to determine which POIs represent the same physical entity. We have increased the performance of LIMES but, most importantly, its accuracy on POIs.

The next step is to fuse information from multiple POI sources, a task handled by FAGI. Like LIMES, FAGI applies a combination of rule-based and machine learning (ML) approaches, in this case to fuse features from multiple POI sources into a single dataset.

Representing POIs as linked data not only allows us to improve integration but also lets us leverage the available semantics to deliver new analytics. DEER is a dedicated framework for enrichment, while SANSA focuses on delivering new value-added services on POIs. We have extended DEER's support for POI data, automating the selection of dataset-tuned configuration parameters for enrichment tasks. SANSA, which combines distributed computing frameworks with the semantic technology stack, received the Best Demo Award at ISWC 2017 in recognition of its increasing maturity in handling analysis tasks.
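As a rough illustration of the first step only, the Python sketch below maps a tabular POI source to RDF in N-Triples form and parses it back, i.e. the "and vice versa" round trip. It is not TripleGeo's API or its mapping language: the CSV columns, the `example.org` namespaces, and the property names are hypothetical, and only the GeoSPARQL `wktLiteral` datatype IRI is a real standard identifier. A real transformation tool also handles many input formats, custom mappings, classification schemes, and full literal escaping, all of which this sketch omits.

```python
import csv
import io
import re

# Hypothetical namespaces for illustration; these are NOT the SLIPO ontology IRIs.
POI_NS = "http://example.org/poi/"
VOCAB = "http://example.org/vocab#"
WKT = "http://www.opengis.net/ont/geosparql#wktLiteral"

# Matches the restricted triple shape produced below: <s> <p> "literal"(^^<datatype>)? .
TRIPLE = re.compile(r'^<([^>]+)> <([^>]+)> "((?:[^"\\]|\\.)*)"(?:\^\^<[^>]+>)? \.$')


def escape(text):
    # Escape backslashes and double quotes for an N-Triples string literal.
    # Newlines and other control characters are not handled in this sketch.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(csv_text):
    """Forward direction: map CSV rows (id, name, category, lon, lat) to N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{POI_NS}{row['id']}>"
        triples.append(f'{subject} <{VOCAB}name> "{escape(row["name"])}" .')
        triples.append(f'{subject} <{VOCAB}category> "{escape(row["category"])}" .')
        point = f"POINT({row['lon']} {row['lat']})"
        triples.append(f'{subject} <{VOCAB}geometry> "{point}"^^<{WKT}> .')
    return triples


def ntriples_to_rows(lines):
    """Reverse direction: rebuild one flat record (all values as strings) per POI."""
    records = {}
    for line in lines:
        match = TRIPLE.match(line)
        if match is None:
            continue  # skip anything outside the restricted shape above
        subject, predicate, literal = match.groups()
        poi_id = subject[len(POI_NS):]
        field = predicate[len(VOCAB):]
        value = re.sub(r"\\(.)", r"\1", literal)  # undo the escaping
        record = records.setdefault(poi_id, {"id": poi_id})
        if field == "geometry":
            # "POINT(lon lat)" -> separate lon/lat columns
            record["lon"], record["lat"] = value[len("POINT("):-1].split()
        else:
            record[field] = value
    return list(records.values())
```

For example, `ntriples_to_rows(rows_to_ntriples(csv_text))` gives back the original rows as dictionaries of strings; keeping the geometry as a typed WKT literal is what lets later spatial steps treat it as a geometry rather than as plain text.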

Final results

SLIPO aims to reduce the effort, time, and cost required to produce high-quality POI data, and to allow non-expert POI producers and consumers to easily transform, interlink, fuse, enrich, and assess the quality of big POI data. Furthermore, our output will not only reduce current costs in the value chain but also enable processes that are currently infeasible in terms of scale, quality, and usability. We have worked towards realizing this impact through focused advances beyond the state of the art:

• We defined the SLIPO ontology, a global POI schema for representing POI data, metadata, and the connections between them.
• TripleGeo was extended to support practically all industrial geospatial data formats and standards, gained support for user-defined and custom mappings and for hierarchical classification schemes, and increased its performance by orders of magnitude.
• LIMES increased its scalability and effectiveness on POI data by optimizing its spatial interlinking approaches, introducing new hybrid similarity functions with configurable weighting, and adding class-expression-specific specifications for tuning proximity functions on POIs.
• FAGI was enhanced with several new fusion operators and strategies for spatial and thematic properties, metrics to assess metadata similarity and quality, and performance improvements.
• DEER has been extended with POI-specific enrichment functions, proactive enrichment strategies, and enhancements in the execution of complex non-linear enrichment pipelines.
• SANSA has been improved with core functionalities for input data support, querying and inferencing, rule mining, and clustering.
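The LIMES and FAGI items above can be illustrated with a toy Python sketch of weighted hybrid linking followed by a simple fusion operator. This is not the LIMES link-specification language or FAGI's operator set: the 0.6/0.4 weights, the 200 m cut-off, the 0.75 threshold, and the "keep the longer name, average the coordinates, union the categories" fusion rule are all illustrative assumptions. The brute-force pairwise comparison also ignores the spatial indexing that scalable link discovery requires.

```python
import math
from difflib import SequenceMatcher


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres (spherical Earth, radius 6,371 km)."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def name_similarity(a, b):
    # Case-insensitive string similarity in [0, 1] from the standard library.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def spatial_similarity(p, q, max_distance_m):
    # 1.0 at the same location, falling linearly to 0.0 at max_distance_m and beyond.
    d = haversine_m(p["lon"], p["lat"], q["lon"], q["lat"])
    return max(0.0, 1.0 - d / max_distance_m)


def hybrid_similarity(p, q, w_name=0.6, w_space=0.4, max_distance_m=200.0):
    """Illustrative weighted combination of thematic (name) and spatial similarity."""
    return (w_name * name_similarity(p["name"], q["name"])
            + w_space * spatial_similarity(p, q, max_distance_m))


def link(source, target, threshold=0.75, **weights):
    """For each source POI, keep its best-scoring target if the score reaches the threshold."""
    if not target:
        return []
    links = []
    for p in source:  # brute force: every source POI is compared with every target POI
        best = max(target, key=lambda q: hybrid_similarity(p, q, **weights))
        score = hybrid_similarity(p, best, **weights)
        if score >= threshold:
            links.append((p["id"], best["id"], score))
    return links


def fuse(p, q):
    """Toy fusion rule: keep the longer name, average coordinates, union categories."""
    return {
        "name": max(p["name"], q["name"], key=len),
        "lon": (p["lon"] + q["lon"]) / 2,
        "lat": (p["lat"] + q["lat"]) / 2,
        "categories": sorted(set(p["categories"]) | set(q["categories"])),
    }
```

With the weights summing to 1, the hybrid score stays in [0, 1], so the threshold reads as a minimum combined similarity. Shifting weight towards `w_space` favours co-located POIs whose names differ, which is the kind of trade-off that configurable weighting is meant to expose.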