The SSIX project is a Big Data and Open Data Innovation and Take-up action; the goal of this action is to improve the ability of European SMEs to develop innovative multilingual data products and services, in order to turn large data volumes into semantically interoperable data assets and knowledge. SSIX aims to help meet this objective by creating a collection of adaptable tools that can be used to build data-driven analytics from large multilingual datasets, producing sentiment metrics that can be used to make better-informed business decisions. The sentiment metrics produced by the SSIX platform result from the project's challenging task of extracting relevant and significant signals from the huge variety of increasingly influential social media platforms, such as Twitter, Facebook and StockTwits. Social media data represents a collective barometer of thoughts and ideas touching every facet of society. The platform will also be capable of extracting these signals from the most reliable and authoritative newswires, news feeds and blogs. One of the main advantages SSIX brings is the ability to carry out multilingual analysis, as non-English language support is underrepresented in the current market offering. For the finance domain, it is anticipated that these sentiment metrics can assist with alpha generation, which has already been demonstrated in research examining the wisdom-of-the-crowd concept in social media conversations and its predictive quality for future stock market performance.
This section gives an up-to-date description of the activities and work progress of the SSIX project for the period 1 March 2015 to 29 February 2017.
WP1 is responsible for the management, coordination and execution of the entire project, along with project administration, reporting and risk management. Additionally, WP1 is responsible for running the SSIX Data and Business Ethics Board (DBEB); to date, the DBEB has delivered two Ethical Board Reports (D10, D11).
WP2 ran from M1 to M12 and delivered D2.1 - Business Requirements’ and Business Cases’ Definitions and D2.2 - Business Methodology definition, which outlines the use cases and software configuration required for the SSIX platform.
To date, WP3 has produced the techniques for collecting, storing and filtering the source content used for the SSIX platform's analysis. These efforts are explained in detail in Deliverable D3.4 - Data collection and analysis. Other deliverables submitted are:
D3.5 - Data Sampling describes the procedures implemented to perform sampling of the historical and near real-time data acquired by the WP3 architecture.
D3.6 - End Point illustrates the APIs designed and developed to provide an endpoint for the data exchange between the data ingestion infrastructure of WP3 and WP4.
D3.7 - Data Streams documentation contains the technical documentation provided internally to illustrate how to interact with the Streaming APIs exposed by WP3's data ingestion infrastructure.
Additionally, WP3 has delivered three Data Management Plans (DMP).
To date, WP4 has submitted two deliverables related to the NLP architecture: D4.1 - NLP Service and Analysis Architecture (Initial Version) and D4.2 - NLP Service and Analysis Architecture (Revised Version).
Two SSIX Language Resources Catalogues have been produced (D4.3, D4.4), which analyse the different reusable and custom language resources used throughout the project. Two NLP Service and Analysis Pipeline deliverables (D4.5, D4.6) have also been completed.
WP5 (SSIX Platform Deployment, Validation and Evaluation) builds on the work of the previous work packages. During Year 1, the majority of the effort went towards deliverables D5.3 - SSIX Technical Validation Plan, D5.1 - SSIX Process Specification and D5.2 - SSIX Architecture Specification.
D5.4 - 1st version SSIX Release Platform, which accompanies the first release of the SSIX platform, was delivered in M24. It outlines the overall SSIX architecture, highlights its different layers and explains the operation of the different work packages within the SSIX platform.
The SSIX API was completed in M24, and Deliverable D5.6 - SSIX API Definitions contains the technical documentation of the SSIX platform's API.
The testing principles adopted for the SSIX project are discussed in D5.3 - SSIX Technical Validation Plan; D5.7 - SSIX Platform 1st Integration and Test Cycle builds on D5.3.
To date, WP6 has delivered three editions of the Project Web site, Wiki, LinkedIn and Training Materials deliverable (D6.2, D6.3, D6.4), as well as two editions of the Technology Transfer and Dissemination Plan (D6.7, D6.8).
To date, WP7 has delivered two deliverables in M24: D7.3 - Commercialization Plan and D7.4 - Exploitation Strategy and Go To Market Plan.
With respect to WP2, our findings show that while many tools attempt to handle sentiment analysis of open data sources and social networks, the risk of using them for financial or other types of decision-making ranges from unknown to high. This can be summarised simply as ‘nice to watch, but would one put his/her money on it?’
We have constructed a process flow that has the potential to become a reference/standard for sentiment data extraction, analysis and generation.
A set of X-scores has been defined that has the potential to be adopted by the financial industry and to reach the same level of utility as any other stock financial parameter, such as P/E (price-earnings ratio), MA (moving average) or MACD (moving average convergence-divergence).
WP3 performance tests highlighted stability issues caused by high volumes of parallel data when listening to the most discussed financial markets; this bottleneck was quickly overcome by scaling hardware resources. Our experiments then shifted to cloud technologies provided by the Google Cloud Platform, which could reduce the time needed for data storage and extraction and improve the scalability of the parallel computing processes. A stratified sampling technique was chosen to extract content from large historical datasets, and this technique was used to create the data sample for training SSIX's custom classifiers.
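The report does not state which attributes defined the strata or how the sample size was chosen. As a minimal sketch, assuming strata are keyed by a hypothetical attribute such as the stock ticker a message mentions, and assuming proportional allocation with a fixed sampling fraction, stratified sampling over a historical dataset could look like this in Python. The names `stratified_sample`, `stratum_of` and `fraction` are illustrative only and are not part of any SSIX API.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def stratified_sample(
    records: Sequence[T],
    stratum_of: Callable[[T], Hashable],  # hypothetical key, e.g. ticker or language
    fraction: float,
    seed: int = 0,
) -> List[T]:
    """Proportionally sample about `fraction` of the records from every stratum.

    Each stratum contributes round(fraction * stratum_size) records, but at
    least one, so rare strata (e.g. seldom-discussed tickers) are never lost.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed makes the sample reproducible

    # Partition the records into strata.
    strata: Dict[Hashable, List[T]] = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)

    # Draw a simple random sample (without replacement) inside each stratum.
    sample: List[T] = []
    for key in sorted(strata, key=repr):  # deterministic stratum order
        members = strata[key]
        k = max(1, round(fraction * len(members)))  # never exceeds len(members)
        sample.extend(rng.sample(members, k))
    return sample


if __name__ == "__main__":
    # Toy data (assumed, not from the project): 80 / 15 / 5 messages per ticker.
    data = [
        {"ticker": t, "id": i}
        for i, t in enumerate(["AAPL"] * 80 + ["TSLA"] * 15 + ["VOD.L"] * 5)
    ]
    picked = stratified_sample(data, lambda r: r["ticker"], fraction=0.2, seed=42)
    counts: Dict[str, int] = {}
    for r in picked:
        counts[r["ticker"]] = counts.get(r["ticker"], 0) + 1
    print(counts)  # {'AAPL': 16, 'TSLA': 3, 'VOD.L': 1}
```

Proportional allocation keeps each stratum's share of the sample close to its share of the historical data; the at-least-one rule is an assumed design choice that trades exact proportionality for coverage of small strata.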
WP4: By the end of the first year, we concluded that the available multilingual, domain-specific and sentiment lexica may not provide the features expected for this project's opinion-mining needs. In parallel, several Big Data analysis infrastructures were analysed for their suitability as the foundation of the pipeline architecture. Year 2 saw the development of SSIX custom sentiment classifiers for financial microblogs and for the Brexit referendum. Benchmarking tests of the financial microblogs classifier show promising results against current state-of-the-art (SOTA) services. Effort has also gone into a custom machine translation service and an aspect-based sentiment analysis classifier.
WP5: Foresight was given to the need for scalability and efficiency. The major system components have been designed to operate independently of each other, so they can be distributed or centralised depending on the deployment scenario and the load on the system. Areas of potential innovation include testing new classification models, building a system for statistical calculations and NLP classification using massively parallel computing, and researching new visualisations to aid end users in the decision-making process.
More info: http://ssix-project.eu/.