Audiovisual content created and used in the fields of broadcasting and media, i.e. films and videos, is key for how people remember, communicate and entertain themselves. In the future, it will also be an essential resource for history, since a large share of the memories and records of the 20th and 21st centuries are audiovisual. Efficient methods are thus needed to cope with this ever-growing audiovisual big data. The efficient use of audiovisual content in the broadcasting context requires time-aligned video description, i.e. verbal descriptions of the visual and auditory components of the content, generated with methods that go beyond the performance of the current state of the art: for example, current automatic video description methods neither make efficient use of the audio input nor recognize action sequences in a way comparable to human description. This project will develop novel methods for analysing and describing video content based on a combination of computer vision techniques, human input and machine learning approaches, producing enhanced automatic descriptions. These descriptions will allow the Creative Industries, as well as the people using their services, to access, use and find audiovisual information in novel ways through better metadata. They will be able to locate particular segments in films rapidly and accurately by searching and browsing text corpora compiled from audiovisual data aligned with verbal descriptions. Moreover, the intermodal translation from images and sounds into words will attract new users, such as deaf, hard-of-hearing, blind and partially sighted audiences, who would otherwise be excluded from the visual or auditory content.
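For instance, once verbal descriptions are time-aligned with the video, locating a segment reduces to a text search over the aligned corpus. The following Python sketch illustrates this under simple assumptions; the segment structure, field names and example data are hypothetical and do not reflect MeMAD's actual formats.

```python
from dataclasses import dataclass

@dataclass
class DescribedSegment:
    """One time-aligned verbal description of a video segment (illustrative structure)."""
    video_id: str
    start: float   # seconds from the beginning of the programme
    end: float
    text: str      # human- or machine-generated description

def find_segments(segments, query):
    """Return the segments whose description mentions the query term (case-insensitive)."""
    q = query.lower()
    return [s for s in segments if q in s.text.lower()]

# Example: locate where a dog appears in a (made-up) programme.
corpus = [
    DescribedSegment("prog-001", 0.0, 12.5, "A reporter stands in front of the parliament building."),
    DescribedSegment("prog-001", 12.5, 20.0, "A dog runs across the square while a crowd applauds."),
]
for seg in find_segments(corpus, "dog"):
    print(f"{seg.video_id}: {seg.start:.1f}-{seg.end:.1f}s  {seg.text}")
```

In practice the same aligned corpus could be indexed with a full-text search engine, but the principle of mapping text matches back to timecodes stays the same.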
The MeMAD consortium aims to develop automatic language-based methods for managing, accessing and publishing pre-existing and originally produced Digital Content in an efficient and accurate manner within the Creative Industries, especially in TV broadcasting and in on-demand media services. To achieve these aims, the MeMAD consortium has specified the following four objectives.
● Objective O1: Develop novel methods and tools for digital storytelling
● Objective O2: Deliver methods and tools to expand the size of media audiences
● Objective O3: Develop an improved scientific understanding of multimodal and multilingual media content analysis, linking and consumption
● Objective O4: Deliver object models and formal languages, distribution protocols and display tools for enriched audiovisual data
During the first 18 months the project has progressed as expected according to the work plan, without any major deviations. The teams have been formed (with a number of recruitments) and the work plans have been discussed and concretized. YLE and INA have collected a substantial amount of broadcast video data and prepared it for MeMAD’s use. MeMAD’s data archive has been created and intermediate data formats have been defined for using the data across the work packages. An initial version of the MeMAD prototype has been created on top of Limecraft’s Flow system, where the new technology components can be operated and the results from one component and user environment can be passed seamlessly to another. We have reviewed, studied and advanced the state of the art in the generation of descriptions of audiovisual data by automatic visual analysis, speech recognition, audio event detection, speaker diarization and named entity recognition. In addition to developing these systems in several languages, we have also contributed to the state of the art in multimodal machine translation, where the output description of multimodal events can be provided in a language other than that of the input data. Furthermore, we have studied human annotation of video data and audio description and started to create a human-annotated video database for comparative analysis of human and machine descriptions. Finally, we have re-used existing semantic metadata standards and applied Linked Data best practices to publish a so-called MeMAD knowledge graph that provides semantic descriptions of all the broadcast video data. We have also enabled and facilitated joint work between the media industry and researchers, increasing mutual understanding of typical professional workflows, priorities and user needs in both domains. This paves the way for deeper future collaboration and ensures the relevance of the project’s innovations and research work.
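As a minimal sketch of how such a knowledge graph could be consumed, the snippet below sends a generic SPARQL query that lists the most frequent classes in the published data. The SPARQL endpoint URL shown here is an assumption made for illustration; the actual access mechanisms and vocabularies of the MeMAD platform may differ.

```python
from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

# Hypothetical endpoint path; the real access point of the MeMAD data platform may differ.
ENDPOINT = "http://data.memad.eu/sparql"

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery("""
    SELECT ?type (COUNT(?s) AS ?count)
    WHERE { ?s a ?type }
    GROUP BY ?type
    ORDER BY DESC(?count)
    LIMIT 20
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["type"]["value"], binding["count"]["value"])
```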
The project has produced a number of publications and deliverables. We have also shared the software and results created in the project at https://github.com/memad-project and published a semantic data platform at http://data.memad.eu/ . The research results so far are world class, as demonstrated by our success in several recent benchmarking challenges. In addition to publishing scientific articles, sharing the software and results publicly pushes the research field forward. By disseminating the results directly to various European broadcasters and their service suppliers, we aim to maximize our impact in the domain of production and distribution of audiovisual content.
In Digital Storytelling, which relates to MeMAD objective O1, the current state of the art is that existing scripts are neither processed in post-production nor supported by the distribution protocols. As a result, subtitles, audio description and other forms of complementary content are retrofitted, which is a slow and expensive process. MeMAD develops an electronic script that is curated throughout the production and post-production processes, so that it can be turned into subtitles, audio description and clickable captions that can be positioned on the screen. For non-scripted content, MeMAD will build up a script as the available material is indexed and as material fragments are selected, either manually or automatically, as part of the story.

As its objective O2, MeMAD aims to build the basis for an automatically or semi-automatically functioning model of video content description that can be applied in different contexts of use. The project takes a substantial step beyond the current state of the art in automatic content description, which is at present available mainly for static visuals such as photographs. Providing video description, whether from audio or visuals to text, creates new audiences by producing a verbal surrogate that anyone, not only people with disabilities, can benefit from. A further advance over the current state of the art is the automatic translation of video content descriptions between multiple languages.

In objective O3, the automatic and manual methods and technologies for verbal description are combined into a more accurate automatic description method. The automatic, computer-driven analysis techniques detect visual and auditory elements in multimedia, label them with pre-defined indexing concepts, generate textual descriptions of the content and provide speech recognition of the spoken utterances.

The ambition to bring a formal language for describing Digital Content to the market corresponds to objective O4. The key to this innovation is to provide the Creative Industries with a common representation for the master data during the production processes, so that the current document-oriented editorial processes can be replaced with a more structured approach.
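To make the idea of a common, structured representation more concrete, the sketch below models a few time-aligned script entries and renders them as WebVTT subtitles, one of the outputs such a representation could feed. The entry structure is a simplified assumption for illustration and is not MeMAD's actual object model or formal language.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ScriptEntry:
    """One time-aligned entry of an electronic script (simplified, illustrative)."""
    start: float            # seconds from the start of the programme
    end: float
    speaker: Optional[str]  # None for non-speech content such as sound effects
    text: str

def _timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, e.g. 00:00:03.200."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

def to_webvtt(entries: List[ScriptEntry]) -> str:
    """Render script entries as WebVTT subtitle cues."""
    lines = ["WEBVTT", ""]
    for e in entries:
        lines.append(f"{_timestamp(e.start)} --> {_timestamp(e.end)}")
        lines.append(f"{e.speaker}: {e.text}" if e.speaker else e.text)
        lines.append("")
    return "\n".join(lines)

script = [
    ScriptEntry(0.0, 3.2, "Narrator", "The sun rises over the harbour."),
    ScriptEntry(3.2, 6.0, None, "[Seagulls crying]"),
]
print(to_webvtt(script))
```

Because each entry carries timing, speaker and text, other outputs such as audio description cues or clickable captions could in principle be derived from the same entries rather than being authored separately after the fact.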
More info: https://memad.eu.