PERCQUALAVS

A Model for Predicting Perceived Quality of Audio-visual Speech based on Automatic Assessment of Intermodal Asynchrony

 Coordinator TECHNISCHE UNIVERSITAT BERLIN

 Organization address: STRASSE DES 17 JUNI 135
city: BERLIN
postcode: 10623

Contact info
Title: Ms.
First name: Simone
Last name: Ludwig
Phone: 493031000000
Fax: 493031000000

 Coordinator nationality Germany [DE]
 Total cost 155,542 €
 EC contribution 155,542 €
 Programme FP7-PEOPLE
Specific programme "People" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)
 Call code FP7-PEOPLE-2010-IEF
 Funding scheme MC-IEF
 Start year 2011
 Period (year-month-day) 2011-05-01 - 2013-04-30

 Participants

# participant  country  role  EC contrib. [€]
1    TECHNISCHE UNIVERSITAT BERLIN    DE (BERLIN)    coordinator    155,542.40

 Project objective

In recent years, there has been a marked increase in communication technologies and computer interfaces that operate within the audio-visual speech domain (e.g. video-telephony and synthesised avatars). Faithful synchrony between the visual and acoustic speech elements of such technologies is of great importance in ensuring that end-users perceive them as operating at high quality levels. The effect of intermodal asynchrony on user-perceived quality is typically assessed using subjective evaluation techniques. A system for automatically assessing asynchrony levels, and predicting quality degradation on that basis, would therefore be both desirable and useful, and would have direct application to techniques for automatic synchrony adjustment. The proposed project will examine audio-visual speech both as spoken naturally by humans and as artificially synthesised by machines, and will employ subjective assessment techniques and machine learning in a combined, iterative, semi-automatic strategy for producing a Quality Prediction Model. Different levels of intermodal asynchrony will first be assessed by human subjects, who will be required to score the effect of the asynchrony levels on perceived speech quality using standardised techniques that will be modified for use with multimodal speech. Asynchrony patterns and their corresponding subjective assessment scores will be automatically learned by machines, resulting in an initial Quality Prediction Model. The initial model will be tested using data that will be simultaneously assessed by humans, using the subjective assessment techniques described above. The output from the prediction model will be directly compared with the subjective scores, providing an initial evaluation of the model's performance. The model will be adjusted on this basis and re-trained using new data. The process of re-training, re-testing and re-scoring will be repeated iteratively, leading to a more robust quality prediction model.
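The train, test, compare and re-train cycle described in the objective can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the project text does not say which model or error measure was used, so a one-variable least-squares line (score against absolute audio-video offset in milliseconds) and RMSE stand in for them, and the helper names (fit_line, rmse, iterate, toy_score) are hypothetical. The scores are synthetic, noise-free values generated from an arbitrary rule and are not project data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def rmse(pred, truth):
    """Root-mean-square error between predicted and subjective scores."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))

def iterate(rounds):
    """rounds: list of (offsets_ms, subjective_scores), one per test session.

    Train on the first session. For every later session: predict, compare
    the predictions with the subjective scores, then re-train on all data
    seen so far. Returns the final (intercept, slope) and the RMSE history.
    """
    train_x = [abs(o) for o in rounds[0][0]]
    train_y = list(rounds[0][1])
    history = []
    for offsets, scores in rounds[1:]:
        intercept, slope = fit_line(train_x, train_y)
        pred = [intercept + slope * abs(o) for o in offsets]
        history.append(rmse(pred, scores))       # model vs. human listeners
        train_x += [abs(o) for o in offsets]     # fold the new session in
        train_y += list(scores)
    return fit_line(train_x, train_y), history

# Synthetic stand-in for subjective scores (arbitrary rule, NOT project data).
def toy_score(offset_ms):
    return 4.5 - 0.01 * abs(offset_ms)

batches = [[-200, -100, 0, 100, 200], [-150, 50, 250], [-300, 25, 300]]
rounds = [(b, [toy_score(o) for o in b]) for b in batches]
(intercept, slope), history = iterate(rounds)
```

With real listening-test data the per-session error in history would show whether each re-training round brings predictions closer to the subjective scores.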

Introduction (Teaser)

When video lags behind speech, the effect can be discouraging for technology users. New research into effectively measuring asynchronous audiovisual signals could help address this phenomenon.

Project description (Article)

High-tech audiovisual communication is becoming a very common form of exchange, from teleconferencing by satellite to live chatting on smartphones. As simple as the idea seems, keeping picture and voice synchronised is challenging yet crucial for the success of such complex applications. If users perceive that the communication experience is not synchronised, they could switch to other means of communication.

Against this backdrop, the EU-funded project PERCQUALAVS aimed to measure synchrony between the visual and acoustic speech elements of such technologies. Building on fields such as computer vision, cognitive science, machine learning and speech processing, it conceived a model to predict the perceived quality of audiovisual speech.

To achieve its aims, the project was divided into four parts. The first looked at extracting key audiovisual features from an input signal to apply automatic asynchrony detection. The second involved gathering subjective perceptual response data through several perceptual experiments.

The third component analysed the perceptual responses gathered, while the fourth was a machine learning component that predicts human perception of asynchronous input. The project team successfully developed computer vision-based feature extractors that track the lips in real time and extract valuable data, as well as speech processing toolkits to assist in analysing the data.

Another key project achievement was the development of software to process the extracted features in order to measure synchrony and map the results. This enabled comparison between perceptual responses of users and automatically generated results.
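One common way to measure synchrony from extracted features is to cross-correlate a visual feature with an acoustic one and take the shift that aligns them best. The sketch below assumes this approach for illustration only. The project summary does not state which method its software used, and the function name estimate_lag and both input signals (a per-frame audio energy envelope and a per-frame lip-aperture track, sampled at the same rate) are hypothetical.

```python
import math
import random

def estimate_lag(audio_env, lip_aperture, max_lag):
    """Return (lag, score): the shift in frames of lip_aperture relative to
    audio_env that maximises their normalised cross-correlation.
    A positive lag means the video trails the audio.
    Both sequences must have equal length, greater than max_lag."""
    def zscore(x):
        m = sum(x) / len(x)
        sd = math.sqrt(sum((v - m) ** 2 for v in x) / len(x)) or 1.0
        return [(v - m) / sd for v in x]

    a, v = zscore(audio_env), zscore(lip_aperture)
    n = len(a)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Pair a[i] with v[i + lag] wherever both indices are valid.
        pairs = [(a[i], v[i + lag]) for i in range(n) if 0 <= i + lag < n]
        score = sum(x * y for x, y in pairs) / len(pairs)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score

# Synthetic demo (not project data): the "video" track is the same random
# signal as the "audio" track, delayed by 5 frames.
rng = random.Random(0)
base = [rng.random() for _ in range(300)]
audio = base[20:220]
video = base[15:215]          # video[i + 5] == audio[i]
lag, score = estimate_lag(audio, video, max_lag=10)
```

In the demo the estimated lag equals the 5-frame delay built into the synthetic signals. An offset estimated this way could then be set against listeners' perceptual responses, which is the kind of comparison the paragraph above describes.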

While the project's results were not disseminated adequately due to various technical and time constraints, they have laid the groundwork for more research in the field. This is a step forward for assessing and improving audiovisual technology, which is growing rapidly worldwide.
