Report

Teaser, summary, work performed and final results

Periodic Reporting for period 3 - MALORCA (Machine Learning of Speech Recognition Models for Controller Assistance)

Teaser

The deployment of decision and negotiation support tools in current ATM requires strong adaptation to the local environment. Every adaptation process significantly increases the cost of a core ATM system, so that total system costs easily exceed the threshold of...

Summary

The deployment of decision and negotiation support tools in current ATM requires strong adaptation to the local environment. Every adaptation process significantly increases the cost of a core ATM system, so that total system costs easily exceed the threshold of one million Euros.

MALORCA's main objective, therefore, is to develop machine learning tools that automatically learn controller behaviour and speech recognition models from data recorded day by day by the Air Navigation Service Providers (ANSPs). This will replace much of the manual adaptation effort and reduce the costs of standard deployment. Cheaper tools enhance the market penetration of controller assistance tools, which will enable increased capacity, less fuel burn and reduced CO2 emissions.

Applying machine learning to the adaptation of automatic speech recognition is a first showcase. In Air Traffic Control, instructions are usually still given to the pilots via voice communication. But modern computer systems in Air Traffic Control need up-to-date data to be safe and efficient. Keeping the system data correct therefore requires many inputs from the air traffic controllers (ATCOs), which today are entered via mouse. Modern technologies like Air-Ground data link, which in some cases can replace voice communication, will require even more inputs from the ATCOs.

This generates workload for the ATCO, which Speech Recognition Technology will be able to reduce significantly. Simulations have shown that the use of modern speech recognition results in increased sector and landing capacity. Furthermore, this leads to reduced flight time, which lowers the airlines' costs and has a positive environmental impact, because it saves 50 to 65 litres of fuel per flight. For a medium airport with 500 landings per day this can result in more than 23 million kilograms of CO2 savings. Speech Recognition Technology today has reached a level of reliability that is sufficient for implementation in an ATM system. This became obvious from the perspective of the Air Navigation Service Providers when supporting trials in the course of the AcListant® project. ANS CR, Austro Control, Croatia Control, DFS, Irish Aviation Authority, Naviair, and LFV have already participated with at least one controller in experiments with the AcListant® Assistant Based Speech Recognizer in DLR's labs.
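The CO2 figure above can be sanity-checked with simple arithmetic. Note that the emission factor of roughly 2.53 kg of CO2 per litre of jet fuel is an assumption introduced here for illustration, not a number from the report:

```python
# Rough sanity check of the CO2 savings figure quoted above.
LITRES_SAVED_PER_FLIGHT = 50   # lower bound quoted in the report
LANDINGS_PER_DAY = 500         # medium airport, as in the report
CO2_KG_PER_LITRE = 2.53        # assumed emission factor for jet fuel

def annual_co2_savings_kg(litres_per_flight=LITRES_SAVED_PER_FLIGHT,
                          landings_per_day=LANDINGS_PER_DAY,
                          co2_per_litre=CO2_KG_PER_LITRE):
    """Annual CO2 savings in kilograms for one airport."""
    return litres_per_flight * landings_per_day * 365 * co2_per_litre

print(round(annual_co2_savings_kg() / 1e6, 1))  # millions of kg, about 23
```

With the lower bound of 50 litres per flight, the result already exceeds 23 million kilograms of CO2 per year, consistent with the figure in the text.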

One main obstacle to transferring speech recognition from the laboratory to operational systems is the cost of deployment. Currently, modern speech recognition models require manual adaptation to a local environment. The MALORCA project proposes a general, cheap and effective solution to automate this re-learning, adaptation and customisation process. MALORCA thus gives industry a practical way to develop and deploy this state-of-the-art speech recognition system and to integrate it into today's voice communication systems of air navigation service providers.

Work performed

Basic machine learning algorithms were developed for learning acoustic, language and command prediction models. 18 hours of untranscribed and 4 hours of transcribed controller utterances are available for both test sites, Prague and Vienna. Currently, command recognition rates of 89% for Prague and 61% for Vienna are achieved. The difference is mostly related to lower quality in the manual command transcription and noisy recording conditions. Without using untranscribed training data for model improvement, a recognition rate of only 62% was achieved for Prague. At the end of the project, command recognition rates of 92% and 82% are realistic, with command recognition error rates of 2% and 4%, respectively.

In the first year of the project a basic Arrival Manager was developed for Vienna and Prague, which makes it possible to predict command hypotheses for each controller command spoken to the pilot. The speech data were transcribed (speech-to-text) and annotated (text-to-relevant concepts, e.g. call sign, command type, command value; greetings and other information elements which are not relevant for input into radar labels, e.g. weather information, are not considered). An Operational Concept Document was created which clearly specifies the controllers' preferences for benefiting from speech recognition in air traffic management. The Operational Concept Document, together with the annotated speech data, provided the input for creating the System Requirement Specification. A basic recognition system has been implemented to be used in the following reporting periods for developing and testing the automatic learning algorithms. Compared to Google's databases, which incorporate more than 300,000 hours of speech data, MALORCA's 22 hours for each test site are only a drop in the bucket. However, for each automatically transcribed utterance the output of the Arrival Manager is available, resulting in a limited set of possible commands in each situation. This has helped to classify the automatically transcribed utterances into good and bad transcriptions.
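The classification of automatic transcriptions can be sketched as follows. This is an illustrative simplification, not MALORCA's actual implementation: an automatically transcribed command is kept as a "good" training example only if it appears in the Arrival Manager's set of plausible commands for that traffic situation.

```python
# Hypothetical sketch of AMan-supported filtering of automatic transcriptions.
# All names and the toy data are illustrative assumptions.

def filter_transcriptions(recognised, aman_hypotheses):
    """Split automatic transcriptions into good/bad using AMan context.

    recognised      -- list of (utterance_id, command) pairs from the recogniser
    aman_hypotheses -- dict: utterance_id -> set of commands the Arrival
                       Manager considers plausible in that situation
    """
    good, bad = [], []
    for utt_id, command in recognised:
        if command in aman_hypotheses.get(utt_id, set()):
            good.append((utt_id, command))
        else:
            bad.append((utt_id, command))
    return good, bad

# Toy example: one recognised command matches an AMan hypothesis, one does not.
recognised = [("u1", "DLH123 DESCEND FL110"), ("u2", "AUA45 REDUCE 180")]
hyps = {"u1": {"DLH123 DESCEND FL110", "DLH123 REDUCE 220"},
        "u2": {"AUA45 DESCEND FL90"}}
good, bad = filter_transcriptions(recognised, hyps)
print(len(good), len(bad))  # 1 1
```

Only the "good" utterances would then feed back into retraining the acoustic and language models, which is what allows the untranscribed 18 hours to improve recognition.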

Final results

Algorithms trained by machine learning should be accurate, efficient and introspective, i.e. they should ask for additional information if they are not sure, which requires a confidence measure. Deep architectures are efficient with respect to runtime, generalise well, and provide state-of-the-art performance, provided that a sufficient amount of training data is available. Estimating confidence scores that reflect the reliability of the classification output is still a challenging task: we expect high confidence if the recognition is correct and low confidence if it is false. MALORCA developed a novel approach, not applied before in the Automatic Speech Recognition domain, in which the algorithms rely on two independent information sources: (1) acoustic scores are combined with (2) scores extracted from command hypothesis prediction. For each automatically transcribed utterance, the output of the Arrival Manager is available, resulting in a limited set of possible commands in each situation.
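One simple way to fuse two such probability-like scores is a log-linear interpolation. The function below is a hedged sketch, not MALORCA's actual fusion rule; the weight `alpha` is an assumed parameter that would be tuned on held-out data:

```python
import math

def combined_confidence(acoustic_score, prediction_score, alpha=0.7):
    """Fuse acoustic and command-prediction scores, both in (0, 1].

    Log-linear interpolation: a weighted geometric mean of the two
    probabilities, so a low score from either source pulls the
    combined confidence down.
    """
    log_conf = (alpha * math.log(acoustic_score)
                + (1 - alpha) * math.log(prediction_score))
    return math.exp(log_conf)

# Correct recognition: both sources agree with high probability.
print(round(combined_confidence(0.9, 0.8), 3))   # 0.869
# Likely misrecognition: the command is implausible for the Arrival
# Manager, so the prediction score is low and confidence drops.
print(round(combined_confidence(0.9, 0.05), 3))  # 0.378
```

The key property is that the Arrival Manager's prediction score acts as an independent check on the acoustics, which is exactly why combining the two sources helps separate correct from false recognitions.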


AcListant® (www.aclistant.de), which developed, validated and quantified the benefits of Assistant Based Speech Recognition, has always exploited speech and radar data from the simulator. The MALORCA project is built around radar and speech data obtained directly from recordings from the ops rooms in Prague and Vienna. AcListant® also had direct access to the speech signal during the simulations and was therefore able to capture speech data recorded directly from the microphone. In MALORCA, however, there was no possibility to record speech data in a similar fashion, given the strict safety policies of the ops room. Instead, we agreed to use radio transmission speech data, which is recorded for archiving and incident feedback (and not generally intended to be used by another system). Hence, the quality of the speech has significantly decreased (i.e. a significant drop in SNR), which has a direct impact on the resulting performance.
Therefore, several new challenges were tackled during the previous reporting periods:
- 8 kHz sampling rate, instead of 16 kHz
- very noisy speech environment (i.e. low signal-to-noise ratio)
For further details see last summary.
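The sampling-rate mismatch means that 16 kHz material (e.g. simulator recordings) has to be brought down to the 8 kHz of the ops-room radio channel before it can be used together with the operational data. A minimal sketch of such 2:1 decimation, with an intentionally crude low-pass (a real pipeline would use a proper anti-aliasing filter):

```python
def downsample_16k_to_8k(samples):
    """Crude 2:1 decimation: average neighbouring samples (a very
    simple low-pass) and keep one value per pair. Illustrative only;
    production code would apply a proper anti-aliasing filter."""
    return [(samples[i] + samples[i + 1]) / 2.0
            for i in range(0, len(samples) - 1, 2)]

tone_16k = [float(i % 4) for i in range(16)]  # toy "16 kHz" signal
tone_8k = downsample_16k_to_8k(tone_16k)
print(len(tone_16k), len(tone_8k))  # 16 8
```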

Website & more info

More info: http://www.malorca-project.de/.