In linguistics, it is widely assumed that a mental calculus is foundational to language, with an alphabet of elementary symbols interlacing with rules to define well-formed sequences of symbols. This calculus is usually believed to operate at two distinct levels, the level of phonology and the level of morphology and syntax. The phonological alphabet consists of letter-like units of sound called phonemes. Strings of phonemes build the atomic meaningful units of the language, known as morphemes. Rules and constraints define which sequences of phonemes can form legal morphemes. These morphemes in turn comprise the alphabet of a second calculus, with morphological and syntactic rules defining the legal sequences of morphemes (and thus the words and sentences of a language).
In this project, we are investigating whether the comprehension and production of words truly require sub-word units such as phonemes and morphemes. The realization of phonemes is known to vary tremendously with the context in which they occur: for distinguishing a 'p' from a 't' or a 'k', changes in the first and second formants of adjacent vowels are crucial. Furthermore, the theoretical construct of the morpheme, as the smallest linguistic sign, is perhaps attractive for agglutinating languages such as Turkish, but is not helpful at all for understanding the structure of words in fusional languages such as Latin. The central hypothesis under investigation in this project is that the relation between words' forms and their meanings can be modeled computationally in an insightful and cognitively valid way without using the theoretically problematic constructs of the phoneme and the morpheme.
The first goal of the WIDE project is to show that the relation between words' forms and meanings can indeed be computationally modeled without relying on phonemes and morphemes. In other words, we aim to develop a computational implementation of Word and Paradigm Morphology (a theory developed by the Cambridge linguists Matthews and Blevins) that provides, at the functional level, a cognitively valid characterization of the comprehension and production of complex words.
The second goal of the WIDE project is to clarify how much progress can be made with, and what the limits are of, wide learning networks, i.e., networks with very large numbers of input and output nodes, but no hidden layers. The mathematics of these networks is well understood: from a statistical perspective, wide learning is related to multivariate multiple regression. In this respect, wide learning differs from deep learning. Deep learning networks, however impressive their performance, remain largely black boxes when it comes to understanding why they work, and how exactly they work for a particular problem.
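To make the statistical analogy concrete, the following sketch (illustrative only; the names, dimensions, and random data are placeholder assumptions, not the project's code) shows a wide network cast as multivariate multiple regression: a matrix of form cues is mapped onto a matrix of semantic dimensions by a single weight matrix obtained in closed form, with no hidden layers and no backpropagation.

```python
# Minimal sketch of a "wide" network as multivariate multiple regression.
# All dimensions and data below are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)

n_words, n_cues, n_sem = 1000, 500, 300   # words, form cues, semantic dimensions
C = rng.integers(0, 2, size=(n_words, n_cues)).astype(float)  # cue matrix (e.g., letter-trigram indicators)
S = rng.normal(size=(n_words, n_sem))                         # semantic (outcome) matrix

# The network's weights are the least-squares solution of C F = S;
# no hidden layers or iterative training are required for this estimate.
F, *_ = np.linalg.lstsq(C, S, rcond=None)

S_hat = C @ F        # predicted semantic vectors for the training words
print(F.shape)       # (n_cues, n_sem): one weight for every cue-outcome pair
```

Because the weights have a closed-form solution, the contribution of every individual cue to every outcome can be inspected directly, which is what makes these networks mathematically transparent.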
The primary motivation for this project is to better understand the cognition of language and language processing, and thus, as researchers within the humanities, to obtain a better understanding of ourselves as human beings, with our strengths and failings, as compared to artificial intelligence. Aspects of this research may contribute somewhat more concretely to society in two ways. First, some of the cognitively motivated algorithms we are developing may help enhance speech recognition systems and natural language processing. Second, our work on discriminative incremental learning may contribute insights to research on ageing and cognition, as well as to second language acquisition.
Key results in the first phase of the WIDE project are as follows.
Building on previous work on speech comprehension, we have been studying auditory word recognition with wide learning. Wide learning networks trained on low-level acoustic features extracted from the audio signal of words occurring in corpora of spontaneous speech, using the vast repository of multimodal TV news broadcasts of the Distributed Little Red Hen Lab, perform surprisingly well (Shafaei-Bajestan & Baayen, 2018), and recent implementations outperform deep learning networks on the task of isolated word recognition by a factor of two. Deep learning networks are, however, remarkably good at recognizing words in continuous speech, and an important challenge for the remainder of this project is to show that wide learning can also be made to work for continuous speech.
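As a rough illustration of the kind of pipeline involved (not the project's actual implementation; random placeholder vectors stand in for the low-level acoustic features, and all dimensions are arbitrary assumptions), the sketch below maps fixed-length acoustic summary vectors onto semantic vectors with a wide network and recognizes a spoken token as the word type whose semantic vector lies closest to the prediction.

```python
# Hedged sketch of isolated auditory word recognition with a wide network:
# linear mapping from acoustic summary features to semantic vectors,
# followed by nearest-neighbor decoding. All data are random placeholders.
import numpy as np

rng = np.random.default_rng(1)

n_tokens, n_acoustic, n_sem, n_types = 2000, 400, 300, 250
word_of_token = rng.integers(0, n_types, size=n_tokens)   # which word each audio token realizes
S_types = rng.normal(size=(n_types, n_sem))               # one semantic vector per word type
A = rng.normal(size=(n_tokens, n_acoustic))               # summary acoustic features per token
S = S_types[word_of_token]                                # target semantic vector per token

F, *_ = np.linalg.lstsq(A, S, rcond=None)                 # acoustics-to-semantics mapping

def recognize(a_vec):
    """Predict a semantic vector and return the index of the closest word type."""
    s_hat = a_vec @ F
    sims = S_types @ s_hat / (np.linalg.norm(S_types, axis=1) * np.linalg.norm(s_hat))
    return int(np.argmax(sims))

# Accuracy on this random toy data is meaningless; the point is the pipeline.
accuracy = np.mean([recognize(A[i]) == word_of_token[i] for i in range(n_tokens)])
print(f"training-set recognition accuracy on toy data: {accuracy:.2f}")
```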
We have been developing, and continue to develop, a cognitively motivated model of the mental lexicon. It comprises several linked wide networks that jointly provide a mathematical characterization, at the functional level, of the comprehension and production of words. The model can handle not only simple words, but also inflected and derived words. It does so without making use of units representing morphemes, a theoretical construct in traditional linguistics for the supposedly smallest form unit with its own meaning. Recent developments in theoretical morphology have shown that this construct is ill-conceived. The study of Baayen, Chuang, Shafaei-Bajestan and Blevins (2019) is the first to show that complex words can not only be understood, but also produced, without morphemes. This is achieved by treating complex words in the same way as simple words at the level of form, while building up their semantic vectors analytically, through summation of the semantic vectors of the (sense-disambiguated) base word and the semantic vectors of the (sense-disambiguated) functions realized by inflectional or derivational exponents. We have been able to show for English that several measures derived from the model's networks are predictive for a range of aspects of human lexical processing.
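The toy sketch below illustrates these two ideas (the word choices, cue inventory, vector dimensionality, and random semantic vectors are all illustrative assumptions rather than the model's actual data): the semantic vector of an inflected word such as 'walked' is obtained by adding the vector of its base to the vector of the inflectional function PAST, and comprehension and production are implemented as linear mappings between whole-word form cues (here, triphones) and these semantic vectors, without stems, affixes, or morpheme nodes.

```python
# Sketch: compositional semantic vectors plus linear comprehension and
# production mappings, with no morphemic units. Illustrative data only.
import numpy as np

rng = np.random.default_rng(2)
dim = 50

# Hypothetical semantic vectors for a base lexeme and an inflectional function.
sem = {"walk": rng.normal(size=dim), "PAST": rng.normal(size=dim)}
sem["walked"] = sem["walk"] + sem["PAST"]   # no morpheme node: just vector addition

def triphones(word):
    """Overlapping triphones of the whole word form, with word-boundary markers."""
    w = "#" + word + "#"
    return [w[i:i + 3] for i in range(len(w) - 2)]

words = ["walk", "walked"]
cues = sorted({t for w in words for t in triphones(w)})
C = np.array([[1.0 if c in triphones(w) else 0.0 for c in cues] for w in words])
S = np.array([sem[w] for w in words])

F, *_ = np.linalg.lstsq(C, S, rcond=None)   # comprehension: form cues -> meaning
G, *_ = np.linalg.lstsq(S, C, rcond=None)   # production: meaning -> form cues

print(np.round(sem["walked"] @ G, 2))       # cue support predicted when producing "walked"
```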
The algorithms that form the core of this model are being implemented in a package for the R programming environment. A first version of this package is available at http://www.sfs.uni-tuebingen.de/~hbaayen/publications/WpmWithLdl_1.0.tar.gz. A companion paper for this version of the package, Baayen, Chuang, and Blevins (2018), provides a proof of concept that Latin verbs can be produced and understood without having to design inflectional classes, and without having to define stems, affixes, and their allomorphs.
A challenge for any model of speech production is predicting the fine phonetic detail with which words are realized. Our approach, which is grounded in discrimination learning, has been quite successful in predicting segment duration in English (Tomaschek et al., 2019). However, many words show substantial variation with respect to which phones are actually articulated. Thus, in conversational German, 'würden' ('they would') is often realized as 'wün' instead of 'würdn'; in spontaneous Dutch, 'natuurlijk' ('of course') reduces to 'tuuk'; English 'hilarious' becomes 'hlɛrəs'; and in Mandarin informal speech, all that may be left of the three-syllable word '要不然' ('jaʊpuʐan', 'otherwise') is 'ʊɪ'. A statistical survey of the conditions under which segments are deleted is reported in Linke & Baayen (2019).
As our computational model links words' forms to semantic vectors representing their meanings, we have started exploring nonword processing. So far two studies have been conducted. The first study looked into children's lexical acquisition. The simulation results of Cassani et al. (2019) demonstrate that whether newly encountered words are nouns or verbs can be learned from the meanings estimated from their word forms. Model predictions line up
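A hedged sketch of how this can work in principle (with random placeholder data rather than the corpus and embeddings used in the study, and with a simple centroid comparison that is only an assumed stand-in for the actual classifier): the comprehension mapping learned from real words is applied to a nonword's form cues, and the estimated meaning is assigned to whichever of the noun or verb centroids it lies closest to.

```python
# Sketch: estimate a nonword's meaning from its form cues and guess its
# lexical category by comparing with noun/verb centroids. Placeholder data.
import numpy as np

rng = np.random.default_rng(3)
n_words, n_cues, n_sem = 800, 300, 100

C = rng.integers(0, 2, size=(n_words, n_cues)).astype(float)   # training form cues
S = rng.normal(size=(n_words, n_sem))                          # training semantic vectors
is_noun = rng.integers(0, 2, size=n_words).astype(bool)        # known lexical categories

F, *_ = np.linalg.lstsq(C, S, rcond=None)                      # comprehension mapping

noun_centroid = S[is_noun].mean(axis=0)
verb_centroid = S[~is_noun].mean(axis=0)

def guess_category(nonword_cues):
    """Project the nonword into semantic space and pick the nearer centroid."""
    s_hat = nonword_cues @ F
    d_noun = np.linalg.norm(s_hat - noun_centroid)
    d_verb = np.linalg.norm(s_hat - verb_centroid)
    return "noun" if d_noun < d_verb else "verb"

print(guess_category(rng.integers(0, 2, size=n_cues).astype(float)))
```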
Central results that go beyond the current state of the art are (1) the computational implementation of a model of the mental lexicon that combines comprehension and production without requiring morphemes; and (2) our current results on isolated word recognition, where we outperform state-of-the-art deep learning algorithms. We are working on refining our computational model, testing it on a variety of languages ranging from Turkish and Russian to Biblical Hebrew, and developing modality-specific representations and algorithms for auditory and visual input, as well as for the articulation of speech. Of crucial importance will be extending the model to the comprehension and production of words in the context of other words.
More info: http://quantling.org/ERC-WIDE/.