Periodic Reporting for period 2 - CoSaQ (Cognitive Semantics and Quantities)

Summary

At the heart of the multi-faceted enterprise of formal semantics lies a simple yet powerful conception of meaning based on truth-conditions: one understands a sentence if one knows under which circumstances the sentence is true. This notion has been extremely fruitful, resulting in a wealth of theoretical insights and practical applications. But to what extent can it also account for human linguistic behavior? The past decade has seen increasing interaction between cognitive science and formal semantics and the emergence of the new field of experimental semantics. One of its main challenges is the traditional normative take on meaning, which makes semantic theories hard to compare with experimental data. The aim of this project is to advance experimental semantics by building computational cognitive models of meaning.

Numerical information plays a central role in communication. We talk about the number of students in a class or the proportion of votes for a particular political party. In this project, we focus on linguistic expressions concerning quantities, known as quantifiers. Recent progress in the study of computational constraints on quantifier processing in natural language has laid the groundwork for extending semantic theory with cognitive aspects. In parallel, cognitive science has furthered the study of non-linguistic quantity representations. The project integrates formal models of quantifier semantics with cognitive representations to answer a number of questions in linguistics and cognitive science, for example: How do we learn the meaning of quantifiers? How do we decide the truth-values of quantifier sentences?

Work performed

There are roughly 7000 languages spoken in the world. At first glance, the natural languages of the world exhibit tremendous differences amongst themselves. After all, learning a second language as an adult is not an easy task. Yet linguistics teaches us that languages also share tremendous amounts of structure. Thus arises one of the central questions in linguistic theory: What is the range of variation in human languages? That is: which of all the logically possible languages that humans could speak do they in fact speak? A limitation on the range of possible variation is a property that all (or at least almost all) languages share. Such a property is called a linguistic universal. Universals have been discovered at all levels of linguistic analysis: phonology, morphology, syntax, and semantics. For example, all languages have consonants and vowels, all have nouns and verbs, etc. Whenever a universal is attested, it is natural to ask for an explanation of its source. Why does the universal hold? Many theorists search for cognitive explanations of universals. Such an explanation would locate the source of a universal in a feature of the human mind with which language must interface. Recently, my group has been developing the hypothesis that universals are to be explained in terms of learnability.

One of the domains we have focused on is color terms. We know that languages differ significantly in which colors they lexicalize. In other words, different languages categorize the color spectrum in different ways. But all languages’ color terms form ’curving-out’, convex regions of color space. A color category C is convex if and only if, for any two points within C, all the points on the straight line between them also belong to C. In this sense, nC and nC+ are not convex. On the contrary, they are concave, see Figure 1a. In order to test the hypothesis that convex color systems are easier to learn than non-convex ones, we generated a large number of artificial color systems within the psychophysiological model of human color perception. The color systems varied in the extent to which they satisfied convexity (intuitively, nC is more convex than nC+). To measure how learnable each color system is, we trained a computational model, an artificial neural network, to learn each system. The accuracy of the neural network in learning to name colors was positively correlated with the degree of convexity, see Fig. 2. Learnability can therefore explain the convexity universal.
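The convexity condition can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the measure used in the project: it works in a toy two-dimensional space rather than a perceptual color space, it tests segments by sampling a fixed number of points, and the names `segment_inside`, `convexity_degree`, `disc` and `ring` are hypothetical. The "degree of convexity" here is simply the fraction of pairs of category members whose connecting segment stays inside the category.

```python
from itertools import combinations


def segment_inside(p, q, contains, steps=20):
    """Check (by sampling) that the segment from p to q stays inside the category."""
    for i in range(steps + 1):
        t = i / steps
        point = tuple(a + t * (b - a) for a, b in zip(p, q))
        if not contains(point):
            return False
    return True


def convexity_degree(points, contains, steps=20):
    """Fraction of member pairs whose connecting segment stays inside the category.

    This is an assumed, illustrative measure: 1.0 means no sampled
    violation of convexity was found; lower values mean more violations.
    """
    members = [p for p in points if contains(p)]
    pairs = list(combinations(members, 2))
    if not pairs:
        return 1.0
    good = sum(segment_inside(p, q, contains, steps) for p, q in pairs)
    return good / len(pairs)


# Toy 2D "color space": a grid of points with spacing 0.5 (an assumption).
grid = [(x / 2, y / 2) for x in range(-4, 5) for y in range(-4, 5)]


def disc(p):
    """A convex toy category: all points within distance 2 of the origin."""
    return p[0] ** 2 + p[1] ** 2 <= 4.0


def ring(p):
    """A non-convex toy category: the disc with its center removed."""
    return 1.0 <= p[0] ** 2 + p[1] ** 2 <= 4.0


disc_score = convexity_degree(grid, disc)
ring_score = convexity_degree(grid, ring)
print(f"disc: {disc_score:.2f}, ring: {ring_score:.2f}")
```

The disc passes every sampled pair, while the ring fails for pairs such as (1.5, 0) and (-1.5, 0), whose connecting segment crosses the excluded center. A graded score of this kind is one way to order categories from more to less convex.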

We see similar results in other domains, e.g., modal verbs and quantifiers. The resulting picture can be described impressionistically as follows (see Fig. 3): one can imagine a kind of heat map overlaid on the space of possible meanings, with redder shades marking meanings that are easier to learn and bluer shades marking meanings that are harder to learn. In other words, the claim is that language learners are attracted to the warmer regions of the map. The computational simulations provide an argument that individual minds may be better at acquiring meanings that satisfy certain universal constraints, that is, that they are attracted to the hot regions.

Final results

We have shown that the methods of formal semantics, computational linguistics, and cognitive science can be fruitfully combined to provide a cognitive computational explanation for universal properties across languages. In the future, we will strengthen this explanation not only by extending it to a broader class of linguistic phenomena but also by combining it with the cultural evolution paradigm, in order to show how individual learning biases may add up in the language transmission process to affect language structure. We will also use sophisticated computational modeling to investigate individual differences in semantic representations. This, in turn, will allow us to build even more cognitively plausible models of meaning representation.

Website & more info

More info: https://www.jakubszymanik.com/CoSaQ/.