
Report

Teaser, summary, work performed and final results

Periodic Reporting for period 2 - L2STAT (Statistical learning and L2 literacy acquisition: Towards a neurobiological theory of assimilating novel writing systems)

Teaser

The integration of non-native populations into society, and especially into the workforce, depends on learning a new language and, above all, on acquiring functional literacy. Indeed, a long-term objective of the European Commission is that every citizen achieves...

Summary

The integration of non-native populations into society, and especially into the workforce, depends on learning a new language and, above all, on acquiring functional literacy. Indeed, a long-term objective of the European Commission is that every citizen achieves literacy in at least two foreign languages. How do proficient readers in one language learn a novel writing system and achieve literacy in a second or third language? How does their native language (L1) affect learning of a second language (L2)? What are the regularities that they perceive and learn? What are the main neurocognitive mechanisms governing this learning of regularities? What is their neurobiological basis? Why is this task relatively easy for some but not for others? These intriguing and complex questions are the focus of L2STAT. They lie at the intersection of (a) fundamental questions about the human brain and (b) pressing education policy questions directed at creating more inclusive societies with less linguistically driven social marginalization. The aim of L2STAT is, therefore, to produce and test a neurobiological theory of how the brain assimilates novel writing systems: a theory that could in principle accommodate any writing system and would therefore have wide explanatory power. L2STAT is an interdisciplinary project that requires, in parallel, employing advanced methods from computational linguistics and machine learning, using biologically inspired computational models, developing psychometrically reliable behavioral tests of individuals' capacity to extract regularities, finding reliable neurobiological signatures of regularity detection in the human brain, and conducting behavioral experiments at four sites (Israel, Spain, Taiwan, USA) to track literacy acquisition longitudinally in the four different languages.

Work performed

WP1- Through extensive work with postdoc Raquel Garrido Alhama at the BCBL, we assembled a database consisting of a large set of Wikipedia entries in English, Spanish and Hebrew. After downloading the full Wikipedia dumps, we cleaned the database by removing links, headers, references, formulas, and all entries shorter than 200 words. The cleaned database included only main article texts; additional pre-processing involved removing punctuation and numbers, lowercasing, filtering out text in foreign alphabets, etc. For comparison, we also sampled words from the already available OpenSubtitles corpus (OPUS) and measured the correlations between various measures obtained from the different corpora (r = .77). The assembled corpora were then used, as planned in the research program, to examine cross-linguistic differences between Hebrew and English in the statistical distribution of information at the level of letters.
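The cleaning and comparison steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: it assumes markup (links, headers, references, formulas) has already been stripped from the dump, applies the 200-word filter to cleaned tokens, and uses the correlation of log word frequencies over shared words as a stand-in for the unspecified "various measures". The function names `clean_article`, `keep_article` and `log_frequency_correlation` are hypothetical.

```python
import math
import re
from collections import Counter

MIN_WORDS = 200  # entries shorter than this are dropped, as described above


def clean_article(text: str, alphabet: str) -> list[str]:
    """Lowercase, strip numbers and punctuation, and keep only tokens written
    entirely in the target alphabet (hypothetical rules, not the project's)."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)      # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    allowed = set(alphabet)
    # Tokens containing any character outside the alphabet count as foreign text
    return [tok for tok in text.split() if set(tok) <= allowed]


def keep_article(tokens: list[str], min_words: int = MIN_WORDS) -> bool:
    """Length filter, here applied after cleaning (an assumption)."""
    return len(tokens) >= min_words


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def log_frequency_correlation(tokens_a: list[str], tokens_b: list[str]) -> float:
    """Correlate log word frequencies over the words shared by two corpora."""
    fa, fb = Counter(tokens_a), Counter(tokens_b)
    shared = sorted(set(fa) & set(fb))
    return pearson([math.log(fa[w]) for w in shared],
                   [math.log(fb[w]) for w in shared])


if __name__ == "__main__":
    english = "abcdefghijklmnopqrstuvwxyz"
    print(clean_article("The 3 cats, the dog; café!", english))
    # ['the', 'cats', 'the', 'dog']  ("café" is dropped as foreign text)
```

For Hebrew, the same functions would be called with the Hebrew letter set as `alphabet`; lowercasing then has no effect.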

WP2- In collaboration with Blair Armstrong (originally my postdoc at the BCBL, now at the University of Toronto), we explored how a single computational framework could subsume the predictions and computational mechanics of statistical learning (SL). A first simulation of typical SL tasks involved learning the regularities in small, simple sets of stimuli. In particular, we tasked the model with (a) encoding the item currently presented to it, while also (b) remembering the previous items it had seen and (c) predicting which item comes next, all using a single pool of neurons that mediated between the input for the current item and the encoding, memory, and prediction outputs. In a second research effort, Raquel Garrido Alhama and Blair Armstrong used a neural network modeling framework to build a computational model that learns to identify words while simulating fixations at different locations within the word.
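A toy version of the first simulation's architecture can be sketched in Python with NumPy: one hidden pool receives the current item and feeds three softmax readouts trained to report the current item, the previous item and the next item. As an assumption, the pool also receives an Elman-style copy of its own previous state so that it can carry memory. The recurrent context, layer sizes, learning rate, one-step truncated backpropagation and the 4-triplet stream are all illustrative choices, not details of the model described above.

```python
import numpy as np

# Hypothetical toy input: 12 items grouped into 4 fixed triplets, shown in
# random triplet order (a typical SL familiarization stream).
N_ITEMS, N_HIDDEN = 12, 30
TRIPLETS = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11)]


def make_stream(n_triplets: int, rng: np.random.Generator) -> list[int]:
    order = rng.integers(0, len(TRIPLETS), size=n_triplets)
    return [item for t in order for item in TRIPLETS[t]]


def one_hot(i: int) -> np.ndarray:
    v = np.zeros(N_ITEMS)
    v[i] = 1.0
    return v


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()


class SharedPoolNet:
    """A single hidden pool feeding three readouts: current, previous, next item."""

    def __init__(self, rng: np.random.Generator, scale: float = 0.1):
        self.W_in = rng.normal(0, scale, (N_ITEMS, N_HIDDEN))
        self.W_ctx = rng.normal(0, scale, (N_HIDDEN, N_HIDDEN))  # Elman-style context
        self.b = np.zeros(N_HIDDEN)
        self.heads = {k: rng.normal(0, scale, (N_HIDDEN, N_ITEMS))
                      for k in ("current", "previous", "next")}
        self.h = np.zeros(N_HIDDEN)

    def step(self, x: np.ndarray):
        ctx = self.h
        h = np.tanh(x @ self.W_in + ctx @ self.W_ctx + self.b)
        return ctx, h, {k: softmax(h @ W) for k, W in self.heads.items()}

    def train(self, stream: list[int], lr: float = 0.1) -> None:
        for t in range(1, len(stream) - 1):
            x = one_hot(stream[t])
            targets = {"current": stream[t], "previous": stream[t - 1],
                       "next": stream[t + 1]}
            ctx, h, probs = self.step(x)
            dh = np.zeros(N_HIDDEN)
            for k, p in probs.items():
                dz = p - one_hot(targets[k])      # softmax + cross-entropy gradient
                dh += self.heads[k] @ dz          # uses weights before this update
                self.heads[k] -= lr * np.outer(h, dz)
            dpre = dh * (1.0 - h ** 2)            # derivative of tanh
            self.W_in -= lr * np.outer(x, dpre)
            self.W_ctx -= lr * np.outer(ctx, dpre)  # context treated as a fixed input
            self.b -= lr * dpre
            self.h = h


def within_triplet_accuracy(net: SharedPoolNet, stream: list[int]) -> float:
    """Share of fully predictable transitions (1st->2nd, 2nd->3rd) predicted correctly."""
    net.h = np.zeros(N_HIDDEN)
    hits = total = 0
    for t in range(len(stream) - 1):
        _, h, probs = net.step(one_hot(stream[t]))
        net.h = h
        if t % 3 < 2:  # streams start at a triplet boundary
            total += 1
            hits += int(np.argmax(probs["next"]) == stream[t + 1])
    return hits / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    net = SharedPoolNet(rng)
    for _ in range(8):
        net.train(make_stream(300, rng))
    print(within_triplet_accuracy(net, make_stream(100, rng)))
```

The design point the sketch illustrates is that encoding, memory and prediction are read out of the same hidden vector, so whatever the pool learns about one task is shared with the other two.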

WP3- We adopted a novel perspective for investigating the processing of regularities in the visual modality by tracking online performance in a self-paced SL paradigm. This allowed us to focus on the trajectory of learning. We demonstrated that this paradigm provides a reliable and valid signature of SL performance and offers important insights into how statistical regularities are perceived and assimilated in the visual modality as learning proceeds. To further examine the psychometric properties of sequence-learning tasks, we tested the test-retest reliability of the well-established Hebb repetition task. Our findings showed that the test-retest reliability of individual learning performance in the Hebb task was close to zero. In a third research effort, we focused on how prior linguistic knowledge, namely the co-occurrence statistics of speech in the native language, affects what participants learn from novel auditory-verbal input. We showed that auditory-verbal tasks display distinct item-specific effects that reflect the entrenchment of regularities from the participants' linguistic environment. The goal of this line of work is to predict which regularities will be learned from an auditory stream.
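The two measurement ideas above, an online index read off the learning trajectory and a test-retest check of individual scores, can be sketched in Python. The scoring rules are assumptions chosen for illustration. Facilitation is taken as the mean RT to the first (unpredictable) item of each triplet minus the mean RT to the second and third (predictable) items. Hebb learning is taken as the slope of recall accuracy across repetitions of the repeated sequence minus the slope for filler sequences. Reliability is the Pearson correlation of participants' scores across two sessions. All function names are hypothetical.

```python
import math
from statistics import mean


def online_sl_index(rts, positions, blocks):
    """Per-block RT facilitation: mean RT to unpredictable items (position 0,
    first in a triplet) minus mean RT to predictable items (positions 1 and 2)."""
    index = {}
    for b in sorted(set(blocks)):
        trials = [(rt, p) for rt, p, bb in zip(rts, positions, blocks) if bb == b]
        unpredictable = [rt for rt, p in trials if p == 0]
        predictable = [rt for rt, p in trials if p > 0]
        index[b] = mean(unpredictable) - mean(predictable)
    return index


def slope(values):
    """Least-squares slope across successive blocks or repetitions."""
    xs = list(range(len(values)))
    mx, my = mean(xs), mean(values)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, values))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def hebb_score(hebb_acc, filler_acc):
    """Improvement on the repeated (Hebb) sequence beyond that on filler sequences."""
    return slope(hebb_acc) - slope(filler_acc)


def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var)


def retest_reliability(session1, session2):
    """Correlation of the same participants' scores across two sessions."""
    return pearson(session1, session2)
```

Applying `slope` to the per-block values of `online_sl_index` gives one way to summarize the trajectory of learning. A reliability close to zero, as found for the Hebb task, means that a participant's score in one session says almost nothing about their score in the next, even if the group as a whole shows learning.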

WP4- In this work package we explored a range of neurobiological signatures of learning during visual SL. This serves two main aims: methodologically, to offer an independent online measure showing that regularities have been extracted; and theoretically, to advance towards a deeper understanding of the mechanisms underlying the prediction of structured inputs. The first experimental study examined event-related potentials (ERPs), patterns of neural oscillation in the EEG, eye blinks, and pupil size during a visual SL task in which participants were presented with a stream of shape triplets over 24 presentation blocks. Our aim was to find neurobiological signatures of prediction as learning proceeds. This project was conducted by postdoc Louisa Bogaerts in collaboration with postdoc Craig Richter at the BCBL. The first EEG study (40 participants) has been completed and will serve as a pilot for designing the planned MEG study, to be conducted soon at the BCBL, Spain. Results of
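The structure of such a familiarization stream, and the statistic that separates within-triplet from between-triplet transitions, can be sketched in Python. Only the 24 blocks come from the description above. The four triplets, the shape labels `s1` to `s12`, one presentation of each triplet per block, and the ban on immediately repeating a triplet are hypothetical choices for illustration.

```python
import random
from collections import Counter

N_BLOCKS = 24  # number of presentation blocks, as described above
# Hypothetical: 4 triplets of shape labels, each shown once per block.
TRIPLETS = [("s1", "s2", "s3"), ("s4", "s5", "s6"),
            ("s7", "s8", "s9"), ("s10", "s11", "s12")]


def make_stream(triplets, n_blocks, seed=0):
    """Shuffle triplet order within each block, never repeating a triplet back to back."""
    rng = random.Random(seed)
    order = []
    for _ in range(n_blocks):
        block = list(range(len(triplets)))
        rng.shuffle(block)
        while order and block[0] == order[-1]:  # avoid a repeat across the block boundary
            rng.shuffle(block)
        order.extend(block)
    shapes = [shape for i in order for shape in triplets[i]]
    return order, shapes


def transitional_probabilities(stream):
    """P(next = b | current = a), estimated from bigram counts."""
    pairs = Counter(zip(stream, stream[1:]))
    firsts = Counter(stream[:-1])
    return {(a, b): n / firsts[a] for (a, b), n in pairs.items()}
```

In a stream built this way, transitional probabilities are 1.0 inside a triplet and lower at triplet boundaries, which is what makes the second and third shapes predictable; those are the positions at which neural signatures of prediction would be sought as learning proceeds.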

Final results

L2STAT has so far developed SL tasks that measure learning as it proceeds. The tasks for tracking learning in the visual modality developed in WP3 are now acknowledged by the scientific community as a valid tool for tracking statistical learning. These tasks are offered as a free resource, for use in laboratory experiments, to all labs interested in visual statistical learning.
On the theoretical level, L2STAT has developed a novel Information Theory of Reading, which I believe will be a game changer for understanding reading in different writing systems.
By the end of the project, we expect to converge on a range of neurobiological measures of perceiving regularities, to extend the computational linguistic work to include Chinese, and to publish a cross-linguistic computational model of reading grounded in information theory.
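As a rough illustration of the kind of letter-level quantity an information-theoretic account of reading can start from, and that WP1 set out to compare across Hebrew and English, the Python sketch below computes Shannon entropy over letters, both pooled and by position within words of a fixed length. These measures are assumptions chosen for illustration; they are not the project's Information Theory of Reading, whose formulation is not given here.

```python
import math
from collections import Counter


def entropy(counts: Counter) -> float:
    """Shannon entropy in bits: sum over symbols of p * log2(1/p)."""
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values() if c > 0)


def letter_entropy(words: list[str]) -> float:
    """Entropy of the pooled letter distribution over a word list (token-weighted)."""
    return entropy(Counter(ch for w in words for ch in w))


def positional_entropy(words: list[str], length: int) -> list[float]:
    """Entropy of the letter at each position, over words of one given length."""
    same_length = [w for w in words if len(w) == length]
    return [entropy(Counter(w[i] for w in same_length)) for i in range(length)]


if __name__ == "__main__":
    # Toy word list: the first letter carries no information, the vowel the most.
    print(positional_entropy(["cat", "cot", "cut", "car"], 3))
```

The same functions could be run on comparable Hebrew and English word lists drawn from the WP1 corpora; which language shows higher or lower values at which positions is an empirical question this sketch does not answer.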