MULTILEX

Multilingual Lexicon Extraction from Comparable Corpora

Coordinator: JOHANNES GUTENBERG UNIVERSITAET MAINZ

Organization address: SAARSTRASSE 21
city: MAINZ
postcode: 55099

Contact info
Title: Dr.
First name: Sascha
Last name: Hofmann
Phone: +49 7274 508 35111
Fax: +49 7274 508 35412

Coordinator nationality: Germany [DE]
Total cost: 100,000 €
EC contribution: 100,000 €
Programme: FP7-PEOPLE
Specific programme "People" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)
Call code: FP7-PEOPLE-2013-CIG
Funding scheme: MC-CIG
Start year: 2014
Period (year-month-day): 2014-09-01 - 2018-08-31

Participants

#  participant  country  role  EC contrib. [€]
1  JOHANNES GUTENBERG UNIVERSITAET MAINZ  DE (MAINZ)  coordinator  100,000.00


Project objective

Given large collections of parallel (i.e. translated) texts, it is well known how to establish correspondences between words across languages by successively applying a sentence-alignment and a word-alignment step. However, parallel texts are a scarce resource for most language pairs involving lesser-used languages. On the other hand, human second language acquisition does not seem to require large amounts of translated text, which indicates that there must be another way of crossing the language barrier. Apparently, human capabilities are based on comparable resources, i.e. texts or speech on related topics in different languages which are not translations of each other. Comparable (written or spoken) corpora are far more common than parallel corpora, thus offering the chance to overcome the data acquisition bottleneck. Despite this cognitive motivation, the proposed project will not attempt to simulate the complexities of human second language acquisition, but will show that it is possible by purely technical means to automatically extract information on word and multiword translations from comparable corpora. The aim is to push the boundaries of current approaches, which typically utilize correlations between co-occurrence patterns across languages, in several ways:

1) Eliminating the need for initial lexicons by using a bootstrapping approach which only requires a few seed translations.
2) Implementing a new methodology which first establishes alignments between comparable documents across languages, and then computes cross-lingual alignments between words and multiword units.
3) Improving the quality of computed word translations by applying an interlingua approach which, by relying on several pivot languages, allows a highly effective multi-dimensional cross-check.
4) Showing that, by looking at foreign citations, translations can even be derived from a single monolingual text corpus.
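The co-occurrence idea that the objective attributes to current approaches can be illustrated with a minimal Python sketch. It is not the project's method: the toy sentences, the two seed translations, and all function names are hypothetical and chosen only for illustration. Each word is described by how often it shares a sentence with the seed words of its own language, and candidate translations are ranked by the cosine similarity of these vectors.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vector(word, sentences, context_words):
    """Count how often `word` shares a sentence with each context word."""
    counts = Counter()
    for sentence in sentences:
        tokens = set(sentence.split())
        if word in tokens:
            for context in context_words:
                if context in tokens and context != word:
                    counts[context] += 1
    return [counts[context] for context in context_words]


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_translations(source_word, src_sentences, tgt_sentences, seeds):
    """Rank target-language words by similarity of their seed co-occurrence patterns."""
    src_context = [src for src, _ in seeds]
    tgt_context = [tgt for _, tgt in seeds]
    src_vec = cooccurrence_vector(source_word, src_sentences, src_context)
    candidates = {tok for s in tgt_sentences for tok in s.split()} - set(tgt_context)
    scored = [
        (cosine(src_vec, cooccurrence_vector(cand, tgt_sentences, tgt_context)), cand)
        for cand in candidates
    ]
    return sorted(scored, reverse=True)


# Hypothetical toy corpora: related topics, but not translations of each other.
english = [
    "the doctor works in the hospital",
    "the teacher works in the school",
    "the nurse helps the doctor in the hospital",
    "the pupil learns in the school",
]
german = [
    "der arzt arbeitet im krankenhaus",
    "der lehrer arbeitet in der schule",
    "die schwester hilft im krankenhaus",
    "der schüler lernt in der schule",
]
# Hypothetical seed lexicon of two known translation pairs.
seeds = [("hospital", "krankenhaus"), ("school", "schule")]

ranking = rank_translations("doctor", english, german, seeds)
```

With so little data, many candidates tie: every German word that occurs only in hospital sentences scores as high as "arzt". This is one reason real systems rely on much larger corpora and on weighting schemes rather than raw counts. A bootstrapping variant in the spirit of point 1 would add the most confident new pairs to `seeds` and repeat the ranking.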
