IMTRAP

Integration of Machine Translation Paradigms

 Coordinator UNIVERSITAT POLITECNICA DE CATALUNYA 

 Organization address: Jordi Girona 31
city: BARCELONA
postcode: 8034

Contact info
Title: Mr.
First name: Carlos
Last name: Laffitte
Phone: 34934017126

 Coordinator nationality Spain [ES]
 Total cost 173,212 €
 EC contribution 173,212 €
 Programme FP7-PEOPLE
Specific programme "People" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)
 Call code FP7-PEOPLE-2011-IOF
 Funding scheme MC-IOF
 Start year 2012
 Period (year-month-day) 2012-12-07 to 2016-09-05

 Participants

#  participant  country  role  EC contrib. [€]
1  UNIVERSITAT POLITECNICA DE CATALUNYA  ES (BARCELONA)  coordinator  173,212.80

 Project objective

Machine Translation (MT) is a highly interdisciplinary and multidisciplinary field, approached from the perspectives of engineering, computer science, informatics, statistics and linguistics. Unfortunately, cooperation and interaction among these fields on MT technologies remain very low. The goal of this research project is to bring together the different profiles in the MT community by providing a new integrated MT paradigm that mainly combines linguistic technologies and statistical algorithms.

Our research will focus on the problem of dynamically integrating the two most popular MT paradigms: rule-based and statistical. We will integrate linguistic technologies, developed either for rule-based MT systems or for other natural language processing tasks, into statistical MT systems. These linguistic technologies include bilingual dictionaries, transfer rules, statistical parsing, word sense disambiguation, and morphological and syntactic analysis. The new paradigm will provide solutions to current MT challenges such as unknown words, reordering and semantic ambiguities.

The project will focus on the three most spoken languages in the world (Chinese, Spanish and English) and on all translation combinations among them. These language pairs not only involve many economic and cultural interests, but also include some of the most relevant MT challenges, such as morphological, syntactic and semantic variations.

Introduction (Teaser)

A tool that translates language in real time would have an enormous pay-off for society. EU-funded scientists are proposing an advanced machine translation (MT) paradigm to further enhance the quality of translated texts.

Project description (Article)

MT is a highly interdisciplinary and multidisciplinary field that requires input from translators, engineers, computer scientists, mathematicians and linguists. The 'Integration of machine translation paradigms' (IMTRAP) project is developing and validating an open-source hybrid MT system.

Researchers will focus on multiple aspects of linguistics, such as morphology, syntax and semantics. This hybrid system will combine different MT paradigms, including statistical MT and rule-based MT (RBMT), and should be trainable on any pair of languages.

Researchers built baseline statistical MT systems for Chinese-to-Spanish and English-to-Spanish by collecting corpora for these language pairs.

Another important achievement of IMTRAP was the development of the first Chinese-to-Spanish open-source hybrid system. The input to this system was pre-processed with an RBMT system, and the RBMT output was then passed to a statistical MT (SMT) system. The SMT system used models whose parameters were estimated from monolingual and bilingual corpora. RBMT defined the structural transfer rules for phrases, while SMT served as the only source for the lexical transfer of words. The SMT techniques brought notable improvements in the final translation output.
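The division of labour described above (rules for structural transfer, statistics for lexical transfer) can be illustrated with a minimal sketch. This is not IMTRAP's implementation: the single reordering rule, the tiny lexical table with its probabilities, and all function names are hypothetical and chosen only for illustration. A real SMT component would use phrase tables, a language model and a decoder rather than a per-word lookup.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (surface form, part-of-speech tag)


def apply_structural_transfer(tokens: List[Token]) -> List[Token]:
    """Rule-based step (toy rule): reorder each ADJ NOUN pair to NOUN ADJ."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


# Hypothetical lexical table: source word -> {target word: probability}.
# The numbers are invented; in practice they would be estimated from corpora.
LEXICAL_TABLE: Dict[str, Dict[str, float]] = {
    "红": {"rojo": 0.8, "colorado": 0.2},
    "车": {"coche": 0.6, "carro": 0.3, "vehículo": 0.1},
}


def apply_lexical_transfer(
    tokens: List[Token], table: Dict[str, Dict[str, float]]
) -> List[str]:
    """Statistical step: pick the most probable target word for each token.

    Words missing from the table are passed through unchanged.
    """
    words = []
    for surface, _tag in tokens:
        candidates = table.get(surface)
        if candidates:
            words.append(max(candidates, key=candidates.get))
        else:
            words.append(surface)
    return words


def translate(tokens: List[Token]) -> str:
    """Run the rule-based reordering first, then the statistical lexical choice."""
    reordered = apply_structural_transfer(tokens)
    return " ".join(apply_lexical_transfer(reordered, LEXICAL_TABLE))
```

Calling `translate([("红", "ADJ"), ("车", "NOUN")])` first reorders the pair to noun-adjective order and then selects the highest-probability entry for each word, giving `coche rojo`.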

Furthermore, the output of this new hybrid system was compared with that of a state-of-the-art SMT system on an out-of-domain test set.

Results showed that the new hybrid system outperforms the SMT system at all linguistic levels except syntax. In particular, the hybrid system far outperformed the state-of-the-art system in terms of lexical coverage.

IMTRAP is working towards a higher level of hybridisation between statistical MT and RBMT. Future work will also focus on extracting transfer rules, assigning probabilities to sequences of n words (n-grams), and introducing a language model into the generation step. Successful development of a cost-effective hybrid MT system will have wide-ranging applications in information access systems and document translation.
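As an illustration of what assigning a probability to a sequence of words involves, the sketch below estimates a bigram (n = 2) model with add-one smoothing. The choice of n, the smoothing method, the function names and the toy corpus in the usage note are all assumptions made for this example and do not come from the project.

```python
from collections import Counter
from typing import List, Tuple


def train_bigram_counts(
    sentences: List[List[str]],
) -> Tuple[Counter, Counter, int]:
    """Count bigram contexts and bigrams over sentences padded with <s> and </s>."""
    contexts: Counter = Counter()
    bigrams: Counter = Counter()
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        contexts.update(padded[:-1])  # every token that precedes another token
        bigrams.update(zip(padded[:-1], padded[1:]))
    # Words that can be predicted: all corpus words plus the end marker.
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return contexts, bigrams, len(vocab)


def sequence_probability(
    sent: List[str], contexts: Counter, bigrams: Counter, vocab_size: int
) -> float:
    """Multiply add-one smoothed bigram probabilities over the padded sentence."""
    padded = ["<s>"] + sent + ["</s>"]
    prob = 1.0
    for prev, word in zip(padded[:-1], padded[1:]):
        prob *= (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
    return prob
```

Trained on a toy corpus such as `[["el", "coche", "rojo"], ["el", "coche", "azul"]]`, the model gives `["el", "coche", "rojo"]` a higher probability than the scrambled `["rojo", "coche", "el"]`, which is the kind of preference a language model can contribute at the generation step.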
