IMTRAP

Integration of Machine Translation Paradigms

 Coordinator UNIVERSITAT POLITECNICA DE CATALUNYA 

 Organization address: Jordi Girona 31
city: BARCELONA
postcode: 8034

Contact info
Title: Mr.
First name: Carlos
Last name: Laffitte
Phone: 34934017126

 Coordinator nationality Spain [ES]
 Total cost 173,212 €
 EC contribution 173,212 €
 Programme FP7-PEOPLE
Specific programme "People" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)
 Call code FP7-PEOPLE-2011-IOF
 Funding scheme MC-IOF
 Start year 2012
 Period (year-month-day) 2012-12-07 - 2016-09-05

 Participants

# participant  country  role  EC contrib. [€] 
1    UNIVERSITAT POLITECNICA DE CATALUNYA    ES (BARCELONA)    coordinator    173,212.80

 Project objective

Machine Translation (MT) is a highly interdisciplinary and multidisciplinary field, since it is approached from the points of view of engineering, computer science, informatics, statistics and linguistics. Unfortunately, cooperation and interaction among these fields in relation to MT technologies is still very low. The goal of this research project is to bring together the different profiles in the MT community by providing a new integrated MT paradigm that mainly combines linguistic technologies and statistical algorithms.

Our research will focus on the problem of dynamically integrating the two most popular MT paradigms: rule-based and statistical. We will integrate linguistic technologies, developed either for rule-based MT systems or for other natural language processing tasks, into statistical MT systems. These linguistic technologies include bilingual dictionaries, transfer rules, statistical parsing, word sense disambiguation, and morphological and syntactic analysis. The new paradigm will provide solutions to current MT challenges such as unknown words, reordering and semantic ambiguities.

The project will focus on the three most spoken languages in the world (Chinese, Spanish and English) and on all translation combinations among them. These language pairs not only involve many economic and cultural interests, but also include some of the most relevant MT challenges, such as morphological, syntactic and semantic variations.

Introduction (Teaser)

A tool that translates language in real time would have an enormous pay-off for society. EU-funded scientists are proposing an advanced machine translation (MT) paradigm to further enhance the quality of translated texts.

Project description (Article)

MT is a highly interdisciplinary and multidisciplinary field requiring input from professionals ranging from translators to engineers to computer scientists to mathematicians to linguists. The 'Integration of machine translation paradigms' (IMTRAP) project is working on developing and validating an open-source hybrid MT system.

Researchers will focus on multiple aspects of linguistics, such as morphology, syntax and semantics. This hybrid system will combine different MT paradigms, including statistical MT and rule-based MT (RBMT), and should be trainable for any pair of languages.

Researchers successfully built baseline statistical MT systems for Chinese-to-Spanish and English-to-Spanish from corpora collected for these language pairs.

Another important achievement of IMTRAP was the development of the first Chinese-to-Spanish open-source hybrid system. The input of this system was pre-processed with an RBMT system, and the RBMT output was passed to a statistical MT (SMT) system. The SMT system used models whose parameters were estimated from the analysis of monolingual and bilingual corpora. RBMT was used to define the structural transfer rules for phrases, and SMT served as the only source for the lexical transfer of words. Adding SMT techniques notably improved the final translation output.
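The division of labour described above (rule-based structural transfer first, statistical lexical transfer second) can be illustrated with a toy sketch. This is not the IMTRAP implementation: the single reordering rule, the function names, and the probability table `TOY_TABLE` are hypothetical and chosen only to make the two stages visible.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (surface form, part-of-speech tag)

# Hypothetical lexical translation table: P(target word | source word).
# The numbers are invented for illustration, not estimated from a corpus.
TOY_TABLE: Dict[str, Dict[str, float]] = {
    "红": {"roja": 0.6, "rojo": 0.4},
    "房子": {"casa": 0.9, "hogar": 0.1},
}


def structural_transfer(tokens: List[Token]) -> List[Token]:
    """Rule-based stage: rewrite every ADJ NOUN pair as NOUN ADJ."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        is_adj_noun = (
            i + 1 < len(tokens)
            and tokens[i][1] == "ADJ"
            and tokens[i + 1][1] == "NOUN"
        )
        if is_adj_noun:
            out.extend([tokens[i + 1], tokens[i]])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def lexical_transfer(tokens: List[Token],
                     table: Dict[str, Dict[str, float]]) -> List[str]:
    """Statistical stage: pick the most probable translation of each word."""
    result: List[str] = []
    for word, _tag in tokens:
        candidates = table.get(word)
        if candidates:
            result.append(max(candidates, key=candidates.get))
        else:
            result.append(word)  # unknown word: copied through unchanged
    return result


def translate(tokens: List[Token]) -> str:
    """Run the rule-based stage, then the statistical stage."""
    return " ".join(lexical_transfer(structural_transfer(tokens), TOY_TABLE))


print(translate([("红", "ADJ"), ("房子", "NOUN")]))
```

In this sketch the rule handles word order only, while every word choice comes from the probability table, mirroring the split between structural and lexical transfer. A real system would use many rules, context-dependent translation models, and agreement handling.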

Furthermore, the output of this new hybrid system has been compared with that of a state-of-the-art SMT system on an out-of-domain test set.

Results showed that the new hybrid system outperforms the SMT system at all linguistic levels except the syntactic level. In particular, the new hybrid system far outperformed the state-of-the-art system in terms of lexical coverage.

IMTRAP is working towards a higher level of hybridisation between statistical MT and RBMT. Future work will also focus on extracting transfer rules, on assigning a probability to a sequence of n words, and on introducing a language model into the generation step. Successful development of a cost-effective hybrid MT system will have wide-ranging applications in information access systems and document translation.
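Assigning a probability to a sequence of words is the job of an n-gram language model. The following is a minimal sketch of a bigram model with add-one smoothing, assuming a tiny hand-written Spanish corpus. The class name `BigramLM` and the corpus are hypothetical and are not taken from the project.

```python
from collections import Counter
from typing import List


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences: List[List[str]]) -> None:
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        for sentence in sentences:
            padded = ["<s>"] + sentence + ["</s>"]
            # Every token except the final </s> acts as a context.
            self.context_counts.update(padded[:-1])
            self.bigram_counts.update(zip(padded, padded[1:]))
        # Words that can be predicted: all corpus words plus end-of-sentence.
        self.vocab = {w for s in sentences for w in s} | {"</s>"}

    def prob(self, prev: str, word: str) -> float:
        """Smoothed estimate of P(word | prev)."""
        numerator = self.bigram_counts[(prev, word)] + 1
        denominator = self.context_counts[prev] + len(self.vocab)
        return numerator / denominator

    def sentence_prob(self, sentence: List[str]) -> float:
        """Probability of the whole word sequence under the model."""
        padded = ["<s>"] + sentence + ["</s>"]
        p = 1.0
        for prev, word in zip(padded, padded[1:]):
            p *= self.prob(prev, word)
        return p


corpus = [["la", "casa", "roja"], ["la", "casa", "blanca"]]
lm = BigramLM(corpus)
fluent = lm.sentence_prob(["la", "casa", "roja"])
scrambled = lm.sentence_prob(["roja", "casa", "la"])
print(fluent > scrambled)
```

In a generation step, such a model could be used to rank candidate outputs so that more fluent word orders are preferred. Words outside the training vocabulary are not handled properly here, and for longer sentences one would sum log-probabilities instead of multiplying.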
