Coordinator | UNIVERSITAT POLITECNICA DE CATALUNYA
Organization address
address: Jordi Girona 31 |
Coordinator nationality | Spain [ES] |
Total cost | 173,212 € |
EC contribution | 173,212 € |
Programme | FP7-PEOPLE
Specific programme "People" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013) |
Call code | FP7-PEOPLE-2011-IOF |
Funding scheme | MC-IOF |
Start year | 2012 |
Period (year-month-day) | 2012-12-07 - 2016-09-05 |
# | Participant | Country (city) | Role | EC contribution (€) |
---|---|---|---|---|
1 | UNIVERSITAT POLITECNICA DE CATALUNYA, Jordi Girona 31 | ES (BARCELONA) | coordinator | 173,212.80 |
'Machine Translation (MT) is a highly interdisciplinary field, approached from the perspectives of engineering, computer science, informatics, statistics and linguistics. Unfortunately, cooperation and interaction among these fields on MT technologies remains very limited. The goal of this research project is to bring together the different profiles in the MT community by providing a new integrated MT paradigm that mainly combines linguistic technologies and statistical algorithms.
Our research will focus on the problem of dynamically integrating the two most popular MT paradigms: rule-based and statistical. We will bring linguistic technologies, developed either for rule-based MT systems or for other natural language processing tasks, into statistical MT systems. These linguistic technologies include bilingual dictionaries, transfer rules, statistical parsing, word sense disambiguation, and morphological and syntactic analysis. The new paradigm will provide solutions to current MT challenges such as unknown words, reordering and semantic ambiguities.
The project will focus on the three most spoken languages in the world, Chinese, Spanish and English, and on all translation combinations among them. These language pairs not only involve many economic and cultural interests, but also include some of the most relevant MT challenges, such as morphological, syntactic and semantic variations.'
A tool that translates language in real time would have an enormous pay-off for society. EU-funded scientists are proposing an advanced machine translation (MT) paradigm to further enhance the quality of translated texts.
MT is a highly interdisciplinary field that requires input from translators, engineers, computer scientists, mathematicians and linguists. The 'Integration of machine translation paradigms' (IMTRAP) project is developing and validating an open-source hybrid MT system.
Researchers will focus on multiple aspects of linguistics, such as morphology, syntax and semantics. This cutting-edge hybrid system will combine different MT paradigms, including statistical MT and rule-based MT (RBMT), and should be trainable for any pair of languages.
Researchers successfully built baseline statistical MT systems for Chinese-to-Spanish and English-to-Spanish after collecting corpora for these language pairs.
Another important achievement of IMTRAP was the development of the first Chinese-to-Spanish open-source hybrid system. The input of this system was pre-processed with an RBMT system, whose output was then passed to a statistical MT (SMT) system. The SMT component used models whose parameters were estimated from monolingual and bilingual corpora. RBMT was used to define the structural transfer rules for phrases, and SMT was the only source for the lexical transfer of words. Using SMT techniques brought notable improvements in the final translation output.
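The division of labour described above (rules for structure, statistics for word choice) can be illustrated with a toy sketch. This is not the IMTRAP system: the function names, the single reordering rule, the probability table and the English-to-Spanish example are all assumptions made here for illustration, and a real SMT component would decode over phrases rather than pick words one at a time.

```python
from typing import Dict, List, Tuple

Tagged = Tuple[str, str]  # (word, part-of-speech tag)


def structural_transfer(tokens: List[Tagged]) -> List[Tagged]:
    """Rule-based step (hypothetical rule): reorder ADJ NOUN into NOUN ADJ."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


def lexical_transfer(tokens: List[Tagged],
                     table: Dict[str, Dict[str, float]]) -> List[str]:
    """Statistical step: pick the most probable translation of each word."""
    result = []
    for word, _tag in tokens:
        candidates = table.get(word)
        if candidates:
            result.append(max(candidates, key=candidates.get))
        else:
            result.append(word)  # unknown word: passed through unchanged
    return result


def translate(tokens: List[Tagged],
              table: Dict[str, Dict[str, float]]) -> str:
    """Apply the rule-based step first, then the statistical step."""
    return " ".join(lexical_transfer(structural_transfer(tokens), table))


# Toy translation probabilities, invented for this example only.
TOY_TABLE = {
    "the": {"el": 0.7, "la": 0.3},
    "red": {"rojo": 0.8, "colorado": 0.2},
    "car": {"coche": 0.6, "carro": 0.4},
}

SENTENCE = [("the", "DET"), ("red", "ADJ"), ("car", "NOUN")]
print(translate(SENTENCE, TOY_TABLE))  # prints "el coche rojo"
```

Keeping the two steps as separate functions mirrors the pipeline in the text: the rule set can be changed without retraining the statistical table, and vice versa.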
Furthermore, the output of this new hybrid system was compared with that of a state-of-the-art SMT system on an out-of-domain test set.
Results showed that the new hybrid system outperformed the SMT system at all linguistic levels except syntax. In particular, the new hybrid system far outperformed the state of the art in lexical coverage.
IMTRAP is working towards a higher level of hybridisation between SMT and RBMT. Future work will also focus on extracting transfer rules, assigning a probability to a sequence of n words, and introducing a language model into the generation step. Successful development of a cost-effective hybrid MT system will have wide-ranging applications in information access systems and document translation.
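Assigning a probability to a word sequence is commonly done with an n-gram language model. The sketch below assumes a bigram model with add-one smoothing, chosen here only because it is short; the source does not say which model or smoothing method IMTRAP planned to use, and the class name and toy corpus are hypothetical.

```python
from collections import Counter
from typing import List


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences: List[List[str]]) -> None:
        self.contexts: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sent in sentences:
            padded = ["<s>"] + sent + ["</s>"]
            self.contexts.update(padded[:-1])            # words seen as context
            self.bigrams.update(zip(padded, padded[1:]))  # adjacent word pairs
        # Every word that can be predicted, including the end marker.
        self.vocab = {w for s in sentences for w in s} | {"</s>"}

    def prob(self, prev: str, word: str) -> float:
        """P(word | prev), smoothed so that unseen pairs are not zero."""
        return (self.bigrams[(prev, word)] + 1) / (
            self.contexts[prev] + len(self.vocab)
        )

    def sentence_prob(self, sent: List[str]) -> float:
        """Probability of the whole sequence as a product of bigram terms."""
        padded = ["<s>"] + sent + ["</s>"]
        p = 1.0
        for prev, word in zip(padded, padded[1:]):
            p *= self.prob(prev, word)
        return p


# Toy monolingual corpus, invented for this example only.
corpus = [["el", "coche", "rojo"], ["el", "coche", "azul"]]
model = BigramModel(corpus)

fluent = model.sentence_prob(["el", "coche", "rojo"])
scrambled = model.sentence_prob(["rojo", "coche", "el"])
print(fluent > scrambled)  # the word order seen in training scores higher
```

In a generation step, a score of this kind could be used to rank candidate outputs so that more fluent word orders are preferred.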