TTC

Terminology extraction, translation tools and comparable corpora

Coordinator	UNIVERSITE DE NANTES, quai de Tourville 1, Nantes 44035. Contact: Ms. Pauline BOUDANT. Telephone: -40998462. Fax: -40998381
Coordinator nationality	France [FR]
Total cost	2,663,099 €
EC contribution	2,025,000 €
Programme	FP7-ICT Specific Programme "Cooperation": Information and communication technologies
Call code	FP7-ICT-2009-4
Funding Scheme	CP
Start year	2010
Period (year-month-day)	2010-01-01 - 2012-12-31

Participants

#	participant	address	contact	telephone	fax	country	role
1	UNIVERSITE DE NANTES	quai de Tourville 1, Nantes 44035	Ms. Pauline BOUDANT	-40998462	-40998381	FR (Nantes)	coordinator
2	EURINNOV SARL	Rue Jean Goujon, Paris 75008	Mr. Matthieu Rolland	-215	-244	FR (Paris)	participant
3	SOGITEC INDUSTRIES SA	Rue Marcel Monge, Suresnes 92158	Mr. Claude Méchoulam	-166	-166	FR (Suresnes)	participant
4	SYLLABS SARL	RUE JEAN BAPTISTE BERLIER - PEPINIERE MASSENA, PARIS 13 75013	Ms. Helena Blancafort	-172	-177	FR (PARIS 13)	participant
5	TILDE SIA	VIENIBAS GATVE, RIGA 1004	Mr. Aivars Berzins	-67604630	-67605379	LV (RIGA)	participant
6	UNIVERSITAET STUTTGART	Keplerstrasse, STUTTGART 70174	Dr. Ulrich Heid	-82720	-82713	DE (STUTTGART)	participant
7	UNIVERSITY OF LEEDS	Woodhouse Lane, LEEDS LS2 9JT	Dr. Serge Sharoff	-7699	-3699	UK (LEEDS)	participant

Word cloud

Explore the word cloud to get a rough idea of the project.

contextual alignment language word machine export automatic automatically bilingual generating extraction translation tools web create corpus platform monolingual languages strategies ttc corpora domain lexical topical terminology terminologies

Project objective

The TTC project (Terminology Extraction, Translation Tools and Comparable Corpora) aims at leveraging machine translation tools (MT tools), computer-assisted translation tools (CAT tools) and multilingual content management tools by automatically generating bilingual terminologies from comparable corpora in several European languages (i.e. English, French, German and Latvian), as well as in Chinese and Russian.

Comparable corpora gather sets of texts that belong to the same domain but are not necessarily translations of one another.

The main steps for automatically generating bilingual terminologies are the automatic extraction of monolingual terminologies and the bilingual alignment of the extracted terminologies. The terminologies will include single-word terms (SWT) and multi-word terms (MWT), as well as their variations.

The TTC project will develop generic methods and tools for the automatic extraction of terminologies, together with alignment algorithms that include adaptors to domains and languages, in order to break the lexical acquisition bottleneck in both statistical and rule-based machine translation. Alignment will be based on several strategies, i.e. lexical strategies (use of compositional methods and of an interlingua representation), contextual strategies (use of cognates, context vectors and labelled links) and corpora strategies (improvement of available corpora, for instance by topical web crawling). The methods developed will require as little prior linguistic knowledge as possible, so as to reduce the gaps in language coverage.

It will also develop or adapt tools for gathering and managing these comparable corpora and for managing terminologies. In particular, a topical web crawler and an open terminology platform will be developed. This open terminology platform will support tasks such as terminology storage, search, editing and export.

The TTC project will integrate developed and existing tools in an online platform, which will be based on Web Services and will use reputable open solutions such as UIMA (Unstructured Information Management Architecture) and EuroTermBank. Existing tools to be integrated in the platform consist of already developed GPL term extraction tools, a framework for contextual analysis, as well as TreeTagger versions, tokenisers and POS taggers for several languages. The platform will allow users to create thematic corpora given some clues (such as terms or documents on a specific domain), to extract monolingual terminology from such corpora, to create a comparable corpus in a target language from a corpus in a source language, to align bilingual terminologies, to choose the tools to apply for terminology extraction, to expand a given corpus and to export monolingual or bilingual terminologies in order to use them easily in automatic and semi-automatic translation tools.
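The objective names context vectors as one of the contextual alignment strategies but does not say how they are used. The following Python sketch illustrates one common way such alignment can work on a comparable corpus; it is not taken from the TTC tools. The seed dictionary, the window size, the cosine similarity measure, the toy sentences and all function names are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def context_vector(term, sentences, window=2):
    """Count the words that co-occur with `term` within `window` tokens."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == term:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], lo) if j != i
                )
    return counts


def translate_vector(vec, seed_dict):
    """Map a source-language vector into the target language.

    Context words missing from the (hypothetical) seed dictionary are dropped.
    """
    out = Counter()
    for word, n in vec.items():
        for target in seed_dict.get(word, ()):
            out[target] += n
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(n * b[w] for w, n in a.items() if w in b)
    norm = sqrt(sum(n * n for n in a.values())) * sqrt(
        sum(n * n for n in b.values())
    )
    return dot / norm if norm else 0.0


def rank_candidates(term, src_sents, tgt_sents, candidates, seed_dict, window=2):
    """Rank target-language candidates by similarity of their contexts."""
    src_vec = translate_vector(context_vector(term, src_sents, window), seed_dict)
    scored = [
        (cosine(src_vec, context_vector(c, tgt_sents, window)), c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored, key=lambda pair: -pair[0])]


# Toy comparable corpus (invented): same domain, not translations of each other.
english = [
    ["the", "rotor", "blade", "captures", "wind"],
    ["each", "blade", "turns", "the", "rotor"],
]
french = [
    ["la", "pale", "du", "rotor", "capte", "le", "vent"],
    ["la", "tour", "supporte", "la", "nacelle"],
    ["le", "vent", "fait", "tourner", "la", "pale"],
]
# Hypothetical seed dictionary of already known general-language pairs.
seed = {"rotor": ["rotor"], "wind": ["vent"], "captures": ["capte"]}

ranking = rank_candidates("blade", english, french, ["pale", "tour", "nacelle"], seed)
print(ranking)
```

In this toy setting the candidate that shares translated context words with "blade" is ranked first. A realistic system would weight co-occurrences with an association measure rather than raw counts and would work on much larger corpora.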