ISALTEC

Index based Statistical Analysis of Large Text Corpora

Coordinator: LUDWIG-MAXIMILIANS-UNIVERSITAET MUENCHEN

Organization address: GESCHWISTER SCHOLL PLATZ 1
city: MUENCHEN
postcode: 80539

Contact info
Title: Prof.
First name: Klaus
Last name: Schulz
Phone: +49 89 21809700

Coordinator nationality: Germany [DE]
Total cost: 161,968 €
EC contribution: 161,968 €
Programme: FP7-PEOPLE
Specific programme "People" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)
Call code: FP7-PEOPLE-2013-IEF
Funding scheme: MC-IEF
Start year: 2014
Period (year-month-day): 2014-07-01 to 2016-06-30

Participants

#    participant    country    role    EC contrib. [€]
1    LUDWIG-MAXIMILIANS-UNIVERSITAET MUENCHEN    DE (MUENCHEN)    coordinator    161,968.80


Project objective

The statistical analysis of large text corpora is a fundamental method for gaining insights into the structure of language, e.g., for grammar development, machine translation, terminology and named entity extraction, text correction, and semantic text analysis. Progress in these fields helps to improve related applications in information science (search engine technology) and in many other text-oriented disciplines. The core contribution of this project is a new methodology aimed at fundamentally improving the statistical analysis of large text corpora. A weakness of current methods in corpus analysis is their insufficient use of contextual information. Properly understanding the role, function and meaning of a phrase or word, which is important for many applications such as translation and search, is often only possible when sentence and paragraph contexts are taken into account. We want to develop and study a new representation of corpora which is superior to present formats in three respects. Most importantly, it offers a much better use of contextual information. At the same time, it helps to better distinguish between arbitrary and meaningful parts of text, and it gives hints on how to compose and decompose phrases. With these properties, the new representation provides a basis for fundamentally improving the statistical analysis of corpora. The new representation is derived from a special text index structure which gives immediate access to contexts of any size. The index imposes a natural graph structure on the phrases in the corpus, which implies that interesting graph-based statistical methods can be applied. Furthermore, it can be efficiently constructed and updated in practice. To demonstrate the potential of the new methodology in NLP in practice, we will concentrate on machine translation, where we expect to achieve improved translation methods for words and phrases.
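The objective does not specify the index structure, so the following is only a minimal sketch of the general idea of an index that returns the contexts of a phrase for any window size. It uses a plain positional token index, which is an assumption made for illustration and not the project's actual data structure; the names `build_index`, `phrase_positions` and `contexts` are hypothetical.

```python
from collections import defaultdict


def build_index(tokens):
    """Map each token to the sorted list of positions where it occurs."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return index


def phrase_positions(tokens, index, phrase):
    """Return the start position of every occurrence of `phrase` (a token list)."""
    if not phrase:
        return []
    n = len(phrase)
    # Only positions of the first token are candidates; verify the rest there.
    return [p for p in index.get(phrase[0], []) if tokens[p:p + n] == phrase]


def contexts(tokens, index, phrase, width):
    """Return (left, right) windows of up to `width` tokens around each occurrence."""
    n = len(phrase)
    result = []
    for p in phrase_positions(tokens, index, phrase):
        left = tokens[max(0, p - width):p]
        right = tokens[p + n:p + n + width]
        result.append((left, right))
    return result


# Toy corpus (illustrative only, not project data).
corpus = "the index gives access to contexts and the index imposes a graph".split()
idx = build_index(corpus)
for left, right in contexts(corpus, idx, ["the", "index"], 2):
    print(left, "|", right)
```

Because the window width is a parameter of the lookup and not of the index, the same index can serve word-level, sentence-level or paragraph-level contexts. A structure of the kind the objective describes would presumably also need to support efficient updates and expose relations between phrases, which this sketch does not attempt.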
