LOWLANDS

Parsing low-resource languages and domains

 Coordinator KOBENHAVNS UNIVERSITET

 Coordinator nationality Denmark [DK]
 Total cost 1,126,183 €
 EC contribution 1,126,183 €
 Programme FP7-IDEAS-ERC
Specific programme: "Ideas" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)
 Call code ERC-2012-StG_20111124
 Funding Scheme ERC-SG
 Start year 2013
 Period (year-month-day) 2013-01-01 - 2017-12-31

 Participants

#  participant             country  role             EC contrib. [€]
1  KOBENHAVNS UNIVERSITET  DK       hostInstitution  1,126,183.20
2  KOBENHAVNS UNIVERSITET  DK       hostInstitution  1,126,183.20

 Word cloud

Explore the word cloud to get a rough idea of the project.

natural    accurate    blogs    cross    newswire    collections    data    english    unsupervised    languages    articles    manually    bias    newspaper    rely    learning    syntactic    automatically    annotated    societies    modern    korean    micro    scalable    language    parsing    translate    techniques    nlp    summarizing    problem   

 Project objective

'There are noticeable asymmetries in availability of high-quality natural language processing (NLP). We can adequately summarize English newspapers and translate them into Korean, but we cannot translate Korean newspaper articles into English, and summarizing micro-blogs is much more difficult than summarizing newspaper articles. This is a fundamental problem for modern societies, their development and democracy, as well as perhaps the most important research problem in NLP right now.

Most NLP technologies rely on highly accurate syntactic parsing. Reliable parsing models can be induced from large collections of manually annotated data, but such collections are typically limited to sampled newswire in major languages. Highly accurate parsing is therefore not available for other languages and other domains.

The NLP community is well aware of this problem, but unsupervised techniques that do not rely on manually annotated data cannot be used for real-world applications, where highly accurate parsing is needed, and sample bias correction methods that automatically correct the bias in newswire when parsing, say, micro-blogs, do not yet lead to robust improvements across the board.

The objective of this project is to develop new learning methods for parsing natural language for which no unbiased labeled data exists. In order to do so, we need to fundamentally rethink the unsupervised parsing problem, including how we evaluate unsupervised parsers, but we also need to supplement unsupervised learning techniques with robust methods for automatically correcting sample selection biases in related data. Such methods will be applicable to both cross-domain and cross-language syntactic parsing and will pave the way toward robust and scalable NLP. The societal impact of robust and scalable NLP is unforeseeable and comparable to how efficient information retrieval techniques have revolutionized modern societies.'

Other projects from the same programme (FP7-IDEAS-ERC)

VIDEOWORLD (2011)

Modeling, interpreting and manipulating digital video

STEPS (2013)

Signalling compartmentalization and vesicle Trafficking at the Phagocytic Synapses

TEAMCONTROL (2011)

Self-Control and the Person: An Inter-Disciplinary Account