PLURELEARN

Plural Reinforcement Learning

Coordinator: TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY

Organization address: TECHNION CITY - SENATE BUILDING
city: HAIFA
postcode: 32000

contact info
Title: Mr.
First name: Mark
Last name: Davison
Phone: +972 4 829 4854
Fax: +972 4 823 2958

Coordinator nationality: Israel [IL]
Total cost: 100,000 €
EC contribution: 100,000 €
Programme: FP7-PEOPLE
Specific programme "People" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)
Call code: FP7-PEOPLE-2009-RG
Funding scheme: MC-IRG
Start year: 2009
Period (year-month-day): 2009-11-01 - 2013-10-31

Participants

#  participant                                country     role         EC contrib. [€]
1  TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY  IL (HAIFA)  coordinator  100,000.00

Project objective

We propose a new paradigm for learning in complex, high-dimensional dynamic environments. Our goal is to develop algorithms, theory, and applications that use a plurality of learning approaches and models in a synergetic way. Our paradigm considers the task of learning a control policy by combining trial and error, in the style of reinforcement learning, with learning from a competent teacher whose interaction with the environment can be observed. Instead of using the teacher for imitation, our paradigm focuses on learning good representations of the world model. We consider four specific issues in the new paradigm: (i) iterating between learning from a teacher and reinforcement learning; (ii) learning representation and structure from the teacher; (iii) optimizing policies based on learned representations and reasoning about model uncertainty; (iv) learning sub-strategies from a teacher, and when and how to use them. We will develop algorithms and theory pertaining to the new paradigm and will apply it in two challenging domains: a fighter jet simulator and a network operating center simulator.
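The idea of using the teacher to learn a world model, rather than to imitate, can be illustrated with a small tabular sketch. It is not the project's algorithm: the function names `estimate_model` and `plan`, the uniform fallback for unobserved state-action pairs, and the toy three-state chain are all assumptions made for illustration. The teacher's observed transitions are used only to estimate dynamics, and the policy is then obtained by planning on that estimate.

```python
from collections import defaultdict


def estimate_model(trajectories, n_states, n_actions):
    """Maximum-likelihood transition estimates from teacher (s, a, s') triples."""
    counts = defaultdict(lambda: [0] * n_states)
    for trajectory in trajectories:
        for s, a, s_next in trajectory:
            counts[(s, a)][s_next] += 1
    model = {}
    for s in range(n_states):
        for a in range(n_actions):
            c = counts[(s, a)]
            total = sum(c)
            # Assumption: pairs the teacher never tried get a uniform guess.
            model[(s, a)] = ([x / total for x in c] if total
                             else [1.0 / n_states] * n_states)
    return model


def plan(model, reward, n_states, n_actions, gamma=0.9, iters=200):
    """Value iteration on the learned model; returns values and a greedy policy."""
    v = [0.0] * n_states

    def q(s, a, values):
        expected_next = sum(p * values[t] for t, p in enumerate(model[(s, a)]))
        return reward[s] + gamma * expected_next

    for _ in range(iters):
        v = [max(q(s, a, v) for a in range(n_actions)) for s in range(n_states)]
    policy = [max(range(n_actions), key=lambda a: q(s, a, v))
              for s in range(n_states)]
    return v, policy


# Toy chain with states 0, 1, 2; action 0 moves left, action 1 moves right.
# The teacher's actions are never copied; only the transitions are used.
teacher_trajectories = [
    [(0, 1, 1), (1, 1, 2), (2, 1, 2), (2, 0, 1), (1, 0, 0), (0, 0, 0)],
]
learned_model = estimate_model(teacher_trajectories, n_states=3, n_actions=2)
values, policy = plan(learned_model, reward=[0.0, 0.0, 1.0],
                      n_states=3, n_actions=2)
print(policy)
```

In a fuller version of this loop, the planned policy would be executed by trial and error, and the new transitions would be merged into the counts before replanning.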

Introduction (Teaser)

An EU-funded project established a new paradigm for learning in large-scale, dynamic environments subject to uncertainty.

Project description (Article)

The overall goal of the project 'Plural reinforcement learning' (PLURELEARN) was to develop algorithms, theory and applications that use a large number of learning approaches and models in a synergetic way.

To realise this goal, the project team identified three objectives: developing a learning approach combining learning from a teacher and learning by trial and error; devising a structure discovery methodology for reasoning about uncertainty in high-dimensional Markov processes; and developing approaches for algorithm selection and mini-strategies.

The team made progress on all three objectives. Research on the first objective resulted in papers on how to use a tutor or expert advice in reinforcement learning paradigms. The work presented new algorithms for the problem of learning from multiple sources and showed how these algorithms perform in medium-scale applications.

The problem of structure discovery (objective 2) proved to be quite complex. After developing theoretical and applied aspects of model selection and structure discovery, which showed how difficult it is to detect dynamic structure, the team developed two approaches for mitigating risk. The first is based on policy gradients and is geared toward problems where a simulator is available. The second is based on a robust optimisation approach that focuses on the coupling of uncertainties between states.
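To make the robust optimisation idea concrete, here is a minimal sketch of textbook robust value iteration. It is a simplification and not the project's method: the worst case is taken independently for each state-action pair over a finite set of candidate transition distributions, whereas the project text describes uncertainties that interact across states. The function name `robust_value_iteration`, the `candidates` structure, and the toy numbers are hypothetical.

```python
def robust_value_iteration(candidates, reward, n_states, n_actions,
                           gamma=0.9, iters=200):
    """Plan against the worst candidate model for each state-action pair.

    candidates[(s, a)] is a list of plausible next-state distributions.
    """
    v = [0.0] * n_states

    def worst_q(s, a, values):
        # An adversary picks the distribution with the lowest expected value.
        worst = min(sum(p * values[t] for t, p in enumerate(dist))
                    for dist in candidates[(s, a)])
        return reward[s] + gamma * worst

    for _ in range(iters):
        v = [max(worst_q(s, a, v) for a in range(n_actions))
             for s in range(n_states)]
    policy = [max(range(n_actions), key=lambda a: worst_q(s, a, v))
              for s in range(n_states)]
    return v, policy


# Toy problem: state 0 is the start, state 1 is a rewarding absorbing goal,
# state 2 is an absorbing trap. Action 0 is "risky", action 1 is "safe".
candidates = {
    (0, 0): [[0.0, 0.9, 0.1], [0.0, 0.5, 0.5]],  # nominal and pessimistic
    (0, 1): [[0.0, 0.7, 0.3]],                   # well-understood dynamics
    (1, 0): [[0.0, 1.0, 0.0]], (1, 1): [[0.0, 1.0, 0.0]],
    (2, 0): [[0.0, 0.0, 1.0]], (2, 1): [[0.0, 0.0, 1.0]],
}
reward = [0.0, 1.0, 0.0]

robust_values, robust_policy = robust_value_iteration(candidates, reward, 3, 2)

# Planning on the nominal model alone (first candidate only) for comparison.
nominal = {key: dists[:1] for key, dists in candidates.items()}
nominal_values, nominal_policy = robust_value_iteration(nominal, reward, 3, 2)

print(robust_policy[0], nominal_policy[0])
```

In this toy problem the nominal planner prefers the risky action in the start state, while the robust planner switches to the safe action because the risky action's pessimistic candidate is worse than the safe action's known dynamics.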

For the third objective, researchers designed two strategies that may lead to improved performance. The first was a way to modify options and then generate new, improved options. The second was a way to make use of 'randomly generated' options to expedite planning and learning.

The project was successful in developing a new framework for planning and learning in data-driven, variable environments. The research has the potential to open up opportunities for large-scale optimisation of dynamic systems that could have a significant impact on the scale of problems that can be solved.
