While today's robots are able to perform sophisticated tasks, they can only act on objects they have been trained
to recognize. This is a severe limitation: any robot will inevitably face novel situations in unconstrained
settings, and thus will always have knowledge gaps. This calls for robots able to learn continuously about
objects by themselves. The learning paradigm of state-of-the-art robots is the sensorimotor toil, i.e. the
process of acquiring knowledge by generalization over observed stimuli. This is in line with cognitive theories
that claim that cognition is embodied and situated, so that all knowledge acquired by a robot is specific to its
sensorimotor capabilities and to the situation in which it has been acquired. Still, humans are also capable
of learning from externalized sources like books, illustrations, etc., containing knowledge that is necessarily
unembodied and unsituated. To overcome this gap, RoboExNovo proposes a paradigm shift. I will develop
a new generation of robots able to acquire perceptual and semantic knowledge about objects from externalized,
unembodied resources, to be used in situated settings. As the largest existing body of externalized knowledge, I
will consider the Web as the source from which to learn. To achieve this, I propose to build a translation
framework between the representations used by robots in their situated experience and those used on the
Web, establishing links between related percepts and between percepts and
the semantics they support. By enabling robots to use knowledge resources on
the Web that were not explicitly designed to be accessed for this purpose, RoboExNovo will pave the way for
ground-breaking technological advances in home and service robotics, driver assistant systems, and in general
any Web-connected situated device.
The project is articulated in five main Work Packages (WPs), with the first four focused on the conceptual and methodological advances necessary to reach this goal, and the last one focused on measuring progress and testing the findings of the first four WPs on as many robot platforms as possible. During the first 30 months of the project, the RoboExNovo team developed and implemented the methodological tools necessary to mine the Web for perceptual knowledge without the need for human annotators, to translate this perceptual knowledge into a form usable and useful for robots, and to ground it in the sensors of a given robotic platform. To achieve this, we cast the problems within the deep learning framework, advancing the state of the art in robot vision, RGB-D object categorization, domain adaptation and Web vision.
All the methodologies developed so far are novel and represent substantial progress beyond the state of the art. The key unconventional ideas permeating all the results achieved so far can be summarized as follows:
1) The use of synthetic data for training deep networks, whether mined from the Web or generated through adversarial networks (WP3);
2) The generation of perceptual knowledge bases on demand, without the need for human annotation, resulting in the automatic generation of perceptual databases from the Web (WP2);
3) The use of batch normalization as a potent tool for domain adaptation, generalization and discovery (WP1), illustrated by the first sketch after this list;
4) The casting of non-parametric families of algorithms within the deep learning framework to tackle the learning-to-learn problem (WP4), illustrated by the second sketch after this list.
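To make idea 3) concrete, the sketch below shows one common flavor of batch-normalization-based domain adaptation: the learned weights of a source-trained network are kept frozen, and only the BatchNorm running statistics are re-estimated on unlabeled target-domain images. This is a minimal PyTorch illustration of the general principle, not the project's actual implementation; the function name adapt_bn_statistics, the checkpoint path, and the data loader are placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models

def adapt_bn_statistics(model, target_loader, device="cuda"):
    """Re-estimate BatchNorm running statistics on unlabeled target-domain
    images while keeping all learned weights frozen."""
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.reset_running_stats()   # forget source-domain statistics
            m.momentum = None         # use a cumulative moving average
            m.train()                 # BN updates its statistics in train mode
    with torch.no_grad():             # no labels, no gradients needed
        for images, _ in target_loader:
            model(images.to(device))
    model.eval()
    return model

# Hypothetical usage: a classifier trained on source data, adapted to a
# new domain with nothing but unlabeled target images.
# model = models.resnet18(num_classes=51).to("cuda")
# model.load_state_dict(torch.load("source_model.pth"))
# model = adapt_bn_statistics(model, target_loader)
```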
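Similarly, for idea 4), the following toy sketch conveys the non-parametric flavor: a nearest-mean classifier built on frozen deep embeddings, where a novel object class is added by storing a prototype vector rather than by retraining the network. Again, this is an assumption-laden illustration rather than the method developed in WP4; the class NearestMeanClassifier and the encoder are hypothetical.

```python
import torch
import torch.nn.functional as F

class NearestMeanClassifier:
    """Toy non-parametric classifier on top of a frozen deep feature
    extractor: each class is represented by the mean embedding of its
    examples, so new classes can be added without any retraining."""

    def __init__(self, encoder):
        self.encoder = encoder.eval()   # frozen deep network -> embeddings
        self.prototypes = {}            # class label -> prototype vector

    @torch.no_grad()
    def add_class(self, label, images):
        feats = F.normalize(self.encoder(images), dim=1)
        self.prototypes[label] = F.normalize(feats.mean(dim=0), dim=0)

    @torch.no_grad()
    def predict(self, images):
        feats = F.normalize(self.encoder(images), dim=1)
        labels = list(self.prototypes)
        protos = torch.stack([self.prototypes[l] for l in labels])
        sims = feats @ protos.t()       # cosine similarity to each prototype
        return [labels[i] for i in sims.argmax(dim=1).tolist()]
```

The appeal of such prototype-based schemes in a learning-to-learn setting is that the per-class memory is decoupled from the parametric feature extractor, so the set of recognizable objects can grow incrementally.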
All of these methodological approaches have been described in more detail in the previous section, as well as in the related publications. All of them have been published, or are currently under review, at the top conferences in the fields of computer vision and robotics, further attesting to their innovative value.
By the end of the project, the work is expected to expand to include semantic knowledge, to be extensively tested on mobile platforms, and to address the open-ended nature of learning about objects.