

Periodic Reporting for period 2 - CLIM (Computational Light fields IMaging)

Teaser

Light field technology holds great promise in computational imaging. Light field cameras capture light rays as they interact with physical objects in the scene. The recorded flow of rays (the light field) yields a rich description of the scene, enabling advanced image...

Summary

Light field technology holds great promise in computational imaging. Light field cameras capture light rays as they interact with physical objects in the scene. The recorded flow of rays (the light field) yields a rich description of the scene, enabling advanced image creation from a single capture, as well as 3D scene geometry estimation and 3D scene reconstruction. However, the path to wide deployment of light fields remains difficult. One barrier is the limitations of capture devices in terms of spatial and angular resolution and of noise. Another critical barrier is the huge amount of high-dimensional (4D/5D) data produced by light fields, with obvious implications for storage but also for processing in interactive time. A further key challenge for wide adoption of this technology is the development of efficient methods for scene analysis from light fields (e.g. depth estimation or scene flow estimation) and for light field editing, as is commonplace today with 2D imaging.
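
To make the data structure concrete, the sketch below illustrates the standard two-plane 4D parameterization L(u, v, s, t) of a light field, the extraction of a sub-aperture view, and of an epipolar-plane image (EPI). The array shapes and names are illustrative, not those of any project dataset.

```python
import numpy as np

# A light field in the two-plane parameterization L(u, v, s, t):
# (u, v) index the viewpoint (angular dimensions), (s, t) the pixel
# position within each view (spatial dimensions). Shapes are illustrative.
U, V, S, T = 9, 9, 64, 64                 # 9x9 views of 64x64 pixels
lf = np.random.rand(U, V, S, T)           # synthetic stand-in for captured data

# A sub-aperture image: fix the viewpoint (u, v), keep all pixels.
center_view = lf[U // 2, V // 2]          # shape (S, T)

# An epipolar-plane image (EPI): fix one angular and one spatial
# coordinate; scene points trace lines whose slope encodes depth.
epi = lf[:, V // 2, S // 2, :]            # shape (U, T)

print(center_view.shape, epi.shape)       # (64, 64) (9, 64)
```

A fifth (temporal) dimension is added for video light fields, which is what makes the data volume, and hence compression, so challenging.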

The proposed project aims at addressing the above barriers and challenges with a research plan leveraging recent advances in three fields: signal and image processing, computer vision, and machine learning. While addressing these challenges for 2D images has been essential to the success of digital imaging over the past decades, the issues are even more compelling for light field imaging modalities. Their specific data structures and dimensionality require a major leap forward, well beyond a straightforward application of 2D image processing methods. In parallel with this evolution towards novel camera designs for richer imaging modalities, machine learning is revolutionizing the field of digital image processing, be it for compression, compressive sensing, or for solving a variety of inverse problems (denoising, deblurring, super-resolution, inpainting, etc.).

The ambition of CLIM is to lay new algorithmic foundations for the entire 4D/5D light field processing chain. Data processing becomes harder as dimensionality increases, as it does for light fields compared with 2D images, hence the need for mathematical tools for dimensionality reduction and low-dimensional embedding of light field data. These mathematical models, together with scene analysis (i.e. scene depth and scene flow estimation), will be instrumental in the development of a coding/decoding architecture for light fields. Another challenge is to solve a number of inverse problems with light fields. Some of the targeted inverse problems (denoising, spatial super-resolution, and angular super-resolution via view synthesis) aim at coping with the technological limitations of capture devices. The other inverse problems we address aim at enabling scene analysis (e.g. depth and scene flow estimation) from light fields, as well as light field editing.

Applications enabled by the project results include consumer applications (e.g. photography, augmented reality, autonomous vehicles), applications in surveillance (face recognition, gesture analysis), and applications in life science (light field microscopy, medical imaging, particle image velocimetry). To give an example in life science, light field microscopy is a new technique for volumetric imaging of weakly scattering or fluorescent specimens that, thanks to an array of microlenses, captures a 4D light field in a single photographic exposure without the need for scanning.

Work performed

The availability of datasets is always critical for research in image processing, and even more so with deep learning methods. It was therefore logical to dedicate effort to producing static and video light field datasets for both natural and synthetic scenes. For the synthetic light fields, we have also produced ground-truth depth maps and optical flows, which are important for supervised learning and for objective evaluation of the developed scene analysis methods. These datasets are publicly available via the project website (http://clim.inria.fr/DataSoftware.html).

The design of high-quality light field cameras is a challenging problem that we have tackled from different angles. A first approach has consisted in enhancing the quality of views extracted from data captured by plenoptic cameras, with novel white-image-guided methods for color demosaicing and for aligning the micro-lens array with the sensor. A second approach pertains to coded aperture imaging placed in a compressed sensing framework. The ultimate goal is the design of novel coded aperture camera architectures.
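
The compressed sensing principle behind coded aperture imaging can be illustrated with a generic sketch: a sparse signal is observed through a small number of coded measurements and recovered by iterative soft thresholding (ISTA). This is a textbook illustration under a random Gaussian sensing matrix, not the project's actual camera model or recovery method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward model of compressive capture: y = A x, where x is a sparse
# (or sparsely representable) signal and A models the coded measurements.
n, m, k = 100, 40, 5                      # signal size, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true

# Sparse recovery with ISTA (iterative soft thresholding):
# x <- soft(x - step * A^T (A x - y), step * lam)
step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1 / Lipschitz constant of the gradient
lam = 0.01
x = np.zeros(n)
for _ in range(2000):
    g = x - step * A.T @ (A @ x - y)
    x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)

print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))  # small relative error
```

In a coded aperture camera, A would instead encode the optical mask and the sparsifying transform; the point is that far fewer measurements than unknowns can suffice when the signal has a sparse structure.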

We have developed several mathematical tools for low-dimensional and sparse representations of light fields. The first tool is a homography-based low-rank approximation (HLRA) method that jointly computes the homographies aligning the light field views and the components of a low-rank model. We have explored graph-based representations and designed super-ray-based separable and non-separable local graph transforms. We have further introduced novel graph-based sampling and prediction schemes for light fields. These mathematical tools have allowed us to obtain high compression performance, and the low-rank models have been instrumental in addressing several inverse problems in light field imaging with deep learning techniques.
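
The low-rank intuition behind such models can be illustrated with a truncated SVD on a matrix whose columns stand for vectorized, mutually aligned views. This sketch deliberately skips the homography alignment step that HLRA computes jointly, and uses a synthetic low-rank matrix in place of real views.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stack each (vectorized) light field view as a column of a matrix M.
# Once views are aligned, their strong correlation makes M approximately
# low rank; here a synthetic low-rank matrix plus mild noise stands in.
n_pixels, n_views, true_rank = 500, 25, 4
M = rng.standard_normal((n_pixels, true_rank)) @ rng.standard_normal((true_rank, n_views))
M += 0.01 * rng.standard_normal(M.shape)

# Truncated SVD gives the best rank-r approximation (Eckart-Young).
U, s, Vt = np.linalg.svd(M, full_matrices=False)
r = 4
M_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

err = np.linalg.norm(M - M_r) / np.linalg.norm(M)
print(err)  # near zero: 4 components capture almost all the energy
```

Storing the rank-r factors instead of M is what yields the compression gain, and the same low-rank prior serves as a regularizer in inverse problems.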
A complete compression algorithm involves many complementary processing steps that aim at de-correlating the signal in the spatial, angular, and temporal dimensions via synthesis or prediction, low-dimensional embedding, and statistical coding. We are studying different schemes using view synthesis, low-rank models, and/or graph transforms. A critical component of view synthesis is depth estimation, which remains a difficult problem for light fields. We have developed methods based on optical flows and low-rank models, or on deep learning techniques, yielding depth maps with accuracy beyond the state of the art. A drawback of deep learning techniques, however, is their huge number of parameters (often a few hundred million), with obvious implications for memory footprint. We have therefore worked on the design of a very lightweight convolutional network architecture for depth estimation and view prediction, with ten times fewer parameters than methods like Deep3D. This work is now being extended towards scene flow (i.e. 3D motion) estimation, introducing a novel parametric model in the 4D ray space valid for both sparsely and densely sampled light fields.

We have also addressed several computational imaging problems. The first is light field editing, which we have tackled with two approaches: a novel low-rank matrix completion method, and a novel structure-tensor-driven diffusion on epipolar plane images. We have also addressed the spatial/angular resolution trade-off of light field cameras by developing super-resolution methods based on learning techniques, using either multivariate ridge regression or deep neural networks in spaces of reduced dimension defined by low-rank models. Finally, we have developed a regularization framework in the 4D ray space using anisotropic partial differential equations, applied to various light field processing problems: denoising, interpolation, inpainting, and depth map estimation.
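
The edge-preserving behavior of anisotropic diffusion can be shown on a 2D slice with a classic Perona-Malik scheme: noise in flat regions is smoothed away while strong gradients (such as the depth-encoding lines of an EPI) block diffusion. This is a generic textbook sketch, not the project's 4D ray-space formulation.

```python
import numpy as np

def perona_malik(img, n_iter=50, kappa=0.1, dt=0.2):
    """Edge-preserving anisotropic diffusion on a 2D image slice.

    The conductance g = exp(-(|grad|/kappa)^2) slows diffusion across
    strong gradients (edges) while smoothing homogeneous regions.
    """
    u = img.astype(float).copy()
    g = lambda d: np.exp(-(d / kappa) ** 2)
    for _ in range(n_iter):
        # Differences toward the four neighbors (periodic boundaries).
        dn = np.roll(u, -1, 0) - u
        ds = np.roll(u, 1, 0) - u
        de = np.roll(u, -1, 1) - u
        dw = np.roll(u, 1, 1) - u
        u += dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u

# A noisy piecewise-constant "EPI-like" slice: diffusion reduces the
# noise while the step edge survives (its gradient shuts diffusion off).
rng = np.random.default_rng(2)
img = np.zeros((32, 32))
img[:, 16:] = 1.0
noisy = img + 0.05 * rng.standard_normal(img.shape)
smooth = perona_malik(noisy)
print(np.var(smooth - img) < np.var(noisy - img))  # True: noise reduced
```

In the 4D ray-space framework, the diffusion tensor is additionally oriented by the light field structure, so that smoothing follows rays rather than crossing them.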

Final results

The project will pursue its four main light field processing research tracks: capture and creation; lower-dimensional representations and sparse models; coding architectures and compression algorithms; and algorithms for computational imaging.

While our first-period achievements on light field capture with a color-coded mask in a compressive sensing framework have shown promising results compared with related state-of-the-art techniques, they are only a first step. We are currently aiming at a generic model for coded-mask light field cameras which, together with efficient deep-learning-based recovery methods, should lead to novel coded aperture light field camera designs.

In the first period, we explored low-rank and graph-based models mostly for static light fields. This research track will now move towards such models for dynamic (video) light fields, implying, for example, the development of time-varying graphs whose edges change over time so as to best de-correlate the signal along super-rays and motion trajectories. This problem is wide open, and we expect such time-varying graphs along 3D motion trajectories to be instrumental not only in dynamic light field compression but also in defining the support for the regularization needed in a variety of processing tasks. Establishing the link with point cloud representations is also among our objectives, as is extending the coding architecture and compression algorithms from static to video light fields.
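
The de-correlation role of graph-based transforms can be illustrated with a graph Fourier transform on a tiny path graph: Laplacian eigenvectors act as a frequency basis, and a smooth signal on the graph compacts its energy into a few low-frequency coefficients. The super-ray graphs used in the project are of course far more elaborate than this toy chain.

```python
import numpy as np

# Graph Fourier transform on a path graph of n nodes: the eigenvectors
# of the combinatorial Laplacian L = D - W play the role of a DCT basis.
n = 32
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0   # chain: node i -- node i+1
L = np.diag(W.sum(1)) - W

eigvals, eigvecs = np.linalg.eigh(L)      # columns sorted by graph frequency

signal = np.sin(np.linspace(0, np.pi, n))  # smooth signal along the path
coeffs = eigvecs.T @ signal                # graph Fourier transform

energy_low = np.sum(coeffs[:4] ** 2) / np.sum(coeffs ** 2)
print(energy_low)  # close to 1: most energy in 4 of 32 coefficients
```

Energy compaction is exactly what a coder exploits: few significant coefficients mean few bits. For a time-varying graph, W (and hence the basis) would be updated as edges follow the motion trajectories.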

Motion trajectories hence need to be estimated, e.g. using parametric models for scene flows in the 4D ray space, as already developed in the first period. Estimating scene flows, i.e. dense or semi-dense 3D motion fields of a scene, remains a wide open topic, especially for sparsely sampled light fields, yet it is a key ingredient of compression, view synthesis, and prediction. The problem of scene flow estimation can be formulated differently according to the final goal (3D modeling, view synthesis, or view interpolation). It can be posed as a supervised or unsupervised learning task that convolutional networks solve well, which makes the further investigation of deep learning techniques, as already initiated in the project, natural. However, the very large number of parameters of existing architectures (often some hundreds of millions) makes these solutions impractical for resource-limited platforms. Our goal will therefore also be to study neural networks with a low number of parameters, so as to make these solutions practical for mobile devices.
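
The parameter-budget concern can be made concrete with a back-of-the-envelope count of convolutional layer parameters. The channel counts below are purely illustrative, not the project's actual architecture or that of any published network.

```python
# Parameters of a 2D convolutional layer: one k x k kernel per
# (input channel, output channel) pair, plus one bias per output channel.
def conv2d_params(in_ch, out_ch, k):
    return out_ch * (in_ch * k * k + 1)

# A deliberately lightweight stack (illustrative channel counts):
light = [conv2d_params(cin, cout, 3)
         for cin, cout in [(3, 16), (16, 16), (16, 16), (16, 1)]]

# A wider stack of the same depth:
heavy = [conv2d_params(cin, cout, 3)
         for cin, cout in [(3, 128), (128, 256), (256, 256), (256, 1)]]

print(sum(light), sum(heavy))  # the wide stack is orders of magnitude larger
```

Since parameter count grows with the product of channel widths, narrowing every layer compounds into large savings in memory footprint, which is what makes lightweight architectures viable on mobile devices.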

In the first period, we developed computational imaging algorithms for solving a variety of problems in light field imaging: denoising, spatial and angular super-resolution, inpainting, and edit propagation. Solving inverse problems requires a good understanding of the structure of the latent data space, which allows us to retain only the relevant information in the form of regularizers. So far, low-rank models for light fields have been central to these algorithms as regularization priors. In parallel, machine learning, and in particular deep learning, is revolutionizing the field of digital imaging: these techniques learn, from a set of examples, the latent space in which the images reside. However, while deep learning methods have achieved state-of-the-art performance on many challenging inverse problems in 2D imaging, they rely on very complex neural network architectures that pose practical issues on resource-constrained devices such as mobile cameras, all the more so for high-dimensional light field data. Aside from the need to adapt deep learning approaches to high-dimensional data, a goal for the second period will be to explore how deep learning and analytical optimization methods can be combined to provide better solutions.

Website & more info

More info: http://clim.inria.fr.