Résolution de problèmes inverses : optimisation et parallélisation

Nous vous rappelons que, afin de garantir l'accès de tous les inscrits aux salles de réunion, l'inscription aux réunions est gratuite mais obligatoire.

Inscriptions closes à cette réunion.

Inscriptions

21 personnes membres du GdR ISIS, et 0 personnes non membres du GdR, sont inscrits à cette réunion.
Capacité de la salle : 60 personnes.

Annonce

Titre : Résolution de problèmes inverses : optimisation et parallélisation

Date : 29 juin 2012

Lieu : ESPCI - Amphi JOLIOT (bât. N - 2^ème étage)
http://www.espci.fr/fr/contact/plan-d-acces

Thème : Problèmes inverses, Optimisation, Algorithmes, parallélisation, architectures parallèles, GPU/CPU

Résumé
L'objectif de cette réunion est de faire le point sur l'exploitation des puissances de calculs disponibles actuellement dans le cadre de la résolution de problèmes inverses et/ou d'optimisation.

L'essentiel des gains de puissance de calcul obtenus aujourd'hui, le sont principalement par l'exploitation de systèmes parallèles voir fortement parallèles.

Dans un premier temps, l'objectif de cette journée sera de faire le point sur les techniques GPU (Graphics Processing Unit). Leur utilisation, leur mise en œuvre, la faisabilité et les difficultés de telles techniques, ainsi que leur comparaison aux systèmes CPU multi-cœurs, multi-CPU seront abordés.

Ces aspects seront préférentiellement traités dans le cadre des méthodes inverses en signal-image. Un objectif sera en particulier de discuter de la mise en œuvre efficace
- d'algorithmes d'optimisation parallèles tels que les méthodes de décomposition,
- du principe de l'échantillonnage comprimé.

De façon plus générale, la problématique de l'adéquation algorithmes/architectures sera un sujet central de cette réunion.

Organisateurs

Nicolas Bertaux : nicolas.bertaux@fresnel.fr
Caroline Chaux : Caroline.Chaux@univ-mlv.fr
Jalal Fadili : Jalal.Fadili@greyc.ensicaen.fr
Jérôme Idier : Jerome.Idier@irccyn.ec-nantes.fr

Programme

10h - 11h : Nicolas Gac - Parallélisation des calculs sur serveur multi-GPUs pour la résolution de problèmes inverses

11h - 11h30 : pause

11h30 - 12h : G. Perrot, S. Domas, R. Couturier, N. Bertaux - Transposition GPU d'un algorithme de traitement d'images type 'snake'

12h - 12h30 : Y. Le Sant, B. Leclaire, G. Le Besnerais, F. Champagnat, A. Plyer - Estimation rapide de champs de vitesse sur GPU

12h30 - 14h : Pause déjeuner

14h - 15h : Luca Zanni - Gradient projection methods for large-scale optimization in inverse problems: acceleration techniques and parallel implementation

15h - 15h30 : R. Gaetano, G. Chierchia and B. Pesquet-Popescu - A Parallel Proximal Algorithm for Dense Disparity Estimation and Implementations for Parallel Devices

15h30 - 16h : Pause

16h - 16h30 : R. Cavicchioli, C. Chaux C, L. Blanc-Feraud L, L. Zanni - Accelerating gradient methods for hyperparameter estimation in image reconstruction

16h30 - 17h : M. Dumitru, L. Gharsalli, A. Mohammad Djafari – Bayesian Variational Approximation Implementation for linear inverse problem with Infinite Gaussian mixture model

Résumés des contributions

Parallélisation des calculs sur serveur multi-GPUs pour la résolution de problèmes inverses

Nicolas Gac

Résumé : Les algorithmes itératifs utilisés lors de la résolution de problèmes inverses portant sur des gros volume de données requiert une accélération significative pour être utilisés en pratique. Sur des exemples d'applications en tomographie X (reconstruction de volume 1024**3 voxels) et en déconvolution de signaux 1D (enregistrement sur plusieurs années de données spectrales de Mars) ou 2D (flux vidéo d'une webcam), nous présenterons notre recherche de solutions permettant la parallélisation des calculs la plus efficace possible sur les processeurs de type "many cores" que sont les GPUs. Nous exposerons ainsi la triple adéquation entre l'architecture des processeurs GPUs (CUDA de Nvidia), la (re)formulation des algorithmes et la (re)structuration des données que nous avons mis en oeuvre sur différentes types d'opérateurs utilisés dans les algorithmes itératifs (projecteur, rétroprojecteur, convolution nD). Par ailleurs, nous montrerons l'attention particulière qui doit être apportée au goulot d'étranglement liés au temps de transfert entre le PC et la carte GPU. Enfin, nous présenterons le découpage des données que nous avons effectué afin de bénéficier pleinement d'un serveur multi-GPUs et apporterons quelques éléments de réponse sur l'utilisation des GPUs couplés à Matlab et des librairies déjà existantes (CUBLAS, NVPP...).

Transposition GPU d'un algorithme de traitement d'images type 'snake'.

Gilles Perrot (1), Stéphane Domas (1), Raphaël Couturier (1), Nicolas Bertaux (2)

(1) Institut FEMTO-ST, Rue Engel Gros, 90000 Belfort, France.

(2) Institut Fresnel, CNRS, Aix-Marseille Université, Ecole Centrale Marseille, Campus de Saint-Jérôme, 13013 Marseille, France.

Résumé : En traitement d'images, une limite à l'utilisation d'un grand nombre d'algorithmes avancés demeure le temps de calcul de leurs implémentations. Parallèlement, les gains obtenus aujourd'hui en puissance de calcul le sont essentiellement grâce l'emploi d'architectures parallèles de diverses natures. Parmi ces solutions, les GPGPUs (processeurs graphiques à usage général) sont une réponse économique et performante pour l'implémentation de certaines classes d'algorithmes. Toutefois, leur architecture très spécifique ainsi que l'interaction avec le processeur hôte ne permettent pas de garantir, dans tous les cas, une transposition simple et efficace. A tel point qu'il n'est pas rare de voir des implémentions GPU moins performantes que l'implémentation CPU de référence. Nous proposons d'illustrer ces aspects en s'appuyant sur l'exemple d'un algorithme de segmentation par contour actif, orienté régions (snake). Après avoir rappelé brièvement le principe du snake étudié, nous détaillerons la méthode employée pour paralléliser cet algorithme sur GPU, en nous focalisant sur certains aspects génériques et symptomatiques. Nous évoquerons en particulier, en donnant autant que possible des extraits de code :

- la représentation des données. Elle est une des clés de la transposition des algorithmes et est étroitement liée au niveau de parallélisme appliqué.

- la génération d'images intégrales. Elle représente un type de prétraitement très employé.

- les étapes de réduction. Par exemple les sommes, tris ou recherches d'extrema. Elles sont un point délicat car les GPUs ne sont clairement pas conçus pour que ces traitements y soient exécutés de manière optimale. Un certain nombre de travaux apportent des réponses que l'on présentera.

- l'optimisation de la grille de calcul. Contrairement à ce que préconise le principal fabricant de GPU et ainsi que l'a montré Volkov, la maximisation de l'occupation (occupancy) ne permet pas toujours de tirer le meilleur parti de la puissance de calcul du GPU. De ce point de vue, les architectures récentes peuvent représenter un recul.

Chacun des points abordés sera également l'occasion de présenter les spécificités et les contraintes des accès aux différentes zones mémoire des GPU (globale, partagée, textures, registres).

FOLKI-SPIV : estimation rapide de champs de vitesse sur GPU

Yves Le Sant, Benjamin Leclaire (ONERA/DAFE)

Guy Le Besnerais, Frédéric Champagnat, Aurélien Plyer (ONERA/DTIM)

Résumé : Le contexte de ce travail est la vélocimétrie par imagerie de particule (PIV) une technique de mesure de champs de vitesse aujourd'hui couramment employée en mécanique des fluides expérimentale. Nous rappelons le principe de cette technique en précisant le problème de traitement de données associé. Il s'agit d'estimer des champs de déplacement entre deux images (ou entre deux couples d'images stéréoscopiques pour accéder à la composante hors-plan) de manière robuste et rapide : typiquement on cherche à réaliser cette estimation sur plusieurs milliers de couples d'images de taille 4MP. Nous présentons une stratégie d'optimisation issue des travaux sur le flot optique menés en la vision par ordinateur, stratégie qui se démarque notablement des approches classiques de la PIV. Nous montrons que cette stratégie est très favorable à une implémentation sur GPU. Nous détaillons certains aspects de cette implémentation et nous décrivons ainsi FOLKI-SPIV, un logiciel développé à l'ONERA offrant un rapport précision sur vitesse de calcul inédit en PIV.

F. Champagnat, A. Plyer, G. Le Besnerais, B. Leclaire, S. Davoust et Y. Le Sant, “Fast and accurate PIV computation using highly parallel iterative correlation maximization,” Experiments in Fluids, Volume 50, Number 4, 1169-1182, 2011

Gradient projection methods for large-scale optimization in inverse problems: acceleration techniques and parallel implementation

Luca Zanni

Résumé : The mathematical models for inverse problems often lead to large scale constrained optimization problems with differentiable objective functional and simple constraints. Gradient projection methods are appealing approaches for these optimization problems for two main reasons: first, the simple structure of the constraints makes the projection of a vector on the feasible region a non-excessively expensive operation; second, the recent advances on the step length selection in gradient methods allow us to exploit special approximations of second order information and largely improve the convergence rate of these schemes, without significant additional costs. Thus, new gradient projection methods can be designed that represent a valid alternative to popular gradient-based approaches widely used for inverse problems. Furthermore, these new methods preserve the well-known simplicity of the gradient-based schemes and then facilitate effective implementation for multiprocessor architectures. In this talk, starting from the basic ideas on the step length selection that motivated the recent accelerated versions, we discuss important features of the gradient projection methods, such as the choice of suited scaled gradient directions and the algorithms to perform non-expensive projection on the feasible region. Applications of gradient projection methods to inverse problems in image and signal reconstruction will be considered and numerical results on GPU and multi-core CPU systems will be presented.

A Parallel Proximal Algorithm for Dense Disparity Estimation and Implementations for Parallel Devices

R.Gaetano, G. Chierchia and B. Pesquet-Popescu - TELECOM-ParisTech, Dept. TSI, 37-39 rue Dareau, 75014 Paris

Résumé : The recovery of depth information via disparity estimation from stereo or multi-view images represents a fundamental step for many recently growing application, ranging from 3D imaging, e.g., for the efficient encoding and compression of 3D videos or the rendering for 3DTV and FTV (Free-viewpoint Television), to the enhancement of machine vision systems. Disparity estimation is in general a considerably complex task. The use of local techniques, based on matching pixels by observing similarities in a close neighborhood, even though partly keeping the complexity limited and allowing for data-parallelization, provide poor performances above all for high-end visual application such as 3D imaging. On the other side, the progress made in recent years with the development of global techniques, based on combinatorial optimization or variational approaches, has always been flanked by a considerable increase of the computational complexity, limiting their usability in real-world applications. Although state-of-the-art techniques sometimes allow for trade-offs between the accuracy of the results and the computational burden, the definition of a method that provides cutting-edge quality and, at the same time, allows for performance optimization via parallel implementation, is definitely a current challenge. This calls for the formulation of the problem of disparity estimation in a mathematical framework that enables the design of high performance (i.e., data- and task- parallel) algorithms.

To answer this, a novel methodological framework and a related disparity estimation technique have been recently introduced in [1, 2], whose main qualifying point is related to the possible definition of a global convex optimization criterion as a combination of multiple criteria, whose activation can be performed in parallel at each step of the optimization process (proximal splitting). Therefore, the problem can be formulated in a more flexible way, aggregating heterogeneous prior information without necessarily incurring into complex, yet intractable, optimization criteria. Moreover, such formulation naturally enables the development of algorithms suitable for implementation on parallel computing devices.

With reference to the disparity estimation technique introduced in [2], our current work is to point out the main issues related to its efficient task-parallelization have been addressed. A first aim is to highlight the methodological issues that presently limit the computational potential of the technique, and propose several suitable modifications. Mainly, we focus on the computation of the proximity operators related to each criterion involved: such operators do not always admit a closed form, making it necessary to nest other iterative algorithms in the optimization process. With specific attention to the reference disparity estimation problem, a solution has been proposed to avoid this procedure nesting and to balance the computational complexity of the different parallel tasks.

Parallel implementations of the resulting algorithm have been first developed for multicore CPUs, more structurally suited for task-parallel processing. Furthermore, we also provided implementations for GPU devices, expected to achieve considerable speed-ups due to the significant presence of matrix operations. The experimental results definitely confirm the performance potential of technique.

[1] J.-C. Pesquet, “A Parallel Inertial Proximal Optimization Method,” Tech. Rep., Université

Paris-Est (Laboratoire d’Informatique Gaspard Monge), 2010.

[2] M. El Gheche, C. Chaux, J.-C. Pesquet, J. Farah, and B. Pesquet-Popescu, “Disparity map estimation under convex constraints using proximal algorithms,” in Proc. of IEEE Workshop on Signal Processing Systems, SIPS 2011, Beirut, Lebanon, 2011.

Accelerating gradient methods for hyperparameter estimation in image reconstruction.

Cavicchioli R. (1), Chaux C. (2), Blanc-Feraud L. (3), Zanni L. (1)

(1) Department of Mathematics, University of Modena and Reggio Emilia, Italy.

(2) Université Paris-Est, LIGM and UMR CNRS 8049, Marne-La-Vallée Cedex 2, France

(3) MORPHEME I3S/INRIA/IBV, Sophia Antipolis, France.

We consider the problem of image reconstruction in which a linear operator with additive Gaussian noise models the acquisition device. A variational approach is adopted, consisting in minimizing a convex criterion composed of two parts: a data fidelity term representing a mean square error and a regularization term expressed in the wavelet domain. Following [1], the regularization hyperparameters (one per wavelet subband) are computed through a Maximum Likelihood (ML) estimator, in which the a posteriori probability is numerically approximated by sampling the distribution with MCMC (Markov chain Monte Carlo) methods, because a computable analytical form is not available. The optimization problem provided by the ML approach is faced by a gradient method, whose performance is crucial for

the effectiveness of the overall hyperparameter estimation method. In this talk, after introducing the problem, we will discuss a steplength selection strategy for accelerating the convergence rate of the optimization method and a gradient based line-search to advance toward a stationary point and to increase the stability of the algorithm. The steplength selection algorithm is based on the one proposed in [3], which has shown remarkable convergence rate improvements in many test problems. Since we cannot compute analytically the objective function, a line-search technique is derived from the one proposed in [2] that exploits only the value of the gradient. Preliminary simulations suggest the promising performance of the proposed approach.

[1] Chaux C., Blanc-Feraud L. (2012). Wavelet-based hyperparameter estimation for solving inverse problems, available at http://hal.inria.fr/hal-00668807

[2] Cheng W., Chen Z. (2012). Nonmonotone spectral method for large-scale symmetric nonlinear equations. Numerical Algorithms. doi:10.1007/s11075-012-9572-z

[3] Frassoldati G., Zanghirati G., Zanni L. (2008). New Adaptive Stepsize Selections in Gradient Methods. J. Indutrial and Management Optimization, 4(2), 299-312.

Bayesian Variational Approximation Implementation for linear inverse problem with infinite Gaussian mixture model

Mircea Dumitru, Leila Gharsalli, Ali Mohammad Djafari - Laboratoire des signaux et systems (L2S), UMR 8506 CNRS-SUPELEC-Univ. Paris Sud 11, Supelec, Plateau de Moulon 91192 Gif-sur-Yvette, France

Résumé : We consider a Bayesian approach to linear inverse problems where an In_nite Gaussian Mixture Model (IGMM) is used as the a priori to enforce the sparsity of the solution. This IGMM is used in a hierarchical way via the hidden variables representing the inverse variances which we want to estimate jointly with with the unknowns. For this the joint posterior probabilistic law of the unknowns and these hidden variables is approximated by a fully separable probabilistic law via the Variational Bayesian Approximation (VBA) approach where the criterion used is the Kullback-Leibler divergence. To obtain the solution this criterion can be optimized either by a classical alternate optimization or by a gradient based algorithm [1] [2].

In this paper we focus on the implementation issues of these algorithms for great dimensional linear inverse problems where we do not have access directly to the huge dimensional matrix representing the forward operator. In many real applications of inverse problems we have access to the results of the forward and the adjoint operators without needing to have explicitly the matrix corresponding to the operators. We consider then particularly these algorithms in this context. At the end we show the results of two applications: estimating the components of a periodic signal and the image reconstruction in X-ray computed tomography.

Key Words: Inverse Problems, Sparse priors, VBA, IGMM, Period Estimation, Biological time series.

[1] A. Mohammad-Djafari, Bayesian approach with prior models which enforce sparsity in signal and image processing, EURASIP Journal on Advances in Signal Processing, vol. Special issue on Sparse Signal Processing, pp. 12-52, 2012.

[2] A. Fraysse and T. Rodet, A gradient-like variational Bayesian algorithm, in SSP 2011, no. S17.5, (Nice, France), pp. 605-608, June 2011.

Identification