Overview

One of my main research efforts is the recognition of objects by comparing target models with extracted data, using statistically sound methods that directly account for and manipulate uncertainty at the various stages of the system. I have developed and tested recognition methods that explicitly model the various sources of uncertainty. These methods use the Expectation-Maximization (EM) algorithm to alternate between solving for the correspondence between model and data features and solving for the pose of the target in the data coordinate frame. A multiresolution version of the method was designed and implemented to demonstrate the efficiency and robustness gained by using multiresolution data and models.
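The alternation between correspondence and pose can be sketched with a toy translation-only EM loop. Everything below — the point sets, the Gaussian correspondence weights, and the noise levels — is an illustrative assumption, not the actual system, which handles full pose and explicit uncertainty models:

```python
import numpy as np

def em_register(model, data, sigma=0.3, iters=30):
    """Translation-only EM registration sketch (illustrative only)."""
    t = np.zeros(2)  # current pose estimate: a 2-D translation
    for _ in range(iters):
        moved = model + t
        # E-step: soft correspondence weights between model and data points
        d2 = ((moved[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-d2 / (2 * sigma ** 2))
        w = w / (w.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate the pose from the weighted matches
        t = t + (w[:, :, None] * (data[None, :, :] - moved[:, None, :])
                 ).sum(axis=(0, 1)) / model.shape[0]
    return t

rng = np.random.default_rng(0)
model = rng.normal(size=(30, 2))
true_t = np.array([0.3, -0.2])
data = model + true_t + rng.normal(scale=0.02, size=model.shape)
t_hat = em_register(model, data)
```

The E-step never commits to hard matches; the M-step pose estimate averages over all weighted correspondences, which is what makes the trade-off between the two subproblems explicit.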

Figure 1:

``Multiresolution morphodynamic tracking of the mature heart'' with MRI: topologies and orientations of large-scale intra-cavity flows, with deformation and motion tracking (dotted overlaid color lines) that indicate asymmetries and curvatures of the looped heart.

A drawback of this approach is its reliance on explicit uncertainty models. As an alternative, with co-workers, I am investigating a new multiresolution approach for finding the pose of an object model in an image, based on a new formulation of the mutual information between model and image. As applied here, the technique is intensity-based rather than feature-based. It works well in domains where edge- or gradient-based methods have difficulty, yet it is more robust than traditional correlation. The general problem of alignment involves comparing a predicted image of an object with an actual image: given an object model and a pose, a model of the imaging process can be used to predict the image that will result. In general this is a difficult problem. If we had a good imaging model, deciding whether an image contains a particular model at a given pose would be straightforward: compute the predicted image and compare it to the actual image directly. Given a perfect imaging model, the two images would be identical, or nearly so. Of course, finding the correct alignment would still remain a challenge.

In general, the relationship between an object model (no matter how accurate) and the object's image is complex. However, in the part of the scene containing an image of the object, we can formulate a general imaging equation: $v(T(x)) = F(u(x), P)$, where $x$ denotes the coordinates of a surface patch of the object model, $u(x)$ describes the properties of the model's surface at position $x$, and $P$ captures the parameters of the imaging process, such as the illumination conditions. $F$ is the image-formation function that generates the brightness of the surface patch in the image. Thus, $v(T(x))$ is the brightness image of the object placed in the scene by the coordinate transformation $T(x)$. If $F$ and $P$ were known in detail, it would be feasible to make an accurate prediction of scene intensities, since the physics of image formation is well understood. But because of the complexity of visible-light imaging, it may be difficult to determine the particular $F$ and $P$ for a given scene. Similar complexities arise with laser radar and SAR imagery.
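The predict-and-compare idea can be illustrated with a toy one-dimensional instance of the imaging equation: a known $F$ and $P$ render the model into a predicted image, and the pose minimizing the residual against the actual image is the alignment. The Lambertian formation function, the light direction, and the integer-shift transform $T$ below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: a strip of 2-D surface normals u(x) (illustrative assumption)
angles = rng.uniform(0, np.pi, size=40)
u = np.stack([np.cos(angles), np.sin(angles)], axis=1)

P = np.array([0.3, 0.95])          # imaging parameters: a light direction
P = P / np.linalg.norm(P)

def F(u, P):
    """Toy Lambertian image formation: brightness = max(0, normal . light)."""
    return np.clip(u @ P, 0.0, None)

def predict(u, P, shift, width=60):
    """Predicted image v(T(x)) for an integer-shift transform T."""
    v = np.zeros(width)
    v[shift:shift + len(u)] = F(u, P)
    return v

true_shift = 12
actual = predict(u, P, true_shift) + rng.normal(scale=0.01, size=60)

# With F and P known, alignment reduces to comparing predicted and actual
# images directly and keeping the pose with the smallest residual.
errs = [np.sum((predict(u, P, s) - actual) ** 2) for s in range(21)]
best = int(np.argmin(errs))
```

With a perfect imaging model the residual at the correct pose is just the sensor noise; the hard cases are precisely those where $F$ and $P$ cannot be pinned down like this.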

One reason that it is, in principle, possible to find $F$ is that the model supplies much information about the scene. Clearly, if there were no mutual information between $u$ and $v$, there could be no $F$. We finesse the problem of finding and computing $F$ by dealing with this mutual information directly: the technique, motivated by the work of Viola and Wells, finds the alignment of the model in the scene by maximizing the information that the model gives us about the scene.
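A histogram estimate makes the point concrete: mutual information detects a deterministic but nonlinear dependence between $u$ and $v$ that correlation misses entirely. The estimator and the test signals below are a minimal sketch, not the formulation used in the work:

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Histogram estimate of I(a; b) = H(a) + H(b) - H(a, b), in nats."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    pa, pb = p.sum(axis=1), p.sum(axis=0)
    ent = lambda q: -np.sum(q[q > 0] * np.log(q[q > 0]))
    return ent(pa) + ent(pb) - ent(p)

rng = np.random.default_rng(1)
u = rng.uniform(-1, 1, size=5000)
v_dep = u ** 2                       # determined by u, yet uncorrelated with it
v_ind = rng.uniform(-1, 1, size=5000)  # genuinely independent of u

mi_dep = mutual_information(u, v_dep)
mi_ind = mutual_information(u, v_ind)
corr = abs(np.corrcoef(u, v_dep)[0, 1])
```

Here `corr` is near zero even though $v$ is a function of $u$, while the mutual information of the dependent pair is far above that of the independent pair — the property that lets the approach succeed where traditional correlation fails.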

We have adopted the following approach to aligning the object model to the image: (1) the mutual information of the model and image is defined and expressed in terms of the entropies of several random variables; (2) the entropies and their derivatives are approximated by a method that involves random sampling from the model and image data; (3) a local maximum of the mutual information is sought using a stochastic analog of gradient descent, repeatedly taking steps proportional to the approximation of the derivative of the mutual information with respect to the transformation.
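Steps (1)-(3) can be sketched in miniature: sample the image at random positions, estimate the mutual information from a histogram, and step along a finite-difference stand-in for its derivative (a simplification; the method described above uses sampled analytic derivatives of the entropies). Every signal, transform, and step size below is an illustrative assumption:

```python
import numpy as np

def mi(a, b, bins=12):
    """Histogram estimate of the mutual information I(a; b), in nats."""
    p, _, _ = np.histogram2d(a, b, bins=bins)
    p = p / p.sum()
    pa, pb = p.sum(axis=1), p.sum(axis=0)
    ent = lambda q: -np.sum(q[q > 0] * np.log(q[q > 0]))
    return ent(pa) + ent(pb) - ent(p)

rng = np.random.default_rng(2)
true_shift = 1.3                       # pose to recover (invented)

def model_u(x):                        # model surface property u(x)
    return np.sin(x)

def image_v(y):                        # scene intensity: an unknown nonlinear F
    return np.exp(np.sin(y - true_shift)) + 0.05 * rng.normal(size=y.shape)

t, h = 0.5, 0.1                        # initial pose guess, half-step for FD
ts = []
for k in range(300):
    x = rng.uniform(0, 2 * np.pi, size=400)   # step (2): random sampling
    u = model_u(x)
    # step (3): finite-difference stand-in for the MI derivative;
    # the same sample x is reused on both sides to reduce estimator noise
    g = (mi(u, image_v(x + t + h)) - mi(u, image_v(x + t - h))) / (2 * h)
    t += 0.3 / (1 + k / 30) * g        # decaying step toward a local maximum
    ts.append(t)
t_hat = np.mean(ts[-100:])
```

Each iteration sees only a fresh random sample, so the ascent is stochastic; the decaying step and the averaging of late iterates are standard stabilizers for that noise, and the procedure settles at a local maximum near the true pose.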


Tuan Cao-Huu
2002-07-27