Background subtraction is an important first step for many vision problems. It separates objects from background clutter, usually by comparing motion patterns, and facilitates subsequent higher-level operations such as tracking and object identification. Because the environment can change substantially, both in the short term and throughout the lifetime of the vision system, background subtraction algorithms are expected to be robust.
One popular approach is the Stauffer and Grimson (SG) background model, which models the distribution of colors (over time) of each pixel as a mixture of Gaussians. This accounts for the fact that, in scenes with multiple objects, pixel colors change as objects traverse the scene. For example, if an object stops, it should at some point be considered part of the background; when it departs, the unoccluded area should be quickly reassigned to the background. Some objects may even exhibit cyclic motion, e.g. a flickering light display, making a number of background pixels undergo cyclic variations of color over time. The mixture model captures these state transitions very naturally, while providing a compact summary of the color distribution. This simplifies the management of the background model, namely the problems of updating the distribution over time, or deciding which components of the mixture should be dropped as the background changes (due to variable scene lighting, atmospheric conditions, etc.).
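For concreteness, below is a minimal sketch of this per-pixel mixture update for a single grayscale pixel. It is an illustration in the spirit of SG, not the exact original algorithm: the parameter names K (number of components), alpha (learning rate), and T (background proportion threshold), and the specific initialization values, are our own choices.

```python
import numpy as np

class PixelGMM:
    """Per-pixel Gaussian mixture in the spirit of Stauffer-Grimson.

    Minimal sketch for one grayscale pixel; K, alpha, and T are
    illustrative parameters, not the paper's exact settings.
    """

    def __init__(self, K=3, alpha=0.01, T=0.7):
        self.K, self.alpha, self.T = K, alpha, T
        self.w = np.full(K, 1.0 / K)           # component priors
        self.mu = np.linspace(0.0, 255.0, K)   # component means
        self.var = np.full(K, 30.0 ** 2)       # component variances

    def update(self, x):
        """Fold a new pixel value x into the mixture; return True if
        x is classified as background."""
        d2 = (x - self.mu) ** 2
        matched = d2 < (2.5 ** 2) * self.var   # within 2.5 sigma
        if matched.any():
            # Update the best-matched component (ranked by w/sigma).
            k = int(np.argmax(np.where(matched,
                                       self.w / np.sqrt(self.var),
                                       -np.inf)))
            self.mu[k] += self.alpha * (x - self.mu[k])
            self.var[k] += self.alpha * (d2[k] - self.var[k])
            self.w = (1.0 - self.alpha) * self.w
            self.w[k] += self.alpha
        else:
            # No match: replace the least probable component with a
            # new, high-variance one centered on x.
            k = int(np.argmin(self.w))
            self.mu[k], self.var[k], self.w[k] = x, 30.0 ** 2, self.alpha
        self.w /= self.w.sum()
        # Background components: highest w/sigma, accumulated until
        # the total prior exceeds T; the rest model foreground.
        order = np.argsort(-self.w / np.sqrt(self.var))
        n_bg = int(np.searchsorted(np.cumsum(self.w[order]), self.T)) + 1
        return bool(matched.any()) and k in order[:n_bg]
```

A full background model simply maintains one such mixture per pixel.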
One of the main drawbacks of SG is the assumption that the background is static over short time scales. This is a strong limitation for scenes with spatio-temporal dynamics in the background, such as water scenes. Although the model allows each pixel to switch state, and tolerates some variability within each state, the Gaussian mixture assumes that this variability derives from noise, not from the structured motion patterns that characterize moving water, burning fire, swaying trees, etc. One approach that has shown promise for modeling these types of structured motion patterns is the dynamic texture, which models a spatio-temporal volume as a sample from a linear dynamical system (LDS), and has shown surprising robustness for video synthesis, segmentation, and image registration.
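Concretely, a dynamic texture is an LDS with a hidden state x_t and observed (vectorized) frame y_t, evolving as x_{t+1} = A x_t + v_t and y_t = C x_t + w_t. The sketch below fits such a model to a video patch using the widely used SVD-based least-squares approximation; the function and variable names are ours, and the frames are assumed to be stacked as columns of a matrix Y.

```python
import numpy as np

def fit_dynamic_texture(Y, n=10):
    """Fit an LDS (x_{t+1} = A x_t + v_t, y_t = C x_t + w_t) to a
    video patch via the standard SVD-based least-squares procedure.

    Y : (d, tau) matrix, one vectorized frame per column.
    n : state dimension (n <= min(d, tau)).
    """
    Ybar = Y.mean(axis=1, keepdims=True)          # temporal mean frame
    U, s, Vt = np.linalg.svd(Y - Ybar, full_matrices=False)
    C = U[:, :n]                                  # observation matrix
    X = np.diag(s[:n]) @ Vt[:n]                   # state trajectory (n, tau)
    # Least-squares estimate of the state transition matrix A.
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    V = X[:, 1:] - A @ X[:, :-1]                  # state residuals
    Q = V @ V.T / (V.shape[1] - 1)                # state noise covariance
    return A, C, Q, Ybar, X
```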
In summary, background subtraction requires both a state-based representation (as in SG) and the ability to capture scene dynamics within each state (as in the dynamic texture methods). This suggests a very natural combination of the two lines of work: representing spatio-temporal video cubes as samples from a mixture of dynamic textures within the SG framework. To enable efficient on-line learning of dynamic textures, we derive the sufficient statistics and propose an on-line learning algorithm (sketched below). This generalized SG algorithm inherits the advantages of SG: 1) it adapts to long-term variations via on-line estimation; 2) it can quickly embrace new background motions through the addition of mixture components; and 3) it easily discards outdated information by dropping mixture components with small priors.
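The following sketch (building on fit_dynamic_texture above) illustrates this component management for a single spatio-temporal cube. It is schematic only: the scoring function, the thresholds, and the refitting step are stand-ins for the exact on-line sufficient-statistic updates derived in the paper.

```python
import numpy as np

def dt_score(dt, Y):
    """Crude fit score: negative mean squared one-step state
    prediction error of cube Y under an LDS (A, C, Q, Ybar, X) as
    returned by fit_dynamic_texture. A stand-in for the component
    log-likelihood used in the actual algorithm."""
    A, C, _, Ybar, _ = dt
    X = C.T @ (Y - Ybar)           # project frames onto the state space
    E = X[:, 1:] - A @ X[:, :-1]   # one-step prediction residuals
    return -float(np.mean(E ** 2))

def update_mixture(components, cube, alpha=0.05, fit_thresh=-100.0,
                   max_K=4, min_prior=0.01):
    """Schematic on-line update for a mixture of dynamic textures.

    components : list of dicts with keys 'prior' and 'dt'.
    cube       : (d, tau) matrix of vectorized frames.
    Returns the index of the component the cube was assigned to.
    """
    scores = [dt_score(c['dt'], cube) for c in components]
    if not components or (max(scores) < fit_thresh
                          and len(components) < max_K):
        # Poor fit under every component: spawn a new one for the
        # newly observed background motion.
        components.append({'prior': alpha,
                           'dt': fit_dynamic_texture(cube)})
        matched = components[-1]
    else:
        # The paper updates the matched component from running
        # sufficient statistics; for illustration we simply refit it
        # on the latest cube.
        matched = components[int(np.argmax(scores))]
        matched['dt'] = fit_dynamic_texture(cube)
    # Adapt priors, drop stale components, renormalize.
    for c in components:
        c['prior'] = (1 - alpha) * c['prior'] + (alpha if c is matched else 0.0)
    components[:] = [c for c in components
                     if c['prior'] >= min_prior or c is matched]
    z = sum(c['prior'] for c in components)
    for c in components:
        c['prior'] /= z
    return components.index(matched)
```

As in SG, a cube is then declared background if its matched component belongs to the high-prior subset of the mixture, and foreground otherwise.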
Experimental results show that background modeling with the mixture of dynamic textures substantially outperforms both static background models and those based on a single dynamic texture.
Selected Publications
A. B. Chan, V. Mahadevan, and N. Vasconcelos, "Generalized Stauffer-Grimson background subtraction for dynamic scenes," Machine Vision and Applications, 22(5):751-766, Sep 2011.
A. B. Chan and N. Vasconcelos, "Modeling, Clustering, and Segmenting Video with Mixtures of Dynamic Textures," IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 30(5):909-926, May 2008.