Dynamic Texture Models

ldtfig2

One family of visual processes that has relevance for various applications of computer vision is that of, what could be loosely described as, visual processes composed of ensembles of particles subject to stochastic motion. The particles can be microscopic (e.g plumes of smoke), macroscopic (e.g. leaves blowing in the wind), or even objects (e.g. a human crowd or a traffic jam). The applications range from remote monitoring for the prevention of natural disasters (e.g. forest fires), to background subtraction in challenging environments (e.g. outdoor scenes with moving trees in the background), and to surveillance (e.g. traffic monitoring, crowd analysis and management). While traditional motion representations model the movement of individual particles (e.g. optical flow), which may be contrary to how these visual processes are perceived, recent efforts have advanced toward holistic modeling, by viewing video sequences derived from these visual processes as dynamic textures (Doretto et. al, IJCV 2003) or, more precisely, samples from a generative, stochastic, texture model defined over space and time.

The goal of this project is to develop a family of motion models that extends and complements the original dynamic texture model. These new models can solve challenging computer vision problems, such as motion segmentation and motion classification, and can be applied to interesting real-world problems, such as crowd and traffic monitoring. These models can also be applied to computer audition problems (music information retrieval), such as semantic music annotation and music segmentation.

Models

Clustering Dynamic Textures

We propose a hierarchical EM algorithm capable of clustering dynamic texture models and learning novel cluster centers that are representative of the cluster members. DT clustering can be applied to semantic motion annotation and bag-of-systems codebook generation.

Layered Dynamic Textures

One disadvantage of the dynamic texture is its inability to account for multiple co-occuring textures in a single video. We extend the dynamic texture to a multi-state (layered) dynamic texture that can learn regions containing different dynamic textures.

  • Antoni B. Chan and Nuno Vasconcelos, "Layered dynamic textures." IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 31(10):1862-1879, Oct 2009.
Mixtures of Dynamic Textures

We introduce the mixture of dynamic textures, which models a collection of video as samples from a set of dynamic textures. We use the model for video clustering and motion segmentation.

Kernel Dynamic Textures

We introduce a kernelized dynamic texture, which has a non-linear observation function learned with kernel PCA. The new texture model can account for more complex patterns of motion, such as chaotic motion (e.g. boiling water and fire) and camera motion (e.g. panning and zooming), better than the original dynamic texture.

Computer Vision Applications

Bag of Systems Trees

We propose the BoSTree that enables efficient mapping of videos to the bag-of-systems (BoS) codebook using a tree-structure, which enables the practical use of larger, richer codebooks.

Pedestrian Crowd Counting

We estimate the size of moving crowds in a privacy preserving manner, i.e. without people models or tracking. The system first segments the crowd by its motion, extracts low-level features from each segment, and estimates the crowd count in each segment using a Gaussian process.

Background Subtraction in Dynamic Scenes

The background model is based on a generalization of the Stauffer-Grimson background model, where each mixture component is a dynamic texture. We derive an on-line algorithm for updating the parameters using a set of sufficient statistics of the model.

Mixtures of Dynamic Textures

We introduce the mixture of dynamic textures, which models a collection of video as samples from a set of dynamic textures. We use the model for video clustering and motion segmentation.

Classification and Retrieval of Traffic Video

We classify traffic congestion in video by representing the video as a dynamic texture, and classifying it using an SVM with a probabilistic kernel (the KL kernel). The resulting classifier is robust to noise and lighting changes.

Computer Audition Applications

Dynamic textures can also be applied to modeling music signals as a time-series.

Music Annotation with Time-Series Models

We propose an approach to automatic music annotation and retrieval that is based on the dynamic texture mixture, a generative time series model of musical content. The new annotation model better captures temporal (e.g., rhythmical) aspects as well as timbral content.

Segmenting Musical Structure

We model a time-series of audio feature vectors, extracted from a short audio fragment, as a dynamic texture. The musical structure of a song (e.g. chorus, verse, and bridge) is discovered by segmenting the song using the mixture of dynamic textures. The song segmentations are used for song retrieval, song annotation, and database visualization.

  • Luke Barrington, Antoni B. Chan, and Gert R.G. Lanckriet, "Modeling music as a dynamic texture." IEEE Trans. on Audio, Speech and Language Processing (TASLP), 18(3):602-612, Mar 2010.

Selected Publications

Links

Here are links to more resources on Dynamic Textures: