The bag-of-systems (BoS) representation is a descriptor of motion in a video, where dynamic texture (DT) codewords represent the typical motion patterns in spatio-temporal patches extracted from the video. The efficacy of the BoS descriptor depends on the richness of the codebook, which depends on the number of codewords in the codebook. However, for even modest sized codebooks, mapping videos onto the codebook results in a heavy computational load. In this project we propose the BoS Tree, which constructs a bottom-up hierarchy of codewords that enables efficient mapping of videos to the BoS codebook. By leveraging the tree structure to efficiently index the codewords, the BoS Tree allows for fast look-ups in the codebook and enables the practical use of larger, richer codebooks. We demonstrate the effectiveness of BoS Trees on classification of different video datasets, as well as on annotation of a video dataset and a music dataset. We also show that, although the fast look-ups of BoS Tree result in different descriptors than BoS for the same video, the overall distance (and kernel) matrices are highly correlated resulting in similar classification performance.
- A Scalable and Accurate Descriptor for Dynamic Textures using Bag of System Trees.
IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 37(4):697-712, Apr 2015. [appendix]
- Growing a Bag of Systems Tree for Fast and Accurate Classification.
In: IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Providence, Jun 2012.
- Clustering Dynamic Textures with the Hierarchical EM Algorithm for Modeling Video.
IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 35(7):1606-1621, Jul 2013. [appendix]
- Modeling, clustering, and segmenting video with mixtures of dynamic textures.
IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 30(5):909-926, May 2008.