Our main research interests include:

  • computer visionsurveillance:
    dynamic textures, motion segmentation, motion analysis, semantic image annotation, image retrieval, crowd counting.
  • machine learningpattern recognition:
    probabilistic graphical models, support vector machines, Bayesian regression, Gaussian processes.
  • computer auditionmusic information retrieval:
    semantic music annotation and retrieval, music segmentation.
  • eye gaze analysis:
    modeling eye movements with hidden Markov models (HMMs), clustering HMMs

In particular, we aim to develop generative probabilistic models of images, video, and sound that can be applied to computer vision and computer audition problems, such as traffic surveillance, crowd monitoring, semantic image annotation, and music segmentation. Our current research projects are listed below.

Dynamic Texture Models

A family of generative stochastic dynamic texture models for analyzing motion in video, and time-series in general (project overview).

Bag of Systems Trees

We propose the BoSTree that enables efficient mapping of videos to the bag-of-systems (BoS) codebook using a tree-structure, which enables the practical use of larger, richer codebooks.

Clustering Dynamic Textures

We propose a hierarchical EM algorithm capable of clustering dynamic texture models and learning novel cluster centers that are representative of the cluster members. DT clustering can be applied to semantic motion annotation and bag-of-systems codebook generation.

Background Subtraction in Dynamic Scenes

The background model is based on a generalization of the Stauffer-Grimson background model, where each mixture component is a dynamic texture. We derive an on-line algorithm for updating the parameters using a set of sufficient statistics of the model.

Layered Dynamic Textures

One disadvantage of the dynamic texture is its inability to account for multiple co-occuring textures in a single video. We extend the dynamic texture to a multi-state (layered) dynamic texture that can learn regions containing different dynamic textures.

  • Antoni B. Chan and Nuno Vasconcelos, "Layered dynamic textures." IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 31(10):1862-1879, Oct 2009.
Kernel Dynamic Textures

We introduce a kernelized dynamic texture, which has a non-linear observation function learned with kernel PCA. The new texture model can account for more complex patterns of motion, such as chaotic motion (e.g. boiling water and fire) and camera motion (e.g. panning and zooming), better than the original dynamic texture.

Mixtures of Dynamic Textures

We introduce the mixture of dynamic textures, which models a collection of video as samples from a set of dynamic textures. We use the model for video clustering and motion segmentation.


Understanding Video of Crowded Environments

Motion segmentation and motion classification in video of crowded environments, such as pedestrian scenes and highway traffic.

Small Instance Detection using Object Density Maps

We propose a novel object detection framework using object density maps for partially-occluded small instances, such as pedestrians in low resolution surveillance video.

Counting Pedestrians Crossing a Line

We propose an integer programming method for estimating the instantaneous count of pedestrians crossing a line of interest in a video sequence.

Pedestrian Crowd Counting

We estimate the size of moving crowds in a privacy preserving manner, i.e. without people models or tracking. The system first segments the crowd by its motion, extracts low-level features from each segment, and estimates the crowd count in each segment using a Gaussian process.

Classification and Retrieval of Traffic Video

We classify traffic congestion in video by representing the video as a dynamic texture, and classifying it using an SVM with a probabilistic kernel (the KL kernel). The resulting classifier is robust to noise and lighting changes.


Human Pose Recognition and Tracking

Recognizing and tracking 2D and 3D human pose in images and videos.

Maximum-Margin Structured Learning with Deep Networks for 3D Human Pose Estimation

We propose a maximum-margin structured learning framework with deep neural network that learns the image-pose score function for human pose estimation.

Pose Estimation with Deep Convolutional Neural Network

We propose a heterogeneous multi-task learning framework for 2D human pose estimation from monocular images using a deep convolutional neural network that combines pose regression and part detection. We also extend the model to 3D human pose estimation.

A Robust Likelihood Function for 3D Human Pose Tracking

We propose a robust likelihood function for 3D human pose tracking, which is robust to small pose changes and better able to localize partially occluded and overlapping parts.


Data-Driven Computer Graphics

Directing User Attention via Visual Flow on Web Designs

​We present an approach that allows web designers to easily direct user attention via visual flow on web designs.

DynamicManga: Animating Still Manga via Camera Movement

We propose a method for animating still manga imagery through camera movements, driven by motion and emotion semantics automatically extracted from the manga.

Attention-Directing Composition of Manga Elements

We propose an approach for novices to synthesize a composition of panel elements that can effectively guide the reader’s attention to convey the story.

composition teaser
Automatic Stylistic Manga Layout

We propose an approach to automatically produce a manga layout from a set of input artworks, which is based on a generative layout model and parametric style models.


Eye-Gaze Analysis

Eye Movement analysis with HMMs (EMHMM)

We use hidden Markov models (HMMs) to analyze eye movement data. A person’s eye fixation sequence is summarized with an HMM, and common strategies among people are discovered by clustering HMMs.


Music Analysis

Music Annotation with Time-Series Models

We propose an approach to automatic music annotation and retrieval that is based on the dynamic texture mixture, a generative time series model of musical content. The new annotation model better captures temporal (e.g., rhythmical) aspects as well as timbral content.

Segmenting Musical Structure

We model a time-series of audio feature vectors, extracted from a short audio fragment, as a dynamic texture. The musical structure of a song (e.g. chorus, verse, and bridge) is discovered by segmenting the song using the mixture of dynamic textures. The song segmentations are used for song retrieval, song annotation, and database visualization.

  • Luke Barrington, Antoni B. Chan, and Gert R.G. Lanckriet, "Modeling music as a dynamic texture." IEEE Trans. on Audio, Speech and Language Processing (TASLP), 18(3):602-612, Mar 2010.

Image Analysis

Semantic Image Annotation

We annotate images using supervised multi-class labeling (SML), which treats semantic annotation as a multi-class classification problem. The system is scalable, and was applied to image databases with 60,000 images.