Crowd Datasets

CVCS: Cross-View Cross-Scene Multi-View Crowd Counting Dataset

Synthetic dataset for cross-view cross-scene multi-view counting. The dataset contains 31 scenes, each with about 100 camera views. For each scene, we capture 100 multi-view images of crowds.

Fine-Grained Crowd Counting Dataset

Dataset for fine-grained crowd counting, which differentiates a crowd into categories based on the low-level behavior attributes of the individuals (e.g. standing/sitting or violent behavior) and then counts the number of people in each category.

CityStreet: Multi-view crowd counting dataset

Datasets for multi-view crowd counting in wide-area scenes. Includes our CityStreet dataset, as well as the counting annotations and metadata for multi-view counting on PETS2009 and DukeMTMC.

CityUHK-X: crowd dataset with extrinsic camera parameters

Crowd counting dataset of indoor/outdoor scenes with extrinsic camera parameters (camera angle and height), for use as side information.

UCSD Pedestrian Dataset

Video of people on pedestrian walkways at UCSD, and the corresponding motion segmentations. Currently two scenes are available.


People Annotations for UCSD Dataset

People annotations, perspective density maps, region-of-interest, and crowd counts for the UCSD Pedestrian Dataset.

People Counting Data for UCSD Dataset

The features and counts for people counting on the UCSD Dataset. This data should be sufficient if you are interested in the regression problem only. Includes the Peds1, Peds2, and CVPR counting datasets.

People Counting Data for PETS2009 Dataset

The features and counts for people counting on the PETS2009 Dataset.  Also includes the segmentations, perspective maps, and ground-truth annotations.

Line Counting Dataset

These are the ground-truth annotations for line counting on the UCSD, Grand Central, and LHI datasets.


Human Pose Datasets

EgoDaily – Egocentric dataset for Hand Disambiguation

Egocentric hand detection dataset with variability in people, activities, and places, to simulate daily-life situations.

MADS: Martial Arts, Dancing, and Sports Dataset

A multi-view and stereo-depth dataset for 3D human pose estimation, which consists of challenging martial arts actions (Tai-chi and Karate), dancing actions (hip-hop and jazz), and sports actions (basketball, volleyball, football, rugby, tennis and badminton).


Video Datasets

Experimental setup for semantic video texture annotation on the DynTex dataset

Videos can be obtained from the DynTex website.  The text files contain the list of selected tags, the list of selected videos and ground-truth tags, and the training/test set splits.

Boats Videos

A video of boats moving through water.  A challenging background subtraction task, where the background itself is moving.


Synthetic Video Texture Dataset

A dataset of composite video textures.  The videos were created by compositing different video textures together into a template with 2, 3, or 4 segments.
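As a rough illustration of the compositing described above (toy random arrays standing in for real video textures; this is not the dataset's actual generation code), two textures can be combined into a 2-segment template with a fixed spatial mask:

```python
import numpy as np

# Toy sketch: composite two "video textures" (random arrays here) into a
# template with 2 segments, split left/right by a boolean mask.
rng = np.random.default_rng(0)
T, H, W = 10, 8, 8                     # frames, height, width
tex_a = rng.random((T, H, W))          # first video texture
tex_b = rng.random((T, H, W))          # second video texture

mask = np.zeros((H, W), dtype=bool)    # template: left half from tex_a,
mask[:, W // 2:] = True                # right half from tex_b

composite = np.where(mask, tex_b, tex_a)   # mask broadcasts over frames
print(composite.shape)  # → (10, 8, 8)
```

Templates with 3 or 4 segments follow the same pattern, with a label map selecting among more source textures per pixel.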


Highway Traffic Dataset (Clustering)

A dataset of highway traffic videos used for clustering video textures.


Highway Traffic Videos (Classification)

A set of highway traffic videos. Each video is classified as low, medium, or high traffic.


Other Datasets

Dolphin-14k: Chinese White Dolphin detection dataset

A dataset consisting of Chinese White Dolphins (CWD) and distractors for detection tasks.

Small Object Dataset

Images of small objects for small instance detections.  Currently four object types are available.


Manga Layout Dataset

Dataset of manga panel layouts.


  • Files: zip
  • If you use this dataset please cite:
    Automatic Stylistic Manga Layout.
    Ying Cao, Antoni B. Chan, and Rynson W.H. Lau,
    ACM Transactions on Graphics (Proc. SIGGRAPH Asia 2012), Singapore, Nov 2012.

Key annotations for the GTZAN music genre dataset

Ground-truth annotations of the musical keys of songs in the GTZAN music genre dataset.


Modeling Eye Movements with Deep Neural Networks and Hidden Markov Models (DNN+HMM)

This is the toolbox for modeling eye movements and feature learning with deep neural networks and hidden Markov models (DNN+HMM).

Crowd counting: Zero-shot cross-domain counting

Code/model for zero-shot cross-domain crowd counting.

Crowd counting: Generalized loss function

Generalized loss function for crowd counting.

Parametric Manifold Learning of Gaussian Mixture Models (PRIMAL-GMM) Toolbox

This is a Python toolbox for learning parametric manifolds of Gaussian mixture models (GMMs).

Crowd counting: Modeling noisy annotations

Modeling noisy annotations in crowd counting: NoisyCC.

Visual Object Tracking: ROAM and ROAM++

Recurrently optimized tracking with ROAM and ROAM++.

Crowd counting: Kernel-based density map generation

Code for kernel-based density map generation (KDMG) for crowd counting.

Eye Movement analysis with Switching HMMs (EMSHMM) Toolbox

This is a MATLAB toolbox for analyzing eye movement data using switching hidden Markov models (SHMMs), targeted at cognitive tasks that involve cognitive state changes. It includes code for learning SHMMs for individuals, as well as analyzing the results.

Crowd counting: Multi-view Multi-scale (MVMS) counting

The code/model for wide-area crowd counting using multiple views: multi-view multi-scale (MVMS) model.

Image Captioning: Diversity Metrics

Toolbox for computing diversity metrics for image captioning.

Crowd counting: residual regression with semantic prior

The code/model for crowd counting using residual regression and semantic prior.

DPHEM toolbox for simplifying GMMs

Toolboxes for the density-preserving HEM algorithm for simplifying mixture models.

Image Captioning: Gated Hierarchical Attention

The code/model for GHA for image captioning.

Visual Object Tracking: MemTrack and MemDTC

The code/models for MemTrack and MemDTC for visual object tracking.

Visual Object Tracking: Recurrent filter learning

This is the code/model for recurrent filter learning for visual object tracking (VOT).

Eye Movement Hidden Markov Models (EMHMM) Toolbox

This is a MATLAB toolbox for analyzing eye movement data using hidden Markov models. It includes code for learning HMMs for individuals, as well as clustering individuals' HMMs into groups.
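To illustrate the underlying idea (not the EMHMM toolbox's MATLAB interface), a sequence of 2-D fixation locations can be scored under a Gaussian-emission HMM with the forward algorithm; the ROI centers and transition probabilities below are made-up example values:

```python
import numpy as np

def log_gauss(x, mean, var):
    """Log density of an isotropic 2-D Gaussian."""
    d = x - mean
    return -0.5 * (d @ d) / var - np.log(2 * np.pi * var)

def hmm_loglik(fixations, pi, A, means, var):
    """Forward algorithm in log space: log p(fixations) under the HMM."""
    n_states = len(pi)
    log_alpha = np.log(pi) + np.array(
        [log_gauss(fixations[0], means[k], var) for k in range(n_states)])
    for x in fixations[1:]:
        trans = log_alpha[:, None] + np.log(A)   # (from-state, to-state)
        log_alpha = np.logaddexp.reduce(trans, axis=0) + np.array(
            [log_gauss(x, means[k], var) for k in range(n_states)])
    return np.logaddexp.reduce(log_alpha)

# Two hypothetical regions of interest (e.g. "eyes" vs "mouth" of a face).
means = np.array([[0.0, 0.0], [5.0, 5.0]])
pi = np.array([0.6, 0.4])                      # initial state probabilities
A = np.array([[0.8, 0.2], [0.3, 0.7]])         # transition matrix
fix = np.array([[0.1, -0.2], [0.3, 0.1], [4.8, 5.2], [5.1, 4.9]])
ll = hmm_loglik(fix, pi, A, means, var=1.0)
```

The toolbox learns such HMMs from data (rather than fixing their parameters) and then clusters the per-individual HMMs into groups.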


Manga panel extraction toolbox

This is a MATLAB toolbox for automatically extracting the panels from digital manga/comic pages.

VarBB Toolbox

The VarBB toolbox is an implementation of the variational branch-and-bound algorithm for Bregman ball trees (bb-trees).  VarBB can speed up nearest-neighbor search for generative models.
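For background (this is not VarBB code): a Bregman divergence has the form D_phi(x, y) = phi(x) - phi(y) - ⟨∇phi(y), x - y⟩; with phi the negative entropy it gives the generalized KL divergence, a natural dissimilarity between probability vectors. The brute-force nearest-neighbor search sketched below is the baseline that a Bregman ball tree is designed to accelerate:

```python
import numpy as np

def kl_bregman(x, y):
    """Generalized KL divergence (Bregman divergence of negative entropy)."""
    return np.sum(x * np.log(x / y) - x + y)

def brute_force_nn(query, database):
    """Index of the database point with smallest divergence from the query."""
    divs = [kl_bregman(query, y) for y in database]
    return int(np.argmin(divs))

rng = np.random.default_rng(1)
db = rng.dirichlet(np.ones(5), size=100)   # 100 random probability vectors
print(brute_force_nn(db[0], db))           # → 0 (a point is its own NN)
```

A bb-tree avoids computing all 100 divergences by pruning entire balls whose divergence bound exceeds the best candidate found so far.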

H3M toolbox

This is a MATLAB toolbox for clustering hidden Markov models using the variational HEM algorithm.  The toolbox can also estimate HMM mixtures (H3M) using the EM algorithm.

libdt – OpenCV library for Dynamic Textures

This is an OpenCV C++ library for Dynamic Texture (DT) models.  It contains code for the EM algorithm for learning DTs and DT mixture models, and the HEM algorithm for clustering DTs, as well as DT-based applications, such as motion segmentation and Bag-of-Systems (BoS) motion descriptors.
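The DT model is a linear dynamical system: a hidden state x_t evolves as x_t = A x_{t-1} + v_t and generates the vectorized frame y_t = C x_t + w_t, with Gaussian noises v_t, w_t. A minimal numpy sketch of sampling frames from such a model (toy parameter values, not libdt's C++ API):

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, T = 2, 4, 50       # state dim, observation (pixel) dim, num frames
A = np.array([[0.9, -0.2],
              [0.2,  0.9]])              # stable state dynamics (|eig| < 1)
C = rng.standard_normal((m, n))          # state-to-pixel mapping
q_std, r_std = 0.1, 0.05                 # process / observation noise std

x = np.zeros(n)
frames = np.empty((T, m))
for t in range(T):
    x = A @ x + q_std * rng.standard_normal(n)          # state update
    frames[t] = C @ x + r_std * rng.standard_normal(m)  # observed frame
```

Learning reverses this: the EM algorithm in libdt estimates (A, C, noise covariances) from observed frames, and HEM clusters several learned DTs.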

Generalized Gaussian Process Models Toolbox

This is a toolbox for generalized Gaussian process models (GGPM). The toolbox is implemented as an add-on to the GPML toolbox for Matlab/Octave. The toolbox contains likelihood functions for GGPMs, as well as a Taylor inference function. GPML version 3.4 is supported.
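For context, a minimal numpy sketch of the standard GP regression case (Gaussian likelihood) that GGPMs generalize to other likelihoods; this is not the GGPM/GPML interface, and the kernel hyperparameters are arbitrary example values:

```python
import numpy as np

def rbf(X1, X2, length=1.0, amp=1.0):
    """Squared-exponential (RBF) covariance between two 1-D point sets."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return amp * np.exp(-0.5 * d2 / length ** 2)

def gp_posterior_mean(X, y, Xs, noise=0.1):
    """Posterior predictive mean: K(Xs, X) (K(X, X) + noise I)^{-1} y."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    return Ks @ np.linalg.solve(K, y)

X = np.linspace(0, 2 * np.pi, 20)        # training inputs
y = np.sin(X)                            # noiseless training targets
Xs = np.array([np.pi / 2])               # test input
pred = float(gp_posterior_mean(X, y, Xs)[0])   # close to sin(pi/2) = 1
```

A GGPM replaces the Gaussian likelihood with one from the exponential family (e.g. Poisson counts or binary labels), which is why the toolbox supplies likelihood functions and a Taylor inference function for GPML.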