About

Welcome to the Video, Image, and Sound Analysis Lab (VISAL) at the City University of Hong Kong! The lab is directed by Dr. Antoni Chan in the Department of Computer Science.

Our main research activities include:

  • Computer Vision, Surveillance
  • Machine Learning, Pattern Recognition
  • Computer Audition, Music Information Retrieval
  • Eye Gaze Analysis

For more information about our current research, please visit the projects and publications pages.

Opportunities for graduate students and research assistants: if you are interested in joining the lab, please see this information.

Latest News [more]

  • [Nov 28, 2019]

    Congratulations to Weihong for defending his thesis!

  • [Nov 28, 2019]

    Congratulations to Tianyu for defending his thesis!

  • [Aug 23, 2019]

    Xueying Zhan receives the “Outstanding Academic Performance Award”, and Xueying Zhan and Jia Wan receive the “Research Tuition Scholarship” from the School of Graduate Studies. Congratulations!

  • [May 6, 2019]

    Congratulations to Di for defending his thesis!

Recent Publications [more]

Recent Project Pages [more]

3D Crowd Counting via Multi-View Fusion with 3D Gaussian Kernels

Recently, an end-to-end multi-view crowd counting method called multi-view multi-scale (MVMS) has been proposed, which fuses multiple camera views using a CNN to predict a 2D scene-level density map on the ground-plane. Unlike MVMS, we propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps, instead of the 2D ground-plane ones.
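As a toy illustration of the 3D scene-level density maps used here, the numpy sketch below places a normalized 3D Gaussian kernel at each annotated head position in a voxel grid, so that the map integrates to the crowd count. The grid size, kernel bandwidth, and head positions are invented for illustration, and the multi-view fusion CNN itself is not shown.

    import numpy as np

    def gaussian_3d_density_map(points, grid_shape, sigma=1.0):
        """Place an isotropic 3D Gaussian kernel at each head position.
        Each kernel is normalized to sum to 1, so the whole map sums
        to the number of people."""
        D, H, W = grid_shape
        zz, yy, xx = np.meshgrid(np.arange(D), np.arange(H), np.arange(W),
                                 indexing="ij")
        density = np.zeros(grid_shape, dtype=np.float64)
        for (z, y, x) in points:
            kernel = np.exp(-((zz - z)**2 + (yy - y)**2 + (xx - x)**2)
                            / (2 * sigma**2))
            kernel /= kernel.sum()  # each person contributes exactly 1
            density += kernel
        return density

    # toy usage: three people in a 16x32x32 voxel grid (made-up positions)
    people = [(4, 10, 12), (8, 20, 5), (2, 15, 25)]
    dmap = gaussian_3d_density_map(people, (16, 32, 32), sigma=2.0)
    print(dmap.sum())  # ~3.0: integrating the map recovers the count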

Adaptive Density Map Generation for Crowd Counting

From the perspective of end-to-end training, the hand-crafted methods used for generating density maps may not be optimal for the particular network or dataset. To address this issue, we propose an adaptive density map generator, which takes the annotation dot map as input and learns a density map representation for training a counter. The counter and generator are trained jointly within an end-to-end framework.
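The PyTorch sketch below illustrates only the joint-training idea, with two deliberately tiny stand-in networks; the architectures, loss, and data are placeholders, not the ones from the paper. The generator turns the dot map into a density map, which serves as the counter's regression target, and both modules are updated in a single end-to-end step. A real system would also need constraints on the generator (for example, preserving the true count) to rule out degenerate solutions.

    import torch
    import torch.nn as nn

    class DensityMapGenerator(nn.Module):
        """Maps a binary annotation dot map to a learned density map."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(),
                nn.Conv2d(16, 1, 5, padding=2), nn.ReLU(),
            )
        def forward(self, dot_map):
            return self.net(dot_map)

    class Counter(nn.Module):
        """Predicts a density map from an image; count = sum of the map."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 16, 5, padding=2), nn.ReLU(),
                nn.Conv2d(16, 1, 5, padding=2), nn.ReLU(),
            )
        def forward(self, image):
            return self.net(image)

    generator, counter = DensityMapGenerator(), Counter()
    opt = torch.optim.Adam(list(generator.parameters())
                           + list(counter.parameters()), lr=1e-4)

    image = torch.rand(1, 3, 64, 64)      # dummy image
    dot_map = torch.zeros(1, 1, 64, 64)   # dummy annotations: two heads
    dot_map[0, 0, 20, 30] = dot_map[0, 0, 40, 10] = 1.0

    # Joint step: the generator's output is the counter's training target.
    target = generator(dot_map)
    pred = counter(image)
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()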

Eye Movement analysis with Switching HMMs (EMSHMM)

We use a switching hidden Markov model (EMSHMM) approach to analyze eye movement data in cognitive tasks involving cognitive state changes. A high-level state captures a participant’s cognitive state transitions during the task, and eye movement patterns during each high-level state are summarized with a regular HMM.
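The toy numpy sampler below shows the two-level generative structure being described, with all probabilities invented for illustration: a high-level Markov chain over two cognitive states, each owning its own low-level HMM whose states emit noisy 2D fixation locations. The actual EMSHMM is learned from data, and details such as re-initializing the low-level chain when the high-level state switches are simplified away here.

    import numpy as np

    rng = np.random.default_rng(0)

    A_high = np.array([[0.95, 0.05],
                       [0.02, 0.98]])                 # 2 cognitive states
    A_low = [np.array([[0.8, 0.2], [0.3, 0.7]]),      # low-level HMM, state 0
             np.array([[0.5, 0.5], [0.5, 0.5]])]      # low-level HMM, state 1
    emit_means = [np.array([[0.0, 0.0], [5.0, 5.0]]), # 2D fixation centers
                  np.array([[2.0, 8.0], [8.0, 2.0]])]

    def sample(T):
        h = l = 0
        seq = []
        for _ in range(T):
            h = rng.choice(2, p=A_high[h])            # cognitive-state switch
            l = rng.choice(2, p=A_low[h][l])          # low-level transition
            fixation = rng.normal(emit_means[h][l], 0.5)  # noisy fixation
            seq.append((h, l, fixation))
        return seq

    for h, l, fix in sample(5):
        print(h, l, fix.round(2))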

Parametric Manifold Learning of Gaussian Mixture Models

We propose a ParametRIc MAnifold Learning (PRIMAL) algorithm for Gaussian Mixture Models (GMMs), assuming that GMMs lie on or near a manifold generated from a low-dimensional hierarchical latent space through parametric mappings. Inspired by Principal Component Analysis (PCA), the generative processes for the priors, means, and covariance matrices are modeled by their respective latent spaces and parametric mappings.
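The numpy sketch below illustrates only the general idea of generating valid GMM parameters from a low-dimensional latent vector through linear, PCA-like mappings; the dimensions and mapping weights are made up, and this is not the PRIMAL algorithm itself, which uses a hierarchical latent space learned from data. A softmax keeps the priors on the simplex, and covariances are built as L L^T plus a small ridge so they stay positive definite.

    import numpy as np

    rng = np.random.default_rng(0)
    K, D, d = 3, 2, 2   # mixture components, data dim, latent dim (made up)

    # Linear decoders from a shared low-dimensional latent z to parameters.
    W_pi = rng.normal(size=(K, d));        b_pi = rng.normal(size=K)
    W_mu = rng.normal(size=(K * D, d));    b_mu = rng.normal(size=K * D)
    W_L  = rng.normal(size=(K * D * D, d)) * 0.1

    def decode_gmm(z):
        logits = W_pi @ z + b_pi
        priors = np.exp(logits) / np.exp(logits).sum()  # softmax -> simplex
        means = (W_mu @ z + b_mu).reshape(K, D)
        Ls = (W_L @ z).reshape(K, D, D)
        covs = Ls @ Ls.transpose(0, 2, 1) + 1e-3 * np.eye(D)  # PSD
        return priors, means, covs

    priors, means, covs = decode_gmm(rng.normal(size=d))
    print(priors.sum())  # 1.0: valid mixture weights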

On Diversity in Image Captioning: Metrics and Methods

In this project, we focus on the diversity of image captions. First, we propose diversity metrics that correlate better with human judgment. Second, we re-evaluate existing models and find that (1) there is a large gap between humans and existing models in the diversity-accuracy space, and (2) training captioning models with reinforcement learning (CIDEr reward) improves accuracy but reduces diversity. Third, we propose a simple but effective approach that balances diversity and accuracy via reinforcement learning, using a linear combination of the cross-entropy loss and the CIDEr reward.
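A minimal PyTorch sketch of such a linear combination is given below. All tensors are dummies, the trade-off weight is made up, and the scalar reward stands in for the CIDEr score of the sampled caption minus a baseline, which a real implementation would compute from the references.

    import torch

    lam = 0.5                                         # trade-off (made up)
    logits = torch.randn(4, 10, requires_grad=True)   # 4 steps, vocab of 10
    gt = torch.tensor([1, 3, 5, 7])                   # ground-truth tokens

    log_probs = torch.log_softmax(logits, dim=-1)
    steps = torch.arange(4)
    xe_loss = -log_probs[steps, gt].mean()            # accuracy-oriented term

    with torch.no_grad():
        sampled = torch.multinomial(log_probs.exp(), 1).squeeze(1)
    reward = 0.8                  # stand-in for CIDEr(sampled) - baseline
    rl_loss = -reward * log_probs[steps, sampled].mean()  # REINFORCE term

    loss = (1 - lam) * xe_loss + lam * rl_loss        # linear combination
    loss.backward()
    print(float(loss))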

Recent Datasets and Code [more]

Eye Movement analysis with Switching HMMs (EMSHMM) Toolbox

This is a MATLAB toolbox for analyzing eye movement data with switching hidden Markov models (SHMMs), targeting cognitive tasks that involve cognitive state changes. It includes code for learning SHMMs for individual participants, as well as for analyzing the results.

EgoDaily – Egocentric dataset for Hand Disambiguation

An egocentric hand detection dataset with variability in people, activities, and places, simulating daily-life situations.

CityStreet: Multi-view crowd counting dataset

Datasets for multi-view crowd counting in wide-area scenes. Includes our CityStreet dataset, as well as counting annotations and metadata for multi-view counting on PETS2009 and DukeMTMC.

CityUHK-X: crowd dataset with extrinsic camera parameters

Crowd counting dataset of indoor/outdoor scenes with extrinsic camera parameters (camera angle and height), for use as side information.

DPHEM toolbox for simplifying GMMs

Toolboxes for the density-preserving HEM algorithm for simplifying mixture models.