Welcome to the Video, Image, and Sound Analysis Lab (VISAL) at the City University of Hong Kong! The lab is directed by Dr. Antoni Chan in the Department of Computer Science.

Our main research activities include:

  • Computer Vision, Surveillance
  • Machine Learning, Pattern Recognition
  • Computer Audition, Music Information Retrieval
  • Eye Gaze Analysis

For more information about our current research, please visit the projects and publication pages.

Opportunities for graduate students and research assistants – if you are interested in joining the lab, please check this information.

Latest News [more]

  • [Aug 23, 2019]

    Xueying Zhan receives the “Outstanding Academic Performance Award”, and Xueying Zhan and Jia Wan receive the “Research Tuition Scholarship” from the School of Graduate Studies. Congratulations!

  • [May 6, 2019]

    Congratulations to Di for defending his thesis!

  • [Nov 1, 2018]

    Congratulations to Lei for defending her thesis!

  • [Jun 28, 2016]

    Congratulations to Sijin for defending his thesis!

Recent Publications [more]

  • Eye movement analysis with switching hidden Markov models.
    Tim Chuk, Antoni B. Chan, Shinsuke Shimojo, and Janet H. Hsiao,
    Behavior Research Methods, to appear 2019.
  • Visual Tracking via Dynamic Memory Networks.
    Tianyu Yang and Antoni B. Chan,
    IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), to appear 2019. [code]
  • Is that my hand? An egocentric dataset for hand disambiguation.
    Sergio R. Cruz and Antoni B. Chan,
    Image and Vision Computing, 89:131-143, Sept 2019. [dataset]
  • Adaptive Density Map Generation for Crowd Counting.
    Jia Wan and Antoni B. Chan,
    In: Intl. Conf. on Computer Vision (ICCV), Seoul, Oct 2019.
  • Hand Detection using Zoomed Neural Networks.
    Sergio R. Cruz and Antoni B. Chan,
    In: Intl. Conf. on Image Analysis and Processing (ICIAP), Trento, to appear Sep 2019.
  • Parametric Manifold Learning of Gaussian Mixture Models.
    Ziquan Liu, Lei Yu, Janet H. Hsiao, and Antoni B. Chan,
    In: International Joint Conference on Artificial Intelligence (IJCAI), Macau, Aug 2019.
  • Understanding Individual Differences in Eye Movement Pattern During Scene Perception through Co-Clustering of Hidden Markov Models.
    Janet H. Hsiao, Kin Yan Chan, Yue Feng Du, and Antoni B. Chan,
    In: The Annual Meeting of the Cognitive Science Society (CogSci), Montreal, Jul 2019.
  • ButtonTips: Designing Web Buttons with Suggestions.
    Dawei Liu, Ying Cao, Rynson W.H. Lau, and Antoni B. Chan,
    In: IEEE International Conference on Multimedia and Expo (ICME), Shanghai, to appear Jul 2019.
  • Describing like Humans: on Diversity in Image Captioning.
    Qingzhong Wang and Antoni B. Chan,
    In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, June 2019. [code]
  • Residual Regression with Semantic Prior for Crowd Counting.
    Jia Wan, Wenhan Luo, Baoyuan Wu, Antoni B. Chan, and Wei Liu,
    In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, June 2019. [code]

Recent Project Pages [more]

Parametric Manifold Learning of Gaussian Mixture Models

We propose a ParametRIc MAnifold Learning (PRIMAL) algorithm for Gaussian Mixture Models (GMMs), assuming that GMMs lie on or near a manifold that is generated from a low-dimensional hierarchical latent space through parametric mappings. Inspired by Principal Component Analysis (PCA), we model the generative processes for the priors, means, and covariance matrices, each with its own latent space and parametric mapping.

On Diversity in Image Captioning: Metrics and Methods

In this project, we focus on the diversity of image captions. First, we propose diversity metrics that correlate better with human judgment. Second, we re-evaluate existing models and find that (1) there is a large gap between humans and existing models in the diversity-accuracy space, and (2) training captioning models with reinforcement learning (CIDEr reward) improves accuracy but reduces diversity. Third, we propose a simple but efficient approach to balance diversity and accuracy via reinforcement learning, using a linear combination of cross-entropy and CIDEr reward.
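The linear combination mentioned above can be illustrated with a minimal sketch. This is not the project's released code: the function name, the weight `lam`, the baseline reward, and the REINFORCE-style form of the reward term are all assumptions made for illustration, and the CIDEr score is taken as a precomputed number.

```python
def mixed_caption_loss(gt_token_log_probs, sample_log_prob,
                       sample_reward, baseline_reward, lam=0.5):
    """Blend a cross-entropy term with a reward-driven term (illustrative only).

    gt_token_log_probs: log-probabilities the model assigns to each token of a
        ground-truth caption.
    sample_log_prob: total log-probability of a caption sampled from the model.
    sample_reward: score of the sampled caption (e.g. CIDEr), computed elsewhere.
    baseline_reward: reference score used to reduce variance (an assumption).
    lam: weight on the cross-entropy term (hypothetical parameter).
    """
    # Average negative log-likelihood of the ground-truth caption.
    cross_entropy = -sum(gt_token_log_probs) / len(gt_token_log_probs)
    # REINFORCE-style surrogate: minimizing it raises the probability of
    # sampled captions whose reward exceeds the baseline.
    reward_loss = -(sample_reward - baseline_reward) * sample_log_prob
    # Linear combination of the two terms.
    return lam * cross_entropy + (1.0 - lam) * reward_loss
```

Under these assumptions, `lam=1.0` recovers plain cross-entropy training and `lam=0.0` recovers purely reward-driven training; values in between trade one objective against the other.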

Residual Regression with Semantic Prior for Crowd Counting

In this paper, we propose a residual regression framework for crowd counting that harnesses the correlation information among samples. By incorporating this information into our network, we find that the network learns more intrinsic characteristics and thus generalizes better to unseen scenarios. We also show how to effectively leverage a semantic prior to improve crowd counting performance.

Wide-Area Crowd Counting via Ground-Plane Density Maps and Multi-View Fusion CNNs

In this paper, we propose a deep neural network framework for multi-view crowd counting, which fuses information from multiple camera views to predict a scene-level density map on the ground-plane of the 3D world.
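As a rough illustration of the density-map representation only (not the multi-view fusion network), the sketch below builds a toy ground-plane density map and reads the crowd count off it by summation. Placing one normalized Gaussian per person is a common convention in crowd counting and is assumed here; the grid size, `sigma`, and function names are hypothetical.

```python
import math

def ground_plane_density_map(positions, height, width, sigma=1.5):
    """Place one unit-mass Gaussian per person on a ground-plane grid.

    positions: (row, col) ground-plane coordinates, assumed to lie in the grid.
    """
    density = [[0.0] * width for _ in range(height)]
    for py, px in positions:
        # Unnormalized Gaussian centered on the person's position.
        kernel = [[math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
                   for x in range(width)] for y in range(height)]
        total = sum(map(sum, kernel))
        # Normalize over the grid so each person contributes a mass of 1.
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / total
    return density

def count_from_density(density):
    """The estimated count is the sum of the density map."""
    return sum(map(sum, density))
```

For example, a map built from three positions on a 10x10 grid sums to approximately 3, which is why a predicted scene-level density map yields a scene-level count.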

Simplification of Gaussian Mixture Models

An algorithm is proposed to simplify a Gaussian Mixture Model into a reduced mixture model with fewer mixture components, by maximizing a variational lower bound of the expected log-likelihood of a set of virtual samples.

Recent Datasets and Code [more]

EgoDaily – Egocentric dataset for Hand Disambiguation

Egocentric hand detection dataset with variability in people, activities, and places, to simulate daily-life situations.

  • Files: download page
  • If you use this dataset please cite:
    Is that my hand? An egocentric dataset for hand disambiguation.
    Sergio R. Cruz and Antoni B. Chan,
    Image and Vision Computing, 89:131-143, Sept 2019.

CityStreet: Multi-view crowd counting dataset

Datasets for multi-view crowd counting in wide-area scenes. Includes our CityStreet dataset, as well as the counting and metadata for multi-view counting on PETS2009 and DukeMTMC.

DPHEM toolbox for simplifying GMMs

Toolboxes for the density-preserving HEM algorithm for simplifying mixture models.

MADS: Martial Arts, Dancing, and Sports Dataset

A multi-view and stereo-depth dataset for 3D human pose estimation, which consists of challenging martial arts actions (Tai-chi and Karate), dancing actions (hip-hop and jazz), and sports actions (basketball, volleyball, football, rugby, tennis and badminton).


Eye Movement Hidden Markov Models (EMHMM) Toolbox

This is a MATLAB toolbox for analyzing eye movement data using hidden Markov models. It includes code for learning HMMs for individuals, as well as clustering individuals’ HMMs into groups.