About

Welcome to the Video, Image, and Sound Analysis Lab (VISAL) at the City University of Hong Kong! The lab is directed by Dr. Antoni Chan in the Department of Computer Science.

Our main research activities include:

  • Computer Vision, Surveillance
  • Machine Learning, Pattern Recognition
  • Computer Audition, Music Information Retrieval
  • Eye Gaze Analysis

For more information about our current research, please visit the projects and publication pages.

Opportunities for graduate students and research assistants – if you are interested in joining the lab, please check this information.

Latest News [more]

  • [Nov 28, 2019]

    Congratulations to Weihong for defending his thesis!

  • [Nov 28, 2019]

    Congratulations to Tianyu for defending his thesis!

  • [Aug 23, 2019]

    Xueying Zhan receives the “Outstanding Academic Performance Award”, and Xueying Zhan and Jia Wan receive the “Research Tuition Scholarship” from the School of Graduate Studies. Congratulations!

  • [May 6, 2019]

    Congratulations to Di for defending his thesis!

Recent Publications [more]

  • 3D Crowd Counting via Multi-View Fusion with 3D Gaussian Kernels.
    Qi Zhang and Antoni B. Chan,
    In: AAAI Conference on Artificial Intelligence, New York, to appear 2020.
  • Eye movement analysis with switching hidden Markov models.
    Tim Chuk, Antoni B. Chan, Shinsuke Shimojo, and Janet H. Hsiao,
    Behavior Research Methods, to appear 2019.
  • Visual Tracking via Dynamic Memory Networks.
    Tianyu Yang and Antoni B. Chan,
    IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), to appear 2019. [code]
  • Is that my hand? An egocentric dataset for hand disambiguation.
    Sergio R. Cruz and Antoni B. Chan,
    Image and Vision Computing, 89:131-143, Sept 2019. [dataset]
  • Adaptive Density Map Generation for Crowd Counting.
    Jia Wan and Antoni B. Chan,
    In: Intl. Conf. on Computer Vision (ICCV), Seoul, Oct 2019.
  • Hand Detection using Zoomed Neural Networks.
    Sergio R. Cruz and Antoni B. Chan,
    In: Intl. Conf. on Image Analysis and Processing (ICIAP), Trento, to appear Sep 2019.
  • Parametric Manifold Learning of Gaussian Mixture Models.
    Ziquan Liu, Lei Yu, Janet H. Hsiao, and Antoni B. Chan,
    In: International Joint Conference on Artificial Intelligence (IJCAI), Macau, Aug 2019.
  • Understanding Individual Differences in Eye Movement Pattern During Scene Perception through Co-Clustering of Hidden Markov Models.
    Janet H. Hsiao, Kin Yan Chan, Yue Feng Du, and Antoni B. Chan,
    In: The Annual Meeting of the Cognitive Science Society (CogSci), Montreal, Jul 2019.
  • ButtonTips: Designing Web Buttons with Suggestions.
    Dawei Liu, Ying Cao, Rynson W.H. Lau, and Antoni B. Chan,
    In: IEEE International Conference on Multimedia and Expo (ICME), Shanghai, to appear Jul 2019.
  • Describing like Humans: on Diversity in Image Captioning.
    Qingzhong Wang and Antoni B. Chan,
    In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, June 2019. [code]

Recent Project Pages [more]

Adaptive Density Map Generation for Crowd Counting

In the context of end-to-end training, the hand-crafted methods used to generate ground-truth density maps may not be optimal for the particular network or dataset. To address this issue, we first show the impact of different density maps, and that better ground-truth density maps can be obtained by refining the existing ones with a learned refinement network that is jointly trained with the counter. We then propose an adaptive density map generator, which takes the annotation dot map as input and learns a density map representation for the counter. The counter and generator are trained jointly in an end-to-end framework.

  • Jia Wan and Antoni B. Chan, "Adaptive Density Map Generation for Crowd Counting." In: Intl. Conf. on Computer Vision (ICCV), Seoul, Oct 2019.
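
As a reference point, the fixed-bandwidth Gaussian generator that such hand-crafted pipelines typically use can be sketched as follows (a minimal illustration of the conventional baseline, not the learned generator from the paper; the function name and parameter values are hypothetical):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dot_map_to_density(points, shape, sigma=4.0):
    """Place a unit impulse at each annotated head location and blur with
    a fixed-bandwidth Gaussian; the resulting map integrates to the count."""
    dots = np.zeros(shape, dtype=np.float64)
    for x, y in points:
        dots[int(y), int(x)] += 1.0
    # Default 'reflect' boundary handling keeps the total mass inside the map.
    return gaussian_filter(dots, sigma=sigma)

# Example: three annotated people in a 64x64 frame.
density = dot_map_to_density([(10, 12), (30, 40), (50, 20)], (64, 64))
print(round(density.sum(), 3))  # 3.0 -- the map sums to the annotation count
```

The learned generator in the paper replaces this fixed kernel with a density representation adapted to the counter during joint training.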
3D Crowd Counting via Multi-View Fusion with 3D Gaussian Kernels

Recently, an end-to-end multi-view crowd counting method called multi-view multi-scale (MVMS) has been proposed, which fuses multiple camera views using a CNN to predict a 2D scene-level density map on the ground-plane. Unlike MVMS, we propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps, instead of the 2D ground-plane ones.

  • Qi Zhang and Antoni B. Chan, "3D Crowd Counting via Multi-View Fusion with 3D Gaussian Kernels." In: AAAI Conference on Artificial Intelligence, New York, to appear 2020.
Eye Movement analysis with Switching HMMs (EMSHMM)

We use a switching hidden Markov model (EMSHMM) approach to analyze eye movement data in cognitive tasks involving cognitive state changes. A high-level state captures a participant’s cognitive state transitions during the task, and eye movement patterns during each high-level state are summarized with a regular HMM.

  • Tim Chuk, Antoni B. Chan, Shinsuke Shimojo, and Janet H. Hsiao, "Eye movement analysis with switching hidden Markov models." Behavior Research Methods, to appear 2019.
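
The two-level generative structure can be illustrated with a small sampling sketch (the numbers of states and ROIs and all transition probabilities below are hypothetical toy values, not fitted EMSHMM parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# High-level cognitive states switch slowly; each high-level state owns
# its own low-level HMM over fixation regions of interest (ROIs).
high_trans = np.array([[0.95, 0.05],
                       [0.05, 0.95]])        # cognitive-state transitions
low_trans = np.array([                       # one ROI transition matrix
    [[0.8, 0.2], [0.3, 0.7]],                # per high-level state
    [[0.5, 0.5], [0.5, 0.5]],
])

def sample_fixations(T):
    """Sample a fixation ROI sequence from the switching HMM."""
    h, z, seq = 0, 0, []
    for _ in range(T):
        h = rng.choice(2, p=high_trans[h])   # update cognitive state
        z = rng.choice(2, p=low_trans[h][z]) # update ROI under that state
        seq.append(int(z))
    return seq

print(sample_fixations(10))
```

Inference goes the other way: given observed fixation sequences, the model recovers the hidden cognitive-state transitions and the per-state eye-movement patterns.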
Parametric Manifold Learning of Gaussian Mixture Models

We propose a ParametRIc MAnifold Learning (PRIMAL) algorithm for Gaussian Mixture Models (GMMs), assuming that GMMs lie on or near a manifold generated from a low-dimensional hierarchical latent space through parametric mappings. Inspired by Principal Component Analysis (PCA), the generative processes for the priors, means, and covariance matrices are modeled by their respective latent spaces and parametric mappings.
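
The PCA-inspired generative process for, e.g., the component means can be sketched as a linear mapping from low-dimensional latent coordinates (dimensions and variable names below are toy values for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)

# GMM means in D dimensions are generated from a d-dimensional latent
# space via a parametric (here: linear, PCA-style) mapping.
D, d, K = 10, 2, 5                 # data dim, latent dim, num. components
W = rng.standard_normal((D, d))    # parametric mapping (loadings)
b = rng.standard_normal(D)         # offset (global mean)
Z = rng.standard_normal((K, d))    # latent coordinates, one per component

means = Z @ W.T + b                # K component means on a d-dim manifold
print(means.shape)                 # (5, 10)
```

Learning reverses this sketch: PRIMAL fits the latent coordinates and mappings so that nearby GMMs share nearby latent representations.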

On Diversity in Image Captioning: Metrics and Methods

In this project, we focus on the diversity of image captions. First, we propose diversity metrics that correlate better with human judgment. Second, we re-evaluate existing models and find that (1) there is a large gap between humans and existing models in the diversity-accuracy space, and (2) training captioning models with reinforcement learning (CIDEr reward) improves accuracy but reduces diversity. Third, we propose a simple yet effective approach to balance diversity and accuracy via reinforcement learning, using a linear combination of the cross-entropy and CIDEr rewards.
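
A minimal sketch of such a combined objective, assuming a self-critical baseline for the RL term (the function name, weighting, and reward values are illustrative, not the paper's exact formulation):

```python
import numpy as np

def mixed_caption_loss(log_probs, sampled_cider, greedy_cider, alpha=0.5):
    """Convex mix of the cross-entropy (accuracy) objective and a
    self-critical CIDEr (RL) term; alpha trades accuracy against diversity."""
    log_probs = np.asarray(log_probs, dtype=float)
    ce = -log_probs.mean()                    # maximum-likelihood term
    advantage = sampled_cider - greedy_cider  # self-critical baseline
    rl = -advantage * log_probs.sum()         # REINFORCE-style surrogate
    return alpha * ce + (1.0 - alpha) * rl

# Toy example: two token log-probs, sampled caption beats the greedy one.
loss = mixed_caption_loss([-1.0, -2.0], sampled_cider=0.8, greedy_cider=0.5)
print(round(loss, 2))  # 1.2
```

Sweeping alpha between 0 and 1 traces out the accuracy-diversity trade-off described above.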

Recent Datasets and Code [more]

EgoDaily – Egocentric dataset for Hand Disambiguation

Egocentric hand detection dataset with variability in people, activities, and places, simulating daily-life situations.

  • Files: download page
  • If you use this dataset, please cite:
    Is that my hand? An egocentric dataset for hand disambiguation.
    Sergio R. Cruz and Antoni B. Chan,
    Image and Vision Computing, 89:131-143, Sept 2019.
CityStreet: Multi-view crowd counting dataset

Datasets for multi-view crowd counting in wide-area scenes. Includes our CityStreet dataset, as well as the counting annotations and metadata for multi-view counting on PETS2009 and DukeMTMC.

CityUHK-X: crowd dataset with extrinsic camera parameters

Crowd counting dataset of indoor/outdoor scenes with extrinsic camera parameters (camera angle and height), for use as side information.

DPHEM toolbox for simplifying GMMs

Toolboxes for density-preserving HEM algorithm for simplifying mixture models.

MADS: Martial Arts, Dancing, and Sports Dataset

A multi-view and stereo-depth dataset for 3D human pose estimation, which consists of challenging martial arts actions (Tai-chi and Karate), dancing actions (hip-hop and jazz), and sports actions (basketball, volleyball, football, rugby, tennis and badminton).
