About

Welcome to the Video, Image, and Sound Analysis Lab (VISAL) at the City University of Hong Kong! The lab is directed by Dr. Antoni Chan in the Department of Computer Science.

Our main research activities include:

  • Computer Vision, Surveillance
  • Machine Learning, Pattern Recognition
  • Computer Audition, Music Information Retrieval
  • Eye Gaze Analysis

For more information about our current research, please visit the projects and publication pages.

Opportunities for graduate students and research assistants – if you are interested in joining the lab, please check this information.

Latest News [more]

  • [May 6, 2019]

    Congratulations to Di for defending his thesis!

  • [Nov 1, 2018]

    Congratulations to Lei for defending her thesis!

  • [Jun 28, 2016]

    Congratulations to Sijin for defending his thesis!

  • [Jun 25, 2016]

    Congratulations to Adeel for winning a “Best Research Paper Award 2013/14” from the Higher Education Commission (HEC) of Pakistan for his TPAMI 2013 paper!

Recent Publications [more]

  • Visual Tracking via Dynamic Memory Networks.
    Tianyu Yang and Antoni B. Chan,
    IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), to appear 2019.
  • Is that my hand? An egocentric dataset for hand disambiguation.
    Sergio R. Cruz and Antoni B. Chan,
    Image and Vision Computing, to appear 2019.
  • Hand Detection using Zoomed Neural Networks.
    Sergio R. Cruz and Antoni B. Chan,
    In: Intl. Conf. on Image Analysis and Processing (ICIAP), Trento, to appear Sep 2019.
  • Parametric Manifold Learning of Gaussian Mixture Models.
    Ziquan Liu, Lei Yu, Janet H. Hsiao, and Antoni B. Chan,
    In: International Joint Conference on Artificial Intelligence (IJCAI), Macau, to appear Aug 2019.
  • Understanding Individual Differences in Eye Movement Pattern During Scene Perception through Co-Clustering of Hidden Markov Models.
    Janet H. Hsiao, Kin Yan Chan, Yue Feng Du, and Antoni B. Chan,
    In: The Annual Meeting of the Cognitive Science Society (CogSci), Montreal, Jul 2019.
  • ButtonTips: Designing Web Buttons with Suggestions.
    Dawei Liu, Ying Cao, Rynson W.H. Lau, and Antoni B. Chan,
    In: IEEE International Conference on Multimedia and Expo (ICME), Shanghai, to appear Jul 2019.
  • Describing like Humans: on Diversity in Image Captioning.
    Qingzhong Wang and Antoni B. Chan,
    In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, June 2019. [code]
  • Residual Regression with Semantic Prior for Crowd Counting.
    Jia Wan, Wenhan Luo, Baoyuan Wu, Antoni B. Chan, and Wei Liu,
    In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, June 2019.
  • Wide-Area Crowd Counting via Ground-Plane Density Maps and Multi-View Fusion CNNs.
    Qi Zhang and Antoni B. Chan,
    In: IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), Long Beach, June 2019.
  • Density-Preserving Hierarchical EM Algorithm: Simplifying Gaussian Mixture Models for Approximate Inference.
    Lei Yu, Tianyu Yang, and Antoni B. Chan,
    IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 41(6):1323-1337, June 2019.

Recent Project Pages [more]

On Diversity in Image Captioning: Metrics and Methods

In this project, we focus on the diversity of image captions. First, we propose diversity metrics that correlate better with human judgment. Second, we re-evaluate existing models and find that (1) there is a large gap between humans and existing models in the diversity-accuracy space, and (2) training captioning models with reinforcement learning (CIDEr reward) improves accuracy but reduces diversity. Third, we propose a simple but effective approach to balance diversity and accuracy via reinforcement learning, using a linear combination of the cross-entropy and CIDEr rewards.
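The balancing idea above can be sketched as a weighted objective. This is only an illustrative sketch, not the paper's implementation: the function name `combined_loss` and the weight `alpha` are assumptions, and the reward enters with a negative sign since it is maximised while the loss is minimised.

```python
def combined_loss(xe_loss, cider_reward, alpha=0.5):
    """Balance accuracy (cross-entropy) against diversity (CIDEr-based RL reward).

    alpha = 1.0 recovers pure cross-entropy training; alpha = 0.0 recovers
    pure reward maximisation. The reward is negated so that minimising the
    combined loss maximises the reward.
    """
    return alpha * xe_loss + (1.0 - alpha) * (-cider_reward)
```

For example, with `alpha = 0.5`, a batch with cross-entropy 2.0 and CIDEr reward 1.0 yields a combined loss of 0.5.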

Residual Regression with Semantic Prior for Crowd Counting

In this paper, we propose a residual regression framework for crowd counting that harnesses the correlation information among samples. By incorporating this information into the network, we find that it learns more intrinsic characteristics and thus generalizes better to unseen scenarios. In addition, we show how to effectively leverage a semantic prior to improve crowd counting performance.

Wide-Area Crowd Counting via Ground-Plane Density Maps and Multi-View Fusion CNNs

In this paper, we propose a deep neural network framework for multi-view crowd counting, which fuses information from multiple camera views to predict a scene-level density map on the ground-plane of the 3D world.
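The projection step behind this idea can be sketched in plain Python. This is a naive illustration, not the paper's method: the paper fuses view features with CNNs, whereas here each camera-view density map is splatted onto the ground plane with a fixed homography and the views are simply averaged. The names `project_to_ground` and `fuse_views`, and the averaging rule, are illustrative assumptions.

```python
def project_to_ground(view_density, H, ground_shape):
    """Splat a camera-view density map onto the ground plane via a 3x3
    homography H (nearest-neighbour; preserves the count of any person
    whose projection lands inside the ground-plane area)."""
    gh, gw = ground_shape
    ground = [[0.0] * gw for _ in range(gh)]
    for y, row in enumerate(view_density):
        for x, mass in enumerate(row):
            if mass == 0.0:
                continue
            # homogeneous coordinates: p = H @ [x, y, 1]
            px = H[0][0] * x + H[0][1] * y + H[0][2]
            py = H[1][0] * x + H[1][1] * y + H[1][2]
            pw = H[2][0] * x + H[2][1] * y + H[2][2]
            gx, gy = round(px / pw), round(py / pw)
            if 0 <= gx < gw and 0 <= gy < gh:
                ground[gy][gx] += mass
    return ground

def fuse_views(view_maps, homographies, ground_shape):
    """Average the per-view ground-plane projections (a naive stand-in
    for the learned multi-view fusion CNN)."""
    projections = [project_to_ground(d, H, ground_shape)
                   for d, H in zip(view_maps, homographies)]
    gh, gw = ground_shape
    n = len(projections)
    return [[sum(p[i][j] for p in projections) / n for j in range(gw)]
            for i in range(gh)]
```

With an identity homography, a single view's density map passes through unchanged, so the total count on the ground plane matches the count in the view.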

Recent Datasets and Code [more]

CityStreet: Multi-view crowd counting dataset

Datasets for multi-view crowd counting in wide-area scenes. Includes our CityStreet dataset, as well as the counting and metadata for multi-view counting on PETS2009 and DukeMTMC.

DPHEM toolbox for simplifying GMMs

Toolboxes for density-preserving HEM algorithm for simplifying mixture models.

MADS: Martial Arts, Dancing, and Sports Dataset

A multi-view and stereo-depth dataset for 3D human pose estimation, which consists of challenging martial arts actions (Tai-chi and Karate), dancing actions (hip-hop and jazz), and sports actions (basketball, volleyball, football, rugby, tennis and badminton).
