About

Welcome to the Video, Image, and Sound Analysis Lab (VISAL) at the City University of Hong Kong! The lab is directed by Prof. Antoni Chan in the Department of Computer Science.

Our main research activities include:

  • Computer Vision, Surveillance
  • Machine Learning, Pattern Recognition
  • Computer Audition, Music Information Retrieval
  • Eye Gaze Analysis

For more information about our current research, please visit the projects and publication pages.

Opportunities for graduate students and research assistants – if you are interested in joining the lab, please check this information.

Latest News [more]

  • [Apr 9, 2024]

    Congratulations to Qiangqiang for defending his thesis!

  • [Jun 16, 2023]

    Congratulations to Hui for defending her thesis!

  • [Jan 19, 2023]

    Congratulations to Xueying for defending her thesis!

  • [Dec 9, 2022]

    Congratulations to Ziquan for defending his thesis!

Recent Publications [more]

  • Human attention guided explainable artificial intelligence for computer vision models.
    Guoyang Liu, Jindi Zhang, Antoni B. Chan, and Janet H. Hsiao,
    Neural Networks, 177:106392, Sep 2024.
  • Gradient-based Visual Explanation for Transformer-based CLIP.
    Chenyang Zhao, Kun Wang, Xingyu Zeng, Rui Zhao, and Antoni B. Chan,
    In: International Conference on Machine Learning (ICML), Vienna, Jul 2024.
  • The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks.
    Ziquan Liu, Yufei Cui, Yan Yan, Yi Xu, Xiangyang Ji, Xue Liu, and Antoni B. Chan,
    In: International Conference on Machine Learning (ICML), Vienna, Jul 2024.
  • Is Holistic Processing Associated with Face Scanning Pattern and Performance in Face Recognition? Evidence from Deep Neural Network with Hidden Markov Modeling.
    Wei Xing, Yueyuan Zheng, Antoni B. Chan, and Janet H. Hsiao,
    In: Annual Conference of the Cognitive Science Society (CogSci), Rotterdam, Jul 2024.
  • Eye Movement Behavior during Mind Wandering across Different Tasks in Interactive Online Learning.
    Xiaoru Teng, Hui Lan, Gloria HY Wong, Antoni B. Chan, and Janet H. Hsiao,
    In: Annual Conference of the Cognitive Science Society (CogSci), Rotterdam, Jul 2024.
  • Do large language models resolve semantic ambiguities in the same way as humans? The case of word segmentation in Chinese sentence reading.
    Weiyan Liao, Zixuan Wang, Kathy Shum, Antoni B. Chan, and Janet H. Hsiao,
    In: Annual Conference of the Cognitive Science Society (CogSci), Rotterdam, Jul 2024.
  • Demystify Deep-learning AI for Object Detection using Human Attention Data.
    Jinhan Zhang, Guoyang Liu, Yunke Chen, Antoni B. Chan, and Janet H. Hsiao,
    In: Annual Conference of the Cognitive Science Society (CogSci), Rotterdam, Jul 2024.
  • Affecting Audience Valence and Arousal in 360 Immersive Environments: How Powerful Neural Style Transfer Is?
    Yanheng Li, Long Bai, Yaxuan Mao, Xuening Peng, Zehao Zhang, Jixing Li, Antoni B. Chan, Xin Tong, and RAY LC,
    In: HCI International 2024 Conference (HCII2024) - Virtual, Augmented, and Mixed Reality, Washington DC, Jun 2024.
  • Understanding and Fighting Scams: Media, Language, Appeals and Effects.
    Shuhua Zhou, Xiao Fan Liu, Fiona Fui-Hoon Nah, Simon Harrison, Xinzhi Zhang, Shanshan Zhen, Dannii Yeung, Janet H. Hsiao, Ray LC, Antoni B. Chan, Xiaohui Wang, Crystal Jiang, Fen Lin, Jixing Li, Andus Wong, Leanne Chan, Bert George, and Ping Li,
    In: HCI International 2024 Conference (HCII2024) - Late Breaking Papers, Washington DC, Jun 2024.
  • Learning Tracking Representations from Single Point Annotations.
    Qiangqiang Wu and Antoni B. Chan,
    In: CVPR Workshop on Learning With Limited Labelled Data for Image and Video Understanding (L3D-IVU), Jun 2024.

Recent Project Pages [more]

Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios

We propose a batch-mode Pareto Optimization Active Learning (POAL) framework for Active Learning under Out-of-Distribution data scenarios.
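
The generic Pareto-optimality idea behind this selection scheme can be sketched in a few lines. This is only a toy illustration, not the POAL algorithm itself: the two per-sample scores (an informativeness score to maximize and an OOD score to minimize) and the `pareto_front` helper are made-up stand-ins; the paper defines the actual objectives and the batch-mode solver.

```python
def pareto_front(samples):
    """Return indices of samples not dominated by any other sample.

    samples: list of (informativeness, ood_score) tuples.
    One sample dominates another if it is at least as informative and
    at most as OOD-like, and strictly better on one of the two.
    """
    front = []
    for i, (info_i, ood_i) in enumerate(samples):
        dominated = any(
            info_j >= info_i and ood_j <= ood_i
            and (info_j, ood_j) != (info_i, ood_i)
            for j, (info_j, ood_j) in enumerate(samples) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Toy candidate pool: (informativeness, OOD score) per unlabeled sample.
candidates = [(0.9, 0.8), (0.7, 0.2), (0.5, 0.1), (0.4, 0.6)]
print(pareto_front(candidates))  # [0, 1, 2] -- sample 3 is dominated by sample 1
```

Selecting from the Pareto front trades off the two objectives without fixing a scalar weighting between them in advance.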

ODAM: Gradient-based Instance-specific Visual Explanation for Object Detection

We propose the gradient-weighted Object Detector Activation Maps (ODAM), a visualized explanation technique for interpreting the predictions of object detectors, including class score and bounding box coordinates.
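
The gradient-weighting idea can be sketched generically. This is a toy illustration of the general recipe (weight a layer's activations by the gradient of a target score, sum, and clip negative evidence), not the ODAM implementation: `gradient_weighted_map` and its element-wise weighting are simplifying assumptions here, and a real detector's feature maps and gradients would replace the toy inputs.

```python
def gradient_weighted_map(activations, gradients):
    """Combine feature-map channels into a single explanation heatmap.

    activations: list of HxW channels (lists of lists) from one conv layer.
    gradients:   per-element gradients of the target score (e.g. a
                 detection's class score or one box coordinate) w.r.t.
                 those activations.
    Each channel is weighted element-wise by its gradient, the weighted
    channels are summed, and negative evidence is clipped to zero.
    """
    h, w = len(activations[0]), len(activations[0][0])
    heat = [[0.0] * w for _ in range(h)]
    for chan, grad in zip(activations, gradients):
        for y in range(h):
            for x in range(w):
                heat[y][x] += chan[y][x] * grad[y][x]
    return [[max(0.0, v) for v in row] for row in heat]

# Toy single-channel example.
print(gradient_weighted_map([[[1, 2], [3, 4]]],
                            [[[1, -1], [0.5, 0]]]))  # [[1.0, 0.0], [1.5, 0.0]]
```

Because the target score can be a box coordinate as well as a class score, the same recipe yields separate heatmaps for classification and localization of each detected instance.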

A Comparative Survey of Deep Active Learning

We present a comprehensive comparative survey of 19 Deep Active Learning approaches for classification tasks.

A Comparative Survey: Benchmarking for Pool-based Active Learning

We introduce an active learning benchmark comprising 35 public datasets and experiment protocols, and evaluate 17 pool-based AL methods.

Calibration-free Multi-view Crowd Counting

We propose a calibration-free multi-view crowd counting (CF-MVCC) method, which obtains the scene-level count as a weighted sum of the predicted density maps from the camera views, without requiring camera calibration parameters.
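
The weighted-summation step can be sketched as follows. This is a minimal illustration of the combination rule only, assuming per-view density maps and per-pixel confidence weights are already predicted; `scene_count` and the toy inputs are stand-ins, since CF-MVCC itself predicts both quantities from the camera images.

```python
def scene_count(density_maps, weight_maps):
    """Combine per-view density maps into one scene-level count.

    density_maps: list of HxW grids (lists of lists), predicted crowd
                  density for each camera view.
    weight_maps:  list of HxW grids of weights in [0, 1]; weighting the
                  densities before summing avoids double-counting people
                  visible in several views.
    """
    total = 0.0
    for dmap, wmap in zip(density_maps, weight_maps):
        for drow, wrow in zip(dmap, wmap):
            for d, w in zip(drow, wrow):
                total += d * w
    return total

# Toy example: two 2x2 "views" with uniform 0.5 weights.
views = [[[0.5, 0.5], [0.5, 0.5]], [[0.25, 0.25], [0.25, 0.25]]]
weights = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
print(scene_count(views, weights))  # 1.5
```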

Recent Datasets and Code [more]

Modeling Eye Movements with Deep Neural Networks and Hidden Markov Models (DNN+HMM)

This is the toolbox for modeling eye movements and feature learning with deep neural networks and hidden Markov models (DNN+HMM).

Dolphin-14k: Chinese White Dolphin detection dataset

A dataset of Chinese White Dolphins (CWD) and distractors for detection tasks.

Crowd counting: Zero-shot cross-domain counting

Code for zero-shot cross-domain crowd counting.

CVCS: Cross-View Cross-Scene Multi-View Crowd Counting Dataset

Synthetic dataset for cross-view cross-scene multi-view counting. The dataset contains 31 scenes, each with about 100 camera views. For each scene, we capture 100 multi-view images of crowds.

Crowd counting: Generalized loss function

Generalized loss function for crowd counting.