Welcome to the Video, Image, and Sound Analysis Lab (VISAL) at the City University of Hong Kong! The lab is directed by Prof. Antoni Chan in the Department of Computer Science.

Our main research activities include:

  • Computer Vision, Surveillance
  • Machine Learning, Pattern Recognition
  • Computer Audition, Music Information Retrieval
  • Eye Gaze Analysis

For more information about our current research, please visit the projects and publication pages.

Opportunities for graduate students and research assistants – if you are interested in joining the lab, please check this information.

Latest News [more]

  • [Jun 16, 2023]

    Congratulations to Hui for defending her thesis!

  • [Jan 19, 2023]

    Congratulations to Xueying for defending her thesis!

  • [Dec 9, 2022]

    Congratulations to Ziquan for defending his thesis!

  • [Nov 30, 2022]

    Call for Papers: Special Issue on “Applications of artificial intelligence, computer vision, physics and econometrics modelling methods in pedestrian traffic modelling and crowd safety” in Transportation Research Part C: Emerging Technologies. Deadline April 30th, 2023.

Recent Publications [more]

  • Retrieval-Augmented Multiple Instance Learning.
    Yufei Cui, Ziquan Liu, Yixin Chen, Yuchen Lu, Xinyue Yu, Xue Liu, Tei-Wei Kuo, Miguel RD Rodrigues, Chun Jason Xue, and Antoni B. Chan,
    In: Neural Information Processing Systems (NeurIPS), New Orleans, to appear Dec 2023.
  • Towards the next generation explainable AI that promotes AI-human mutual understanding.
    Janet H. Hsiao and Antoni B. Chan,
    In: NeurIPS workshop on XAI in Action: Past, Present, and Future Applications, New Orleans, Dec 2023.
  • Generalized Characteristic Function Loss for Crowd Analysis in the Frequency Domain.
    Weibo Shu, Jia Wan, and Antoni B. Chan,
    IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), to appear 2023.
  • Scalable Video Object Segmentation with Simplified Framework.
    Qiangqiang Wu, Tianyu Yang, Wei Wu, and Antoni B. Chan,
    In: International Conf. on Computer Vision (ICCV), Paris, to appear 2023.
  • Modeling Noisy Annotations for Point-Wise Supervision.
    Jia Wan, Qiangqiang Wu, and Antoni B. Chan,
    IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), to appear 2023.
  • Variational Nested Dropout.
    Yufei Cui, Yu Mao, Ziquan Liu, Qiao Li, Antoni B. Chan, Xue Liu, Tei-Wei Kuo, and Chun Jason Xue,
    IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 45(8):10519-10534, Aug 2023 (online Feb 2023).
  • Human Attention-Guided Explainable AI for Object Detection.
    Guoyang Liu, Jindi Zhang, Antoni B. Chan, and Janet H. Hsiao,
    In: Annual Conference of the Cognitive Science Society, July 2023.
  • TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization.
    Ziquan Liu, Yi Xu, Xiangyang Ji, and Antoni B. Chan,
    In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2023.
  • Optimal Transport Minimization: Crowd Localization on Density Maps for Semi-Supervised Counting.
    Wei Lin and Antoni B. Chan,
    In: IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), Jun 2023 (highlight).
  • DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks.
    Qiangqiang Wu, Tianyu Yang, Ziquan Liu, Baoyuan Wu, Ying Shan, and Antoni B. Chan,
    In: IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), Jun 2023. [code]

Recent Project Pages [more]

Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios

We propose a batch-mode Pareto Optimization Active Learning (POAL) framework for Active Learning under Out-of-Distribution data scenarios.

ODAM: Gradient-based Instance-specific Visual Explanation for Object Detection

We propose the gradient-weighted Object Detector Activation Maps (ODAM), a visualized explanation technique for interpreting the predictions of object detectors, including class score and bounding box coordinates.

A Comparative Survey of Deep Active Learning

We present a comprehensive comparative survey of 19 Deep Active Learning approaches for classification tasks.

A Comparative Survey: Benchmarking for Pool-based Active Learning

We introduce an active learning benchmark comprising 35 public datasets and experiment protocols, and evaluate 17 pool-based AL methods.

Calibration-free Multi-view Crowd Counting

We propose a calibration-free multi-view crowd counting (CF-MVCC) method, which obtains the scene-level count as a weighted summation over the predicted density maps from the camera views, without needing camera calibration parameters.

Recent Datasets and Code [more]

Modeling Eye Movements with Deep Neural Networks and Hidden Markov Models (DNN+HMM)

This is the toolbox for modeling eye movements and feature learning with deep neural networks and hidden Markov models (DNN+HMM).

Dolphin-14k: Chinese White Dolphin detection dataset

A dataset consisting of Chinese White Dolphin (CWD) and distractors for detection tasks.

Crowd counting: Zero-shot cross-domain counting

Generalized loss function for crowd counting.

CVCS: Cross-View Cross-Scene Multi-View Crowd Counting Dataset

Synthetic dataset for cross-view cross-scene multi-view counting. The dataset contains 31 scenes, each with about 100 camera views. For each scene, we capture 100 multi-view images of crowds.

Crowd counting: Generalized loss function

Generalized loss function for crowd counting.