About
Welcome to the Video, Image, and Sound Analysis Lab (VISAL) at the City University of Hong Kong! The lab is directed by Prof. Antoni Chan in the Department of Computer Science.
Our main research activities include:
- Computer Vision, Surveillance
- Machine Learning, Pattern Recognition
- Computer Audition, Music Information Retrieval
- Eye Gaze Analysis
For more information about our current research, please visit the projects and publications pages.
Opportunities for graduate students and research assistants – if you are interested in joining the lab, please check this information.
Latest News
- [Jan 19, 2023]
Congratulations to Xueying for defending her thesis!
- [Dec 9, 2022]
Congratulations to Ziquan for defending his thesis!
- [Nov 30, 2022]
Call for Papers: Special Issue on “Applications of artificial intelligence, computer vision, physics and econometrics modelling methods in pedestrian traffic modelling and crowd safety” in Transportation Research Part C: Emerging Technologies. Deadline April 30th, 2023.
- [Mar 30, 2022]
Our project “Automatic Wide-area Crowd Surveillance Using Multiple Cameras” received the Silver Medal at Inventions Geneva Evaluation Days (IGED) 2022!
Recent Publications
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization.
In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), to appear 2023.
- Optimal Transport Minimization: Crowd Localization on Density Maps for Semi-Supervised Counting.
In: IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), to appear 2023 (highlight).
- DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks.
In: IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), to appear 2023.
- ODAM: Gradient-based Instance-Specific Visual Explanations for Object Detection.
In: Intl. Conf. on Learning Representations (ICLR), Rwanda, to appear May 2023.
- Bayes-MIL: A New Probabilistic Perspective on Attention-based Multiple Instance Learning for Whole Slide Images.
In: Intl. Conf. on Learning Representations (ICLR), Rwanda, to appear May 2023.
- Clustering Hidden Markov Models With Variational Bayesian Hierarchical EM.
IEEE Trans. on Neural Networks and Learning Systems (TNNLS), 34(3):1537-1551, March 2023 (online 2021).
- A Lightweight and Detector-Free 3D Single Object Tracker on Point Clouds.
IEEE Trans. on Intelligent Transportation Systems, to appear 2023.
- Variational Nested Dropout.
IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), to appear 2023.
- Optimal planning of municipal-scale distributed rooftop photovoltaic systems with maximized solar energy generation under constraints in high-density cities.
Energy, 263(Part A):125686, Jan 2023.
- Improved Fine-Tuning by Better Leveraging Pre-Training Data.
In: Neural Information Processing Systems (NeurIPS), Nov 2022.
Recent Project Pages
We propose a calibration-free multi-view crowd counting (CF-MVCC) method, which obtains the scene-level count as a weighted summation over the predicted density maps from the camera views, without needing camera calibration parameters.
- "Calibration-free Multi-view Crowd Counting." In: European Conference on Computer Vision (ECCV), Tel Aviv, Oct 2022. [supplemental]
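As a rough illustration of the weighted summation described above, the following minimal sketch computes a scene-level count from per-view density maps. It assumes the weights are per-pixel maps with the same shape as the density maps. The function name `scene_count` and both array arguments are hypothetical and are not taken from the paper or its code. How the weights are predicted is not shown.

```python
import numpy as np


def scene_count(density_maps, weight_maps):
    """Return the scene-level count as a weighted sum over per-view density maps.

    density_maps, weight_maps: array-likes of shape (num_views, H, W).
    The per-pixel weight layout is an assumption made for this sketch.
    """
    density_maps = np.asarray(density_maps, dtype=float)
    weight_maps = np.asarray(weight_maps, dtype=float)
    if density_maps.shape != weight_maps.shape:
        raise ValueError("density and weight maps must have the same shape")
    # Down-weight every pixel, then sum over all views and pixels.
    return float((weight_maps * density_maps).sum())


# Toy example: two views that each see the same 3 people.
# A weight of 0.5 per view keeps the people from being counted twice.
densities = np.full((2, 2, 2), 0.75)  # each view's map sums to 3
weights = np.full((2, 2, 2), 0.5)
total = scene_count(densities, weights)
```

In this toy case the weights simply split each person evenly between the two views, so `total` equals the true count of 3.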
We propose a synchronization model that operates in conjunction with existing DNN-based multi-view models to allow them to work on unsynchronized data.
- "Single-Frame-Based Deep View Synchronization for Unsynchronized Multicamera Surveillance." IEEE Trans. on Neural Networks and Learning Systems (TNNLS), to appear 2022.
We model eye movements on faces by integrating deep neural networks and hidden Markov models (DNN+HMM).
- "Understanding the role of eye movement consistency in face recognition and autism through integrating deep neural networks and hidden Markov models." npj Science of Learning, 7:28, Oct 2022.
We derive loss functions in the frequency domain for training density map regression for crowd counting.
- "Crowd Counting in the Frequency Domain." In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022.
We propose a novel Crowd Counting framework built upon an external Momentum Template, termed C2MoT, which enables the encoding of domain-specific information via an external template representation.
- "Dynamic Momentum Adaptation for Zero-Shot Cross-Domain Crowd Counting." In: ACM Multimedia (MM), Oct 2021.
Recent Datasets and Code
Modeling Eye Movements with Deep Neural Networks and Hidden Markov Models (DNN+HMM)
This is the toolbox for modeling eye movements and feature learning with deep neural networks and hidden Markov models (DNN+HMM).
- Files: download here
- Project page
- If you use this toolbox please cite:
Understanding the role of eye movement consistency in face recognition and autism through integrating deep neural networks and hidden Markov models.
npj Science of Learning, 7:28, Oct 2022.
Dolphin-14k: Chinese White Dolphin detection dataset
A dataset consisting of Chinese White Dolphin (CWD) and distractors for detection tasks.
- Files: Google Drive, Readme
- Project page
- If you use this dataset please cite:
Chinese White Dolphin Detection in the Wild.
In: ACM Multimedia Asia (MMAsia), Gold Coast, Australia, Dec 2021.
Crowd counting: Zero-shot cross-domain counting
Toolbox for zero-shot cross-domain crowd counting.
- Files: github
- Project page
- If you use this toolbox please cite:
Dynamic Momentum Adaptation for Zero-Shot Cross-Domain Crowd Counting.
In: ACM Multimedia (MM), Oct 2021.
CVCS: Cross-View Cross-Scene Multi-View Crowd Counting Dataset
Synthetic dataset for cross-view cross-scene multi-view counting. The dataset contains 31 scenes, each with about 100 camera views. For each scene, we capture 100 multi-view images of crowds.
- Files: Google Drive
- Project page
- If you use this dataset please cite:
Cross-View Cross-Scene Multi-View Crowd Counting.
In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 557-567, Jun 2021.
Crowd counting: Generalized loss function
Generalized loss function for crowd counting.
- Files: github
- Project page
- If you use this toolbox please cite:
A Generalized Loss Function for Crowd Counting and Localization.
In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2021.