About
Welcome to the Video, Image, and Sound Analysis Lab (VISAL) at the City University of Hong Kong! The lab is directed by Dr. Antoni Chan in the Department of Computer Science.
Our main research activities include:
- Computer Vision, Surveillance
- Machine Learning, Pattern Recognition
- Computer Audition, Music Information Retrieval
- Eye Gaze Analysis
For more information about our current research, please visit the projects and publication pages.
Opportunities for graduate students and research assistants – if you are interested in joining the lab, please check this information.
Latest News
- [Jan 27, 2021]
Congratulations to Qi for defending his thesis!
- [Sep 11, 2020]
Congratulations to Sergio for defending his thesis!
- [Nov 28, 2019]
Congratulations to Weihong for defending his thesis!
- [Nov 28, 2019]
Congratulations to Tianyu for defending his thesis!
Recent Publications
- Do portrait artists have enhanced face processing abilities? Evidence from hidden Markov modeling of eye movements.
Cognition, to appear 2021.
- Eye Movement analysis with Hidden Markov Models (EMHMM) with co-clustering.
Behavior Research Methods, to appear 2021.
- Applying the Hidden Markov Model to Analyze Urban Mobility Patterns: An Interdisciplinary Approach.
Chinese Geographical Science, 31(1):1-13, Feb 2021.
- PRIMAL-GMM: PaRametrIc MAnifold Learning of Gaussian Mixture Models.
IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), to appear 2021.
- Tracking-by-Counting: Using Network Flows on Crowd Density Maps for Tracking Multiple Targets.
IEEE Trans. on Image Processing (TIP), 30:1439-1452, 2021.
- Visual Tracking via Dynamic Memory Networks.
IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 43(1):360-374, Jan 2021. [code]
- Kernel-based Density Map Generation for Dense Object Counting.
IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), to appear 2020.
- Modeling Noisy Annotations for Crowd Counting.
In: Neural Information Processing Systems (NeurIPS), Dec 2020. [supplemental]
- On Diversity in Image Captioning: Metrics and Methods.
IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), to appear 2020.
Recent Project Pages
In this paper, we propose fine-grained crowd counting, which differentiates a crowd into categories based on the low-level behavior attributes of the individuals (e.g. standing/sitting or violent behavior) and then counts the number of people in each category. To enable research in this area, we construct a new dataset of four real-world fine-grained counting tasks: traveling direction on a sidewalk, standing or sitting, waiting in line or not, and exhibiting violent behavior or not.
We propose a new multiple-object tracking (MOT) paradigm, tracking-by-counting, tailored for crowded scenes. Using crowd density maps, we jointly model detection, counting, and tracking of multiple targets as a network flow program, which simultaneously finds the global optimal detections and trajectories of multiple targets over the whole video.
- "Tracking-by-Counting: Using Network Flows on Crowd Density Maps for Tracking Multiple Targets." IEEE Trans. on Image Processing (TIP), 30:1439-1452, 2021.
We model the annotation noise using a random variable with a Gaussian distribution and derive the pdf of the crowd density value for each spatial location in the image. We then approximate the joint distribution of the density values (i.e., the distribution of density maps) with a full-covariance multivariate Gaussian density, and derive a low-rank approximation for tractable implementation.
- "Modeling Noisy Annotations for Crowd Counting." In: Neural Information Processing Systems (NeurIPS), Dec 2020. [supplemental]
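As a rough illustration of the noisy-annotation setting described above (not the paper's derivation), the sketch below jitters a single annotation with Gaussian noise and samples the resulting density value at one pixel. It assumes an isotropic Gaussian density kernel; the function names, variances, and coordinates are hypothetical. Under these assumptions the expected density value is itself a Gaussian kernel whose variance is the sum of the kernel and noise variances, which the simulation checks numerically.

```python
import numpy as np

def gaussian_kernel(x, center, var):
    """Isotropic 2-D Gaussian density with variance `var`, evaluated at x."""
    d2 = np.sum((x - center) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * var)) / (2.0 * np.pi * var)

def simulate_density_value(x, annotation, kernel_var, noise_var, n_samples, rng):
    """Sample the density value at x when the annotation is jittered by Gaussian noise."""
    jitter = rng.normal(scale=np.sqrt(noise_var), size=(n_samples, 2))
    return gaussian_kernel(x, annotation + jitter, kernel_var)

# Hypothetical values chosen only for illustration.
rng = np.random.default_rng(0)
x = np.array([10.0, 12.0])           # pixel where the density is evaluated
annotation = np.array([11.0, 11.0])  # noise-free annotation location
kernel_var, noise_var = 4.0, 2.0

samples = simulate_density_value(x, annotation, kernel_var, noise_var, 200_000, rng)
empirical_mean = samples.mean()

# Convolving two Gaussians adds their variances, so the mean has a closed form.
analytic_mean = gaussian_kernel(x, annotation, kernel_var + noise_var)
print(empirical_mean, analytic_mean)
```

The samples show that the density value at a pixel is a random variable once annotations are noisy; the joint distribution over all pixels and its low-rank approximation are beyond the scope of this sketch.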
We propose a generic framework to approximate, in an amortized fashion, the output probability distribution induced by a Bayesian NN model posterior using a parameterized model. The aim is to approximate the predictive uncertainty of a specific Bayesian model while alleviating the heavy workload of MC integration at testing time.
- "Accelerating Monte Carlo Bayesian Prediction via Approximating Predictive Uncertainty over the Simplex." IEEE Transactions on Neural Networks and Learning Systems (TNNLS), to appear 2020.
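To make the test-time cost of MC integration concrete, here is a minimal sketch of Monte Carlo Bayesian prediction. It is not the framework from the paper: it assumes a toy linear softmax classifier and a stand-in Gaussian "posterior" over its weights, and all names and sizes are hypothetical. Each weight sample yields one point on the probability simplex, and the predictive distribution is their average, so every test input costs S forward passes.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_predictive(x, weight_samples):
    """Average class probabilities of a linear classifier over sampled weights.

    x: (D,) input; weight_samples: (S, D, K) samples from the weight posterior.
    Returns the (K,) MC predictive distribution and the (S, K) per-sample outputs.
    """
    logits = np.einsum("d,sdk->sk", x, weight_samples)  # one forward pass per sample
    probs = softmax(logits)                             # S points on the simplex
    return probs.mean(axis=0), probs

# Stand-in posterior (an assumption): Gaussian around a hypothetical mean.
rng = np.random.default_rng(0)
D, K, S = 5, 3, 1000
posterior_mean = rng.normal(size=(D, K))
weight_samples = posterior_mean + 0.5 * rng.normal(size=(S, D, K))

x = rng.normal(size=D)
predictive, per_sample = mc_predictive(x, weight_samples)
print(predictive)
```

The spread of `per_sample` over the simplex is the predictive uncertainty that an amortized model would try to output directly, without drawing the S samples.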
To improve the distinctiveness of image captions, we first propose a metric, between-set CIDEr (CIDErBtw), to evaluate the distinctiveness of a caption with respect to those of similar images, and then propose several new training strategies for image captioning based on the new distinctiveness measure.
- "Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets." In: European Conference on Computer Vision (ECCV), Aug 2020 (oral).
Recent Datasets and Code
Eye Movement analysis with Switching HMMs (EMSHMM) Toolbox
This is a MATLAB toolbox for analyzing eye movement data with switching hidden Markov models (SHMMs), intended for cognitive tasks that involve changes in cognitive state. It includes code for learning SHMMs for individuals, as well as analyzing the results.
- Files: download here
- Project page
- If you use this toolbox please cite:
Eye movement analysis with switching hidden Markov models.
Behavior Research Methods, 52:1026-1043, June 2020.
EgoDaily – Egocentric dataset for Hand Disambiguation
Egocentric hand detection dataset with variability in people, activities, and places, to simulate daily life situations.
- Files: download page
- If you use this dataset please cite:
Is that my hand? An egocentric dataset for hand disambiguation.
Image and Vision Computing, 89:131-143, Sept 2019.
CityStreet: Multi-view crowd counting dataset
Datasets for multi-view crowd counting in wide-area scenes. Includes our CityStreet dataset, as well as the counting and metadata for multi-view counting on PETS2009 and DukeMTMC.
- Files: download page
- Project page
- If you use this dataset please cite:
Wide-Area Crowd Counting via Ground-Plane Density Maps and Multi-View Fusion CNNs.
In: IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), Long Beach, June 2019.
CityUHK-X: crowd dataset with extrinsic camera parameters
Crowd counting dataset of indoor/outdoor scenes with extrinsic camera parameters (camera angle and height), for use as side information.
- Files: zip (1.8GB) | readme
- Project page
- If you use this dataset please cite:
Incorporating Side Information by Adaptive Convolution.
In: Neural Information Processing Systems, Long Beach, Dec 2017.
DPHEM toolbox for simplifying GMMs
Toolboxes implementing the density-preserving hierarchical EM (DPHEM) algorithm for simplifying mixture models.
- Files: Python toolbox (compatible with sklearn.mixture.GaussianMixture) and Matlab toolbox
- If you use this code please cite:
Density-Preserving Hierarchical EM Algorithm: Simplifying Gaussian Mixture Models for Approximate Inference.
IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 41(6):1323-1337, June 2019.