Welcome to the Video, Image, and Sound Analysis Lab (VISAL) at the City University of Hong Kong! The lab is directed by Dr. Antoni Chan in the Department of Computer Science.

Our main research activities include:

  • Computer Vision, Surveillance
  • Machine Learning, Pattern Recognition
  • Computer Audition, Music Information Retrieval
  • Eye Gaze Analysis

For more information about our current research, please visit the projects and publications pages.

Opportunities for graduate students and research assistants – if you are interested in joining the lab, please check this information.

Latest News

  • [May 6, 2019]

    Congratulations to Di for defending his thesis!

  • [Nov 1, 2018]

    Congratulations to Lei for defending her thesis!

  • [Jun 28, 2016]

    Congratulations to Sijin for defending his thesis!

  • [Jun 25, 2016]

    Congratulations to Adeel for winning a “Best Research Paper Award 2013/14” from the Higher Education Commission (HEC) of Pakistan for his TPAMI 2013 paper!

Recent Publications

Recent Project Pages

Simplification of Gaussian Mixture Models

We propose an algorithm that simplifies a Gaussian mixture model (GMM) into a reduced mixture with fewer components by maximizing a variational lower bound on the expected log-likelihood of a set of virtual samples.
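The idea of fitting a smaller mixture to virtual samples can be sketched in one dimension. This is not the lab's published algorithm: it is a minimal sketch in the style of hierarchical EM (HEM), assuming each base component i generates n_virtual * w[i] virtual samples that are all assigned to a single reduced component. The expected log-likelihood of one Gaussian under another has a closed form, so no samples are actually drawn. The function name, n_virtual, the initialisation and the iteration count are hypothetical choices for illustration.

```python
import math


def simplify_gmm_1d(w, m, v, init_means, init_vars, n_virtual=100.0, iters=50):
    """Reduce a 1-D GMM (weights w, means m, variances v) to len(init_means)
    components with HEM-style updates (illustrative sketch).

    Base component i is treated as generating n_virtual * w[i] virtual samples,
    all assigned to one reduced component. n_virtual, the initialisation and
    the fixed iteration count are assumptions, not values from the source.
    """
    kb, kr = len(w), len(init_means)
    mu, s = list(init_means), list(init_vars)
    pi = [1.0 / kr] * kr
    for _ in range(iters):
        # E-step: h[i][j] = responsibility of reduced component j for base component i.
        h = []
        for i in range(kb):
            n_i = n_virtual * w[i]
            logs = []
            for j in range(kr):
                # Closed form of E_{x ~ N(m_i, v_i)}[log N(x | mu_j, s_j)].
                ell = (-0.5 * math.log(2 * math.pi * s[j])
                       - ((m[i] - mu[j]) ** 2 + v[i]) / (2 * s[j]))
                logs.append(math.log(max(pi[j], 1e-300)) + n_i * ell)
            top = max(logs)  # shift before exponentiating, for numerical stability
            ex = [math.exp(l - top) for l in logs]
            tot = sum(ex)
            h.append([e / tot for e in ex])
        # M-step: closed-form maximisers of the lower bound.
        for j in range(kr):
            pi[j] = sum(h[i][j] for i in range(kb)) / kb
            r = [h[i][j] * w[i] for i in range(kb)]
            rs = sum(r)
            if rs == 0.0:
                continue  # no mass assigned; keep the old mean and variance
            mu[j] = sum(r[i] * m[i] for i in range(kb)) / rs
            # Moment matching: within-component variance plus spread of the means.
            s[j] = sum(r[i] * (v[i] + (m[i] - mu[j]) ** 2) for i in range(kb)) / rs
    return pi, mu, s


pi, mu, s = simplify_gmm_1d([0.25] * 4, [-5.5, -4.5, 4.5, 5.5], [1.0] * 4,
                            init_means=[-1.0, 1.0], init_vars=[1.0, 1.0])
print(pi, mu, s)
```

Reducing four equally weighted unit-variance components at -5.5, -4.5, 4.5 and 5.5 to two components merges each nearby pair: equally mixing N(-5.5, 1) and N(-4.5, 1) has mean -5 and variance 1 + 0.25 = 1.25, which is what the M-step's moment matching produces.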

Convolutional Decoders for Image Captioning

RNN-based models dominate the field of image captioning, but they have three limitations. (1) RNNs must be computed step by step, which makes them hard to parallelize. (2) The path between the start and the end of a sentence is long; tree structures can shorten it, but trees require special processing. (3) At each time step, RNNs learn only a single-level representation, whereas convolutional decoders can learn multi-level representations of concepts, each of which should correspond to an image region and thus benefit word prediction.
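The parallelism and path-length arguments can be illustrated with a generic causal (masked) 1-D convolution over a sequence of scalar word features. This is a hypothetical toy, not the lab's decoder: the kernel values, the weight sharing across layers and all helper names are assumptions. An RNN needs n - 1 recurrent steps to link the first and last of n words, while L stacked causal layers with kernel size k let each output see 1 + L(k - 1) positions.

```python
def causal_conv1d(x, kernel):
    """y[t] = sum_k kernel[k] * x[t - k], with implicit zero padding on the left.

    y[t] depends only on x[0..t], so the layer never looks at future words,
    and every y[t] can be computed independently (in parallel) rather than
    step by step as in an RNN.
    """
    return [sum(kernel[k] * x[t - k] for k in range(len(kernel)) if t - k >= 0)
            for t in range(len(x))]


def stacked(x, kernel, layers):
    """Apply the same causal convolution `layers` times (shared weights only for brevity)."""
    for _ in range(layers):
        x = causal_conv1d(x, kernel)
    return x


def influencing_positions(n, kernel, layers, t):
    """Input positions s whose unit impulse reaches output position t.

    With strictly positive kernel weights nothing cancels, so a nonzero
    response means s lies inside the receptive field of t.
    """
    hits = []
    for s in range(n):
        impulse = [1.0 if i == s else 0.0 for i in range(n)]
        if stacked(impulse, kernel, layers)[t] != 0.0:
            hits.append(s)
    return hits


def layers_to_connect(n, k):
    """Smallest L with 1 + L * (k - 1) >= n, i.e. ceil((n - 1) / (k - 1))."""
    return -(-(n - 1) // (k - 1))


print(influencing_positions(10, [0.5, 0.3, 0.2], layers=2, t=9))
print(layers_to_connect(20, 3))
```

With kernel size 3 and two layers, position 9 is influenced only by positions 5 to 9 (never by later words), and linking the two ends of a 20-word sentence takes 10 convolutional layers compared with 19 recurrent steps.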

Beyond Counting: Comparisons of Density Maps for Crowd Analysis Tasks – Counting, Detection, and Tracking

We propose two models, CNN-pixel and FCNN-skip, that produce density maps at the original image resolution. In our experiments, lower-resolution density maps sometimes gave better counting performance. In contrast, original-resolution density maps improved localization tasks such as detection and tracking, compared with bilinearly upsampling the lower-resolution density maps.
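The counting-versus-localization trade-off can be illustrated with a toy density map built from dot annotations. This is not CNN-pixel or FCNN-skip and involves no learned model; the Gaussian width sigma=1.5, the peak thresholds, the use of 2x sum-pooling to stand in for a lower-resolution output, and all helper names are assumptions. Bilinear upsampling is not shown. The count is the sum of the density map, which sum-pooling preserves, while detection by local maxima is only as precise as the map's grid.

```python
import math


def density_map(points, h, w, sigma=1.5):
    """Dot annotations -> density map. Each Gaussian is renormalised inside
    the image, so every annotated person adds exactly 1 to the total."""
    D = [[0.0] * w for _ in range(h)]
    for pr, pc in points:
        g = [[math.exp(-((r - pr) ** 2 + (c - pc) ** 2) / (2 * sigma ** 2))
              for c in range(w)] for r in range(h)]
        z = sum(map(sum, g))
        for r in range(h):
            for c in range(w):
                D[r][c] += g[r][c] / z
    return D


def count(D):
    """Predicted count = integral (sum) of the density map."""
    return sum(map(sum, D))


def sum_pool(D, f=2):
    """Downsample by summing f x f blocks; the total count is unchanged."""
    h, w = len(D) // f, len(D[0]) // f
    return [[sum(D[r * f + i][c * f + j] for i in range(f) for j in range(f))
             for c in range(w)] for r in range(h)]


def peaks(D, thresh):
    """Detections = strict local maxima (8-neighbourhood) above a threshold."""
    h, w = len(D), len(D[0])
    out = []
    for r in range(h):
        for c in range(w):
            v = D[r][c]
            if v < thresh:
                continue
            nbrs = [D[rr][cc]
                    for rr in range(max(r - 1, 0), min(r + 2, h))
                    for cc in range(max(c - 1, 0), min(c + 2, w))
                    if (rr, cc) != (r, c)]
            if all(v > n for n in nbrs):
                out.append((r, c))
    return out


full = density_map([(4, 4), (11, 12)], 16, 16)
low = sum_pool(full, 2)
print(round(count(full), 6), round(count(low), 6))  # both maps count 2 people
print(peaks(full, 0.01), peaks(low, 0.05))
```

Both maps count 2 people, and the full-resolution peaks land exactly on (4, 4) and (11, 12). The 8 x 8 map's peaks at cells (2, 2) and (5, 6) map back to block centres (4.5, 4.5) and (10.5, 12.5), each about 0.7 pixels from the true location.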