Pedestrian Crowd Counting

There is currently great interest in vision technology for monitoring all types of environments. Such monitoring can serve many goals, e.g., security, resource management, urban planning, or advertising. From a technological standpoint, computer vision solutions typically focus on detecting, tracking, and analyzing individuals (e.g., finding and tracking a person walking in a parking lot, or identifying the interaction between two people). While there has been some success with this type of "individual-centric" surveillance, it does not scale to scenes with large crowds, where each person is depicted by only a few image pixels, people occlude each other in complex ways, and the number of targets to track is overwhelming.

Nonetheless, many monitoring problems can be solved without explicitly tracking individuals. These are problems where all the information required to perform the task can be gathered by analyzing the environment holistically, e.g., monitoring traffic flows, detecting disturbances in public spaces, detecting highway speeding, or estimating crowd sizes. By definition, these tasks depend on either 1) properties of the crowd as a whole, or 2) an individual's deviation from the crowd. In both cases, it should suffice to build good models of the patterns of crowd behavior. Events could then be detected as variations in these patterns, and abnormal individual actions could be detected as outliers with respect to the crowd behavior.
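As an illustration of detecting events as variations in crowd patterns, the sketch below keeps a running mean and variance of a per-frame crowd statistic (for example, a crowd count) using Welford's online algorithm, and flags frames that deviate from the history by more than k standard deviations. The class name CrowdPatternMonitor, the k-sigma rule, and the parameters k and warmup are hypothetical choices made for illustration; the page does not specify how crowd patterns are modeled.

```python
import math


class CrowdPatternMonitor:
    """Hypothetical monitor: flags frames whose crowd statistic is an outlier
    relative to all previously observed frames (simple k-sigma rule)."""

    def __init__(self, k: float = 3.0, warmup: int = 10):
        self.k = k
        # Need at least two samples before a sample standard deviation exists.
        self.warmup = max(warmup, 2)
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's algorithm)

    def update(self, value: float) -> bool:
        """Return True if `value` deviates by more than k standard deviations
        from the history seen so far, then absorb it into the statistics."""
        is_event = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            is_event = abs(value - self.mean) > self.k * std
        # Welford update of the running mean and squared-deviation sum.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return is_event
```

Flagged frames are still folded into the statistics here, so a sustained change in the crowd eventually becomes the new "normal" pattern; a deployed system might instead exclude flagged frames or use a sliding window.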

An example of a surveillance task that can be solved with a "crowd-centric" approach is pedestrian counting. Yet it is frequently addressed with "individual-centric" methods: detect the people in the scene, track them over time, and count the number of tracks. The problem is that, as the crowd becomes larger and denser, both individual detection and tracking become close to impossible. In contrast to these approaches, we show that pedestrian detection, object tracking, and object-based image primitives are not needed to count pedestrians, even when the crowd is large and inhomogeneous (e.g., composed of sub-groups with different dynamics) and is observed in unconstrained outdoor environments. In fact, we argue that, when a "crowd-centric" approach is adopted, the problem actually becomes simpler. We segment the crowd into sub-parts of interest (e.g., groups of people moving in different directions), extract a set of holistic features from each segment, and estimate the crowd size with a suitable regression function. By bypassing intermediate processing stages such as people detection or tracking, which are susceptible to occlusion, the proposed approach produces robust and accurate crowd counts, even when the crowd is large and dense.
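The three-step pipeline above (segment, extract holistic features, regress) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: each crowd segment is given as a binary mask, the holistic features are only the segment's area and perimeter (the page mentions segmentations, edges, and texture but does not specify the feature set), and the regression function is ridge-regularized linear least squares. The names segment_features, fit_ridge, and predict are hypothetical.

```python
from typing import List, Sequence

Mask = List[List[int]]  # binary segmentation mask: 1 = crowd pixel, 0 = background


def segment_features(mask: Mask) -> List[float]:
    """Hypothetical holistic features of one segment: [bias, area, perimeter]."""
    h, w = len(mask), len(mask[0])
    area = perimeter = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            area += 1
            # A crowd pixel counts toward the perimeter if any 4-neighbour
            # is background or lies outside the image.
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    perimeter += 1
                    break
    return [1.0, float(area), float(perimeter)]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(features: Sequence[List[float]], counts: Sequence[float],
              lam: float = 1e-3) -> List[float]:
    """Fit weights w minimising ||X w - y||^2 + lam * ||w||^2 via normal equations."""
    d = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(f[i] * c for f, c in zip(features, counts)) for i in range(d)]
    return _solve(xtx, xty)


def predict(weights: Sequence[float], feats: Sequence[float]) -> float:
    """Estimated number of people in a segment with feature vector `feats`."""
    return sum(wi * fi for wi, fi in zip(weights, feats))
```

To count a frame, one would call segment_features on each segment mask, apply predict with weights learned from annotated training frames, and (if a total is wanted) sum the per-segment estimates. The small ridge term keeps the normal equations solvable when features are nearly collinear, for instance when area and perimeter grow in proportion; any other regression function could be substituted for fit_ridge.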

Finally, because only low-level features are used, the resulting algorithms can be implemented with special-purpose cameras that do not produce a visual record of the scene, e.g., cameras that output only low-level features such as segmentations, edges, and texture information. This can be exploited to build "privacy-preserving" crowd counting systems. Privacy-preserving solutions have great societal appeal, since they minimize the hurdles to the practical deployment of vision technology.

Selected Publications

Demos/Results

Datasets/Code