Recent multiple-object-tracking (MOT) approaches have focused on the tracking-by-detection paradigm, which first detects the targets in each frame and then associates them into full trajectories. Such approaches have been successful in scenarios with a low density of targets. In crowded scenes, however, they often fail to extract the correct trajectories because occlusions and high target density cause detection failures, even with state-of-the-art detectors trained on large-scale datasets.
In this project, we propose a novel MOT approach explicitly designed for handling crowded scenes. We incorporate object counting, a reliable and informative cue in crowded scenarios, into our model, and solve multiple-object detection, counting, and tracking simultaneously over the whole video sequence.
Specifically, for each frame, we estimate an object density map, based on which a 3D sliding window is applied for estimating object counts. We then construct a spatio-temporal graph over the whole video sequence, where each node denotes a candidate detection at a pixel location, each edge denotes a possible association, and a sum over a set of nodes denotes a count in the corresponding sliding window. Using the constructed graph, we model the joint detection-tracking-counting problem as a network flow program. The network-flow constraints and the object-count constraints reinforce each other and together benefit the tracking. The globally optimal solutions of the network flow program, which correspond to the optimal object trajectories, are obtained using off-the-shelf solvers.
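To make the network-flow view of tracking concrete, the following is a minimal sketch and not the implementation used in this project. It covers only the association part: each candidate detection becomes a pair of nodes joined by a unit-capacity edge, possible associations between consecutive frames become edges, and each unit of flow from source to sink is one trajectory. Everything specific here is an assumption made for illustration: the 1D positions, the log-odds detection cost `log((1 - p) / p)`, the distance gate, the entry/exit cost, and the function names `add_edge`, `cheapest_path`, and `track`. The object-count constraints from the density maps are omitted, and a small successive-shortest-path routine stands in for an off-the-shelf solver.

```python
import math

def add_edge(adj, u, v, cost):
    """Add a unit-capacity edge u->v and its zero-capacity residual twin."""
    adj[u].append([v, 1, cost, len(adj[v]), True])        # [to, cap, cost, rev, is_forward]
    adj[v].append([u, 0, -cost, len(adj[u]) - 1, False])

def cheapest_path(adj, s, t):
    """Bellman-Ford over residual edges (edge costs may be negative)."""
    n = len(adj)
    dist = [math.inf] * n
    prev = [None] * n
    dist[s] = 0.0
    for _ in range(n - 1):
        changed = False
        for u in range(n):
            if dist[u] == math.inf:
                continue
            for i, (v, cap, cost, _rev, _fwd) in enumerate(adj[u]):
                if cap > 0 and dist[u] + cost < dist[v] - 1e-9:
                    dist[v] = dist[u] + cost
                    prev[v] = (u, i)
                    changed = True
        if not changed:
            break
    return dist[t], prev

def track(frames, gate=2.5, entry_exit_cost=1.0):
    """frames[f] is a list of (position, confidence) pairs; returns trajectories."""
    dets = [(f, x, p) for f, frame in enumerate(frames) for (x, p) in frame]
    s, t = 0, 1
    adj = [[] for _ in range(2 + 2 * len(dets))]
    for k, (f, x, p) in enumerate(dets):
        n_in, n_out = 2 + 2 * k, 3 + 2 * k
        add_edge(adj, s, n_in, entry_exit_cost)             # a trajectory may start here
        add_edge(adj, n_in, n_out, math.log((1 - p) / p))   # negative cost if p > 0.5
        add_edge(adj, n_out, t, entry_exit_cost)            # a trajectory may end here
        for j, (f2, x2, _p2) in enumerate(dets):
            if f2 == f + 1 and abs(x2 - x) <= gate:         # assumed association gate
                add_edge(adj, n_out, 2 + 2 * j, abs(x2 - x))

    # Push one unit of flow at a time while doing so lowers the total cost.
    while True:
        cost, prev = cheapest_path(adj, s, t)
        if cost >= 0:
            break
        v = t
        while v != s:
            u, i = prev[v]
            edge = adj[u][i]
            edge[1] -= 1
            adj[v][edge[3]][1] += 1
            v = u

    def used(u):
        """Targets of saturated forward edges leaving u, i.e. edges carrying flow."""
        return [e[0] for e in adj[u] if e[4] and e[1] == 0]

    trajectories = []
    for node in used(s):
        path = []
        while node != t:
            f, x, _p = dets[(node - 2) // 2]
            path.append((f, x))
            node = used(node + 1)[0]    # follow the flow out of this detection
        trajectories.append(path)
    return trajectories

# Toy sequence: two targets moving toward each other plus one weak false alarm.
frames = [
    [(0.0, 0.9), (10.0, 0.9)],
    [(1.0, 0.9), (5.0, 0.2), (9.0, 0.9)],
    [(2.0, 0.9), (8.0, 0.9)],
]
for trajectory in track(frames):
    print(trajectory)
```

In this toy sequence the low-confidence detection at position 5.0 has a positive cost and no gated neighbours, so no flow passes through it and only the two consistent trajectories are kept. In the formulation described above, the count in each sliding window would additionally constrain the sum over the nodes it covers, which is what this sketch leaves out.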
IEEE Trans. on Image Processing (TIP), 30:1439-1452, 2021.
- Coming soon.