-
Comparison with Other Pre-training Approaches:
Table 1: Comparison of pre-training methods on the downstream VOT (GOT-10k) and VOS (DAVIS-17) tasks. All methods use the ViT-B/16 model pre-trained on 224×224 input images. Pre-training time is measured on 64 NVIDIA V100 GPUs. The best two results are shown in red and blue.
-
Ablation study of the dropout ratio P on the GOT-10k (VOT) and DAVIS-17 (VOS) datasets:
-
Usage of Various Source Training Datasets: