Accepted in Transactions on Machine Learning Research (TMLR), 2024!
New transformer networks have been integrated into object tracking pipelines and have demonstrated strong performance on the latest benchmarks. This paper focuses on understanding how transformer trackers behave under adversarial attacks and how different attacks perform on tracking datasets as their parameters change. We conducted a series of experiments to evaluate the effectiveness of existing adversarial attacks on object trackers with transformer and non-transformer backbones. We experimented on 7 different trackers, including 3 that are transformer-based and 4 that leverage other architectures. These trackers were tested against 4 recent attack methods to assess their performance and robustness on the VOT2022ST, UAV123 and GOT10k datasets. Our empirical study focuses on evaluating the adversarial robustness of object trackers based on bounding box versus binary mask predictions, and on attack methods at different perturbation levels. Interestingly, our study found that altering the perturbation level may not significantly affect the overall object tracking results after the attack. Similarly, the sparsity and imperceptibility of the attack perturbations may remain stable against perturbation level shifts. By applying a specific attack on all transformer trackers, we show that newer transformer trackers with stronger cross-attention modeling achieve greater adversarial robustness on tracking datasets such as VOT2022ST and GOT10k. Our results also indicate the need for new attack methods that can effectively tackle the latest types of transformer trackers. The code necessary to reproduce this study is available on GitHub.
In the first experiment, we applied the adversarial attacks against TransT-SEG and MixFormerM and, for each tracker, created a video of its output before (Green Mask/BBOX) and after the attack (Red Mask/BBOX).
The white-box attacks are more effective against the TransT-SEG tracker, whether the evaluation is based on the bounding box or the binary mask.
Black-box attacks against TransT-SEG
White-box attacks against TransT-SEG
Black-box attacks against MixFormerM
In this section, we applied the adversarial attacks against TransT and created a series of videos showing the perturbed search regions and perturbation maps at different perturbation levels for the white-box approaches SPARK and RTAA. The search regions after the attack may show different areas of the same frame, depending on the effect of each attack and the degradation of the bounding box.
Any perturbed region with an SSIM below 50% is considered a super-perturbed region. At lower perturbation levels, the perceptibility of the generated perturbations is greater, while at higher levels, the number of super-perturbed frames increases.
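The super-perturbed criterion above can be sketched in a few lines. This is only an illustrative approximation: it uses a simplified single-window SSIM rather than the windowed implementation a standard library would provide, and the 50% threshold is the one stated above.

```python
import numpy as np

def global_ssim(x, y, data_range=255.0):
    """Simplified single-window SSIM between two grayscale images.
    (Illustrative: a windowed SSIM, e.g. scikit-image's
    structural_similarity, would average over local patches.)"""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

def is_super_perturbed(clean, perturbed, threshold=0.5):
    """A perturbed search region whose SSIM with the clean region
    falls below 50% is counted as super-perturbed."""
    return global_ssim(clean.astype(np.float64),
                       perturbed.astype(np.float64)) < threshold
```

An unperturbed region scores an SSIM of 1.0 and is never flagged; a heavily corrupted region drops well below the 0.5 threshold.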
Perturbed search regions and Perturbation maps: ε = 2.55
Perturbed search regions and Perturbation maps: ε = 5.1
Perturbed search regions and Perturbation maps: ε = 10.2
Perturbed search regions and Perturbation maps: ε = 20.4
Perturbed search regions and Perturbation maps: ε = 40.8
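The ε values above double at each level and bound the per-pixel strength of the attack. SPARK and RTAA each have their own update rules, but the ε-bounding step they share can be sketched as an L-infinity projection; the perturbation maps shown above correspond to the absolute per-pixel difference. Function names here are illustrative, not from the paper's code.

```python
import numpy as np

def project_linf(perturbed, original, epsilon):
    """Project an attacked search region back into the L-infinity ball
    of radius epsilon around the clean region, then into the valid
    pixel range. (Sketch of the epsilon-bounding step only.)"""
    clipped = np.clip(perturbed, original - epsilon, original + epsilon)
    return np.clip(clipped, 0.0, 255.0)

def perturbation_map(perturbed, original):
    """Absolute per-pixel difference, as visualized in the maps above."""
    return np.abs(perturbed - original)
```

For example, with ε = 10.2, a pixel pushed from 100 to 130 by the attack is projected back to 110.2, so the perturbation map never exceeds ε.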
We created video sequences using the original tracking sequences as a base. These videos were generated by attacking the ROMTrack tracker with the IoU method at different perturbation levels.
Perturbed Frame: ζ = 8k
Perturbed Frame: ζ = 10k
Perturbed Frame: ζ = 12k
Perturbation Map: ζ = 8k
Perturbation Map: ζ = 10k
Perturbation Map: ζ = 12k
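Reading ζ as a total noise budget (8k, 10k, 12k above), a black-box loop in the spirit of the IoU method can be sketched as follows. This is an assumption-laden sketch, not the paper's implementation: `predict_iou` is a hypothetical callback standing in for a query to the tracker, and the noise schedule is simplified.

```python
import numpy as np

def iou_style_attack(frame, predict_iou, zeta=8000, step=8, rng=None):
    """Illustrative black-box loop: keep adding small random noise while
    the accumulated noise level stays under the budget zeta and the
    noise does not improve the tracker's IoU. `predict_iou` is a
    hypothetical callback returning the tracker's IoU on a frame."""
    rng = rng or np.random.default_rng(0)
    adv = frame.astype(np.float64).copy()
    best_iou = predict_iou(adv)
    while np.abs(adv - frame).sum() < zeta:
        noise = rng.choice([-step, step], size=frame.shape)
        cand = np.clip(adv + noise, 0, 255)
        iou = predict_iou(cand)
        if iou <= best_iou:  # keep noise that does not help the tracker
            adv, best_iou = cand, iou
        else:
            break
    return adv
```

A larger ζ permits more accumulated noise, which is why the perturbed frames above become visibly noisier from ζ = 8k to ζ = 12k.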