Reproducibility Study on Adversarial Attacks against Robust Transformer Trackers

Fatemeh N. Nokabadi   Jean-François Lalonde   Christian Gagné  







[OpenReview]
[Supplementary Material]
[Code]
[ArXiv]
[HAL]

Accepted in Transactions on Machine Learning Research (TMLR), 2024!

Winner of ML Reproducibility Challenge (MLRC) 2023!



Abstract

New transformer networks have been integrated into object tracking pipelines and have demonstrated strong performance on the latest benchmarks. This paper focuses on understanding how transformer trackers behave under adversarial attacks and how different attacks perform on tracking datasets as their parameters change. We conducted a series of experiments to evaluate the effectiveness of existing adversarial attacks on object trackers with transformer and non-transformer backbones. We experimented on 7 different trackers, including 3 that are transformer-based, and 4 which leverage other architectures. These trackers are tested against 4 recent attack methods to assess their performance and robustness on VOT2022ST, UAV123 and GOT10k datasets. Our empirical study focuses on evaluating adversarial robustness of object trackers based on bounding box versus binary mask predictions, and attack methods at different levels of perturbations. Interestingly, our study found that altering the perturbation level may not significantly affect the overall object tracking results after the attack. Similarly, the sparsity and imperceptibility of the attack perturbations may remain stable against perturbation level shifts. By applying a specific attack on all transformer trackers, we show that new transformer trackers having a stronger cross-attention modeling achieve a greater adversarial robustness on tracking datasets, such as VOT2022ST and GOT10k. Our results also indicate the necessity for new attack methods to effectively tackle the latest types of transformer trackers. The codes necessary to reproduce this study are available at GitHub.

[Bibtex]

Supplementary Material

Table of Contents

  1. Bounding box vs. Binary mask (experiment 1, section 4.1)
  2. Perturbation level shifts: White-box attacks (experiment 2, section 4.2)
  3. Perturbation level shifts: Black-box attack (experiment 3, section 4.3)

1. Bounding box vs. Binary mask

In the first experiment, we applied the adversarial attacks against TransT-SEG and MixFormerM and recorded videos of each tracker's output before the attack (green mask/bounding box) and after the attack (red mask/bounding box).

The white-box attacks are more effective against the TransT-SEG tracker, whether the evaluation is based on the bounding box or the binary mask.
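To illustrate how a bounding-box evaluation and a binary-mask evaluation can differ, here is a minimal sketch of the intersection-over-union (IoU) overlap computed for both prediction types. It is not the evaluation code used in the study: the `(x, y, width, height)` box convention, the nested-list mask representation, and the function names `box_iou` and `mask_iou` are assumptions made for this example.

```python
from typing import List, Tuple

# Assumed box convention for this sketch: (x, y, width, height).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """IoU of two axis-aligned boxes given as (x, y, width, height)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def mask_iou(a: List[List[int]], b: List[List[int]]) -> float:
    """IoU of two binary masks of identical shape (nested lists of 0/1)."""
    inter = sum(1 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb) if pa and pb)
    union = sum(1 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb) if pa or pb)
    return inter / union if union > 0 else 0.0
```

Comparing the overlap of the clean (green) and attacked (red) outputs with the ground truth under each function gives a rough sense of how much an attack degrades the box prediction versus the mask prediction.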

Black-box attacks against TransT-SEG

White-box attacks against TransT-SEG

Black-box attacks against MixFormerM

2. Perturbation level shifts: White-box attacks

In this section, we applied the adversarial attacks against TransT and created a series of videos showing the perturbed search regions and perturbation maps at different perturbation levels for the white-box approaches SPARK and RTAA. The search regions after the attack may show different areas of the same frame, depending on the effect of each attack and the resulting bounding box degradation.

Any perturbed region with an SSIM lower than 50% is considered a super-perturbed region. At lower perturbation levels, the perceptibility of the generated perturbations is greater, while at higher levels the number of super-perturbed frames increases.
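The 50% SSIM rule can be sketched as follows. This is an illustrative simplification, not the study's implementation: it computes SSIM over a single global window on flattened grayscale pixel values, whereas SSIM is usually averaged over local windows, and the names `global_ssim` and `is_super_perturbed` are hypothetical. The constants follow the standard SSIM definition, with C1 = (0.01 L)^2 and C2 = (0.03 L)^2 for dynamic range L.

```python
from typing import Sequence

# Threshold taken from the text: SSIM below 50% marks a super-perturbed region.
SUPER_PERTURBED_SSIM = 0.5


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 255.0) -> float:
    """Single-window SSIM between two flattened grayscale images of equal length."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and of equal size")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((p - mu_x) ** 2 for p in x) / n
    var_y = sum((q - mu_y) ** 2 for q in y) / n
    cov = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / n
    # Stabilizing constants from the standard SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def is_super_perturbed(clean: Sequence[float], perturbed: Sequence[float]) -> bool:
    """Flag a perturbed region whose SSIM against the clean region is below 0.5."""
    return global_ssim(clean, perturbed) < SUPER_PERTURBED_SSIM
```

Counting the frames for which `is_super_perturbed` returns `True` would give a per-sequence tally of super-perturbed frames at a given perturbation level.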

Perturbed search regions and Perturbation maps: ε = 2.55

Perturbed search regions and Perturbation maps: ε = 5.1

Perturbed search regions and Perturbation maps: ε = 10.2

Perturbed search regions and Perturbation maps: ε = 20.4

Perturbed search regions and Perturbation maps: ε = 40.8
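For orientation, the five ε values above correspond to 1%, 2%, 4%, 8% and 16% of the 8-bit intensity range, each level doubling the previous one. The sketch below shows that arithmetic together with one common way such a budget is enforced, namely clipping the per-pixel change to ±ε. Treating ε as an L∞ bound on a 0 to 255 pixel scale is an assumption of this example, not a statement about how SPARK or RTAA are implemented, and `epsilon_fraction` and `clip_perturbation` are hypothetical helper names.

```python
from typing import List, Sequence

# Perturbation levels listed in the captions above.
EPSILONS = [2.55, 5.1, 10.2, 20.4, 40.8]


def epsilon_fraction(eps: float, max_intensity: float = 255.0) -> float:
    """Express a perturbation level as a fraction of the pixel intensity range."""
    return eps / max_intensity


def clip_perturbation(clean: Sequence[float], perturbed: Sequence[float], eps: float) -> List[float]:
    """Limit each pixel's change to [-eps, +eps], then keep values in [0, 255].

    Assumption: eps acts as an L-infinity bound on a 0-255 pixel scale.
    """
    clipped = []
    for c, p in zip(clean, perturbed):
        delta = max(-eps, min(eps, p - c))
        clipped.append(max(0.0, min(255.0, c + delta)))
    return clipped
```

For example, `[round(epsilon_fraction(e), 2) for e in EPSILONS]` evaluates to `[0.01, 0.02, 0.04, 0.08, 0.16]`.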

3. Perturbation level shifts: Black-box attack

We created video sequences using the original tracking sequences as a base. These videos are generated by attacking the ROMTrack tracker with the IoU method at different perturbation levels.


Perturbed Frame: ζ = 8k

Perturbed Frame: ζ = 10k

Perturbed Frame: ζ = 12k

Perturbation Map: ζ = 8k

Perturbation Map: ζ = 10k

Perturbation Map: ζ = 12k

Acknowledgements

This work is supported by the DEEL Project CRDPJ 537462-18 funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Consortium for Research and Innovation in Aerospace in Québec (CRIAQ), together with its industrial partners Thales Canada inc, Bell Textron Canada Limited, CAE inc and Bombardier inc. MÉIE-Québec (DEEL project)