Abstract

Wide-angle lenses are commonly used in perception tasks requiring a large field of view. Unfortunately, these lenses produce significant distortions, so conventional models that ignore distortion effects fail to adapt to wide-angle images. In this paper, we present a novel transformer-based model that automatically adapts to the distortion produced by wide-angle lenses. We leverage the physical characteristics of such lenses, which are analytically defined by the radial distortion profile (assumed to be known), to develop a distortion-aware radial Swin transformer (DarSwin). In contrast to conventional transformer-based architectures, DarSwin comprises a radial patch partitioning, a distortion-based sampling technique for creating token embeddings, and a polar position encoding for radial patch merging. We validate our method on classification tasks using synthetically distorted ImageNet data and show through extensive experiments that DarSwin can perform zero-shot adaptation to unseen distortions of different wide-angle lenses. Compared to other baselines, DarSwin achieves the best results (in terms of Top-1 and Top-5 accuracy) when tested on in-distribution data, with a gain of almost 2% (6%) in Top-1 accuracy under medium (high) distortion levels, and results comparable to the state of the art under low and very low distortion levels (perspective-like images).
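
To make the idea of radial patch partitioning more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical equidistant distortion profile (image radius proportional to incident angle) and assumes that radial patch boundaries are equally spaced in incident angle; the function names, the focal value, and the grid sizes are all illustrative choices, and the scheme used in the paper may differ.

```python
import math


def equidistant_radius(theta, focal):
    """Hypothetical distortion profile: image radius in pixels for an
    incident angle theta in radians (equidistant model, r = focal * theta)."""
    return focal * theta


def radial_patch_centers(profile, theta_max, n_radial, n_azimuth, cx, cy):
    """Return the pixel coordinates of the centres of a polar patch grid.

    Rings are equally spaced in incident angle (an assumption of this
    sketch) and mapped to image radii through the known profile.
    The result is indexed as centers[ring][sector] -> (x, y).
    """
    centers = []
    for i in range(n_radial):
        # Mid-angle of ring i, then its radius in the distorted image.
        theta_mid = (i + 0.5) * theta_max / n_radial
        r = profile(theta_mid)
        ring = []
        for j in range(n_azimuth):
            # Mid-azimuth of sector j.
            phi = (j + 0.5) * 2.0 * math.pi / n_azimuth
            ring.append((cx + r * math.cos(phi), cy + r * math.sin(phi)))
        centers.append(ring)
    return centers


# Illustrative values only: 4 rings, 8 sectors, 90-degree half field of view.
centers = radial_patch_centers(
    lambda t: equidistant_radius(t, focal=100.0),
    theta_max=math.pi / 2,
    n_radial=4,
    n_azimuth=8,
    cx=128.0,
    cy=128.0,
)
```

Because the rings follow the lens profile rather than a fixed pixel grid, swapping in a different profile function moves the patch centres without changing the number of tokens, which is the intuition behind adapting to unseen lenses.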

Citation

@article{athwale2023darswin,
    title={DarSwin: Distortion Aware Radial Swin Transformer},
    author={Athwale, Akshaya and Afrasiyabi, Arman and Lagüe, Justin and Shili, Ichrak and Ahmad, Ola and Lalonde, Jean-Fran{\c{c}}ois},
    journal={IEEE/CVF International Conference on Computer Vision (ICCV)},
    year={2023}
}

Acknowledgements

This research was partially supported by NSERC grant ALLRP-567654, Thales, an NSERC USRA to J. Lagüe, and the Digital Research Alliance of Canada. We thank Yohan Poirier-Ginter, Frederic Fortier-Chouinard, Adam Tupper, and Justine Giroux of Université Laval for proofreading.