ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion

Supplementary Material

Table of Contents

  1. Training data augmentation, extending Sec. 3.1 in the paper.
  2. Additional quantitative results, extending Tab. 1 in the paper.
  3. Qualitative comparison with image-based compositing methods, extending Fig. 4 in the paper.
  4. Qualitative comparison with lighting estimation methods, extending Fig. 4 in the paper.
  5. Human perceptual study, extending Sec. 5.3 in the paper.
  6. Ablation study on different predictors, extending Sec. 5.4 in the paper.
  7. Video demo.
  8. Failure cases.
  9. Additional objects.
  10. More real-world 2D compositing.

1. Training data augmentation

As mentioned in Sec. 3.1 of the paper, we keep three intrinsic maps from OpenRooms unchanged (depth, normals, and albedo) but randomly remove parts of the shading using one of the following masking strategies.

Note that these examples were seen by the network at training time.

Random masking (60% of the time)

Here, a random combination of rectangles and circles is used to create the shading mask.
[Figure: five training examples. Panels for each example: depth, surface normals, albedo, shading mask, shading, prediction, ground truth.]

Fully masked shading (30% of the time)

Here, no shading information is fed to the network. Therefore, it has to hallucinate the full shading information and relight the whole scene.
[Figure: three training examples. Panels for each example: depth, surface normals, albedo, shading mask, shading, prediction, ground truth.]

Fully known shading (10% of the time)

We also train the network to render an image where all the shading is given.
[Figure: three training examples. Panels for each example: depth, surface normals, albedo, shading mask, shading, prediction, ground truth.]
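As an illustration only, the three masking strategies described in this section could be sampled as in the following Python sketch. Only the 60/30/10 split and the use of rectangles and circles come from the text. The function name, the mask convention (1 keeps the shading, 0 removes it), the number of shapes, and their size ranges are assumptions made for this example.

```python
import numpy as np

def sample_shading_mask(height, width, rng):
    """Return (strategy, mask). Assumed convention: 1.0 keeps shading, 0.0 removes it."""
    # Probabilities from the text: 60% random, 30% fully masked, 10% fully known.
    strategy = str(rng.choice(["random", "fully_masked", "fully_known"],
                              p=[0.6, 0.3, 0.1]))
    if strategy == "fully_masked":
        return strategy, np.zeros((height, width), dtype=np.float32)
    if strategy == "fully_known":
        return strategy, np.ones((height, width), dtype=np.float32)

    mask = np.ones((height, width), dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    # Assumption: 1 to 4 shapes, each at most about half the image in extent.
    for _ in range(int(rng.integers(1, 5))):
        cy = int(rng.integers(0, height))
        cx = int(rng.integers(0, width))
        if rng.random() < 0.5:
            # Axis-aligned rectangle centred on (cy, cx).
            half_h = int(rng.integers(1, height // 4 + 1))
            half_w = int(rng.integers(1, width // 4 + 1))
            region = (np.abs(ys - cy) <= half_h) & (np.abs(xs - cx) <= half_w)
        else:
            # Circle centred on (cy, cx).
            radius = int(rng.integers(1, min(height, width) // 4 + 1))
            region = (ys - cy) ** 2 + (xs - cx) ** 2 <= radius ** 2
        mask[region] = 0.0
    return strategy, mask
```

Under this convention, the partial shading would be obtained as `shading * mask[..., None]` for an H x W x 3 shading map, with a generator created by `np.random.default_rng()`.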

2. Additional quantitative results

The metrics in Tab. 1 of the paper were computed on the whole image, which is a standard way of quantitatively evaluating compositing results. Since some image-based compositing methods like AnyDoor [12] and ControlCom [85] work in the latent space of Stable Diffusion and do not use a background-preserving technique like ours (described in Sec. 3.3), we also evaluate the metrics in a way that penalizes background fidelity less. To do so, we compute the same metrics on a rectangular crop defined as the bounding box of the object mask, extended by 25% on each side and clamped to the image bounds. This crop therefore includes the full object and most shadows. Note that the LPIPS here is also calculated after resizing the test images and references to 256x256.
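The crop described above can be sketched in Python as follows. This is a minimal illustration, not our evaluation code: the function names are hypothetical, and we assume here that "25% on each side" means 25% of the bounding box's height (top and bottom) and width (left and right).

```python
import numpy as np

def object_crop_box(mask, margin=0.25):
    """Bounding box of a binary object mask, grown by `margin` on each side.

    Returns (top, bottom, left, right), with bottom and right exclusive,
    clamped to the image bounds.
    """
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        raise ValueError("object mask is empty")
    top, bottom = int(rows[0]), int(rows[-1]) + 1
    left, right = int(cols[0]), int(cols[-1]) + 1
    # Assumption: the margin is relative to the box size along each axis.
    pad_y = int(round(margin * (bottom - top)))
    pad_x = int(round(margin * (right - left)))
    height, width = mask.shape
    return (max(top - pad_y, 0), min(bottom + pad_y, height),
            max(left - pad_x, 0), min(right + pad_x, width))

def crop_rmse(prediction, reference, mask):
    """RMSE restricted to the extended object crop (images as float arrays)."""
    top, bottom, left, right = object_crop_box(mask)
    diff = prediction[top:bottom, left:right] - reference[top:bottom, left:right]
    return float(np.sqrt(np.mean(diff ** 2)))
```

The other metrics would be evaluated on the same `[top:bottom, left:right]` window of the prediction and the reference.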

The following table shows the metrics computed on the crop for all methods. Our method still outperforms image-based methods on all metrics, and it remains competitive with or better than lighting-based methods. Note that computing metrics on the object crop generally degrades scores for all methods, because the error is computed on the parts of the image that change the most, i.e., the inserted object and its surroundings.
Method                     PSNR  RMSE    si-RMSE  SSIM   MAE     LPIPS   FLIP
Lighting-based methods
Gardner et al. 2017 [22]   18.4  0.1421  0.0959   0.844  0.0854  0.1326  0.2506
Garon et al. 2019 [23]     26.7  0.0509  0.0477   0.938  0.0307  0.0670  0.1471
Gardner et al. 2019 [21]   24.9  0.0604  0.0505   0.926  0.0354  0.0826  0.1664
Everlight [17]             25.8  0.0578  0.0540   0.924  0.0355  0.0692  0.1563
StyleLight [70]            22.1  0.0825  0.0685   0.896  0.0481  0.1010  0.1904
Weber et al. 2022 [72]     22.1  0.0823  0.0656   0.907  0.0493  0.0938  0.2035
EMLight [84]               25.3  0.0594  0.0564   0.918  0.0359  0.0786  0.1600
Image-based methods
AnyDoor [12]               17.1  0.1307  0.1236   0.639  0.0817  0.2739  0.3058
ControlCom [85]            18.7  0.1076  0.0913   0.704  0.0686  0.1992  0.2903
Careaga et al. 2023 [11]   19.1  0.1087  0.0832   0.852  0.0657  0.1380  0.2476
ARShadowGAN [46]           21.5  0.0866  0.0754   0.830  0.0526  0.1425  0.2164
Zero-shot intrinsics compositing
ZeroComp OpenRooms (Ours)  24.4  0.0588  0.0544   0.870  0.0356  0.0961  0.1691
ZeroComp InteriorVerse     25.7  0.0504  0.0473   0.885  0.0300  0.0872  0.1504

3. Qualitative comparison with image-based compositing methods

In the following, we show additional results on the evaluation dataset for image-based compositing methods, extending Fig. 4 from the paper. Instead of sorting images by PSNR, as done in the paper, here they are sampled randomly from the test set.
[Figure: 40 randomly sampled test examples. Columns, left to right: target background, AnyDoor [12], ControlCom [85], Careaga et al. 2023 [11], ARShadowGAN [46], ZeroComp InteriorVerse, ZeroComp OpenRooms (ours), simulated GT.]

4. Qualitative comparison with lighting estimation methods

In the following, we show additional results on the evaluation dataset from lighting estimation methods, extending Fig. 4 from the paper. Instead of sorting images by PSNR, as done in the paper, here they are sampled randomly from the test set.
[Figure: 40 randomly sampled test examples. Columns, left to right: target background, Garon et al. 2019 [23], Everlight [17], EMLight [84], ZeroComp InteriorVerse, ZeroComp OpenRooms (ours), simulated GT.]

5. Human perceptual study

As described in Sec. 5.3, a human perceptual study was conducted to compare 8 techniques (including ours). Here, we show the first page, which contains the instructions. After clicking "Start", the observer was shown a page containing all 160 questions. Observers were not told that each pair of images contained one image generated with the ground truth lighting and one image generated using one of the 8 techniques (including ours).

Instructions page


Sample question

6. Ablation study on different predictors for the background intrinsics

Since we need to estimate the background intrinsics for image compositing, we conducted an ablation study comparing performance across different intrinsic predictors. In our method, we use ZoeDepth for depth, StableNormal for normals, and Intrinsic Image Diffusion for albedo.

The following table shows the comparison of different predictors for the background intrinsics. No significant difference is observed between the different predictors, which suggests that ZeroComp is robust to the choice of the background intrinsic predictors.

Background intrinsic  Method                             PSNR  RMSE    si-RMSE  SSIM   MAE
Depth                 DepthAnythingV1 (Metric)           31.7  0.0303  0.0295   0.970  0.0109
Depth                 DepthAnythingV2 (Metric)           31.8  0.0299  0.0292   0.970  0.0107
Depth                 Metric3D                           31.8  0.0299  0.0292   0.970  0.0108
Normal                Metric3D                           31.8  0.0302  0.0294   0.970  0.0109
Normal                OmniDataV2                         31.8  0.0302  0.0294   0.970  0.0109
Albedo                DFNet                              31.7  0.0297  0.0290   0.970  0.0104
ZeroComp (Ours)       Depth: ZoeDepth                    31.7  0.0303  0.0295   0.970  0.0109
                      Normal: StableNormal
                      Albedo: Intrinsic Image Diffusion
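To make the "no significant difference" observation concrete, the following Python sketch computes the spread (maximum minus minimum) of each metric across the rows of the table above. The values are copied from the table; the dictionary keys and variable names are only illustrative.

```python
# Rows copied from the ablation table: (PSNR, RMSE, si-RMSE, SSIM, MAE).
rows = {
    "depth: DepthAnythingV1 (Metric)": (31.7, 0.0303, 0.0295, 0.970, 0.0109),
    "depth: DepthAnythingV2 (Metric)": (31.8, 0.0299, 0.0292, 0.970, 0.0107),
    "depth: Metric3D": (31.8, 0.0299, 0.0292, 0.970, 0.0108),
    "normal: Metric3D": (31.8, 0.0302, 0.0294, 0.970, 0.0109),
    "normal: OmniDataV2": (31.8, 0.0302, 0.0294, 0.970, 0.0109),
    "albedo: DFNet": (31.7, 0.0297, 0.0290, 0.970, 0.0104),
    "ZeroComp (Ours)": (31.7, 0.0303, 0.0295, 0.970, 0.0109),
}
metrics = ("PSNR", "RMSE", "si-RMSE", "SSIM", "MAE")

# Transpose the rows into per-metric columns and take max - min of each.
spread = {
    name: max(column) - min(column)
    for name, column in zip(metrics, zip(*rows.values()))
}
for name, value in spread.items():
    print(f"{name}: {value:.4f}")
```

Across all rows, PSNR varies by 0.1, RMSE by 0.0006, and SSIM is identical to three decimals.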

7. A video of our gradio demo

We provide a video of our Gradio demo, a web-based interface for trying our method. The demo lets users upload their own background and object images and configure the compositing parameters. It then generates the composited image using our method.

8. Failure cases

We identified the two most frequent failure cases of our method: limited specular reflections on non-Lambertian objects and incorrect shadow geometry.

Limited specular reflections on non-Lambertian objects

Our model trained on OpenRooms generates weak reflections on non-Lambertian objects, since the OpenRooms dataset doesn't include objects with diverse glossy or metallic materials. Furthermore, the metallic maps are not available in OpenRooms, leading to ambiguous results. Fortunately, we can improve specular reflections by training ZeroComp on a dataset with more non-Lambertian objects and more intrinsic maps, e.g., Hypersim or InteriorVerse.


ZeroComp OpenRooms (ours)

ZeroComp InteriorVerse

ZeroComp OpenRooms (ours)

ZeroComp InteriorVerse

Inaccurate shadow geometry

Since ZeroComp receives 2D intrinsic maps as input, the occluded geometry of the object is unknown to the network. As expected, in some cases the missing geometry information leads to incorrect shadow shapes, as shown in the following examples.


ZeroComp OpenRooms (ours)

Ground truth

ZeroComp OpenRooms (ours)

Ground truth

ZeroComp OpenRooms (ours)

Ground truth

ZeroComp OpenRooms (ours)

Ground truth

9. Additional objects

Our evaluation set is composed mostly of furniture items inserted into various scenes. Here we show that our model also generates realistic composites with a diverse set of objects.


Target background

ZeroComp OpenRooms (ours)

Target background

ZeroComp OpenRooms (ours)

Target background

ZeroComp OpenRooms (ours)

Target background

ZeroComp OpenRooms (ours)

10. More real-world 2D compositing

Here, we extend Fig. 9 by showing additional examples of real 2D objects composited into real backgrounds. For each example row, we show the 2D object, the target background, and the predicted composite.
[Figure: four examples. Columns, left to right: 2D object, target background, ZeroComp OpenRooms (ours).]