Detecting Images Generated by Deep Diffusion Models using their Local Intrinsic Dimensionality

Background

Multi Local Intrinsic Dimensionality (multiLID) was originally developed for detecting adversarial examples. The paper applies it to the automatic detection of synthetic images and the identification of the corresponding generator networks.

DIRE relies on a large amount of training data.

Most prior work has not been shown to distinguish images generated by different DMs within the same context.

Detectors for Synthetic Images:

GAN: frequency domain, Self-Blended Images
DM: DIRE

Method

Use extracted lower-dimensional, structured feature maps instead of the raw image. This is motivated by the low-dimensional manifold hypothesis (related to non-linear dimensionality reduction? I don’t know much about this hypothesis and may study it later):
natural images appear to conform to a low-dimensional structure (i.e. low intrinsic dimensions) as the probability distribution of images is highly concentrated

LID:

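A method for estimating the intrinsic dimensionality of a learned representation space around a given point. LID is estimated from the distances between a point and its $k$ nearest neighbors, and describes how quickly the number of neighbors grows as the distance increases.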
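
For reference, a maximum-likelihood estimator that is commonly used for LID can be written as below. This is a sketch from my understanding, not a copy of the paper's equation; the symbols are my own notation, with $d_i(x)$ the distance from $x$ to its $i$-th nearest neighbor in a batch and $k$ the number of neighbors considered.

$$
\widehat{\mathrm{LID}}(x) = -\left( \frac{1}{k} \sum_{i=1}^{k} \log \frac{d_i(x)}{d_k(x)} \right)^{-1}
$$

The $k$ log-ratios are averaged into a single number per sample, which is the aggregation step that multiLID avoids.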

multiLID:

Compute a feature vector (i.e. the multiLID) instead of a single aggregated (semi-)local ID.
Benefit: multiLID captures more fine-grained information about the relative growth rates at different distances for each sample.
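
A minimal sketch of the difference between the two, assuming (my understanding, not verified against the paper's code) that multiLID keeps the per-neighbor terms $-\log(d_i / d_k)$ while the usual LID estimate averages them. The function names `knn_distances`, `multilid` and `lid` are hypothetical, and the code uses only the Python standard library.

```python
import math


def knn_distances(x, batch, k):
    """Return the k smallest positive distances from x to points in batch, ascending."""
    dists = sorted(math.dist(x, y) for y in batch)
    # Drop zero distances (x itself or exact duplicates) so the log is defined.
    dists = [d for d in dists if d > 0.0]
    return dists[:k]


def multilid(x, batch, k):
    """Per-neighbor terms -log(d_i / d_k): one value per neighbor, not aggregated."""
    d = knn_distances(x, batch, k)
    d_k = d[-1]
    return [-math.log(d_i / d_k) for d_i in d]


def lid(x, batch, k):
    """Aggregated estimate: inverse of the mean of the per-neighbor terms.

    Raises ZeroDivisionError if all k distances are equal.
    """
    terms = multilid(x, batch, k)
    return len(terms) / sum(terms)


# Toy 1-D batch: neighbor distances from x = 0 are 1, 2 and 4.
batch = [(0.0,), (1.0,), (2.0,), (4.0,)]
vec = multilid((0.0,), batch, k=3)   # three values, one per neighbor
scalar = lid((0.0,), batch, k=3)     # a single number
```

In this toy case the vector keeps three separate log-ratios, while the scalar collapses them into one value, which is the information loss the notes above refer to.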

Example

Let $k=10$. For a sample image we extract $8$ feature maps, so the multiLID vector has length $8 \times 10 = 80$.
With the plain LID, computed e.g. with eq. (3), the vector length would be $8$, because the distances within each feature map are aggregated into a single value.
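
The length bookkeeping in this example can be checked with a small sketch. Everything except $k=10$ and the $8$ feature maps is an assumption made for illustration: the batch size, the flattened feature size, the random Gaussian values standing in for real activations, and the helper name `log_ratios`.

```python
import math
import random

K = 10           # neighbors per sample, as in the example
NUM_LAYERS = 8   # feature maps extracted per image, as in the example
BATCH = 32       # hypothetical batch size
DIM = 16         # hypothetical flattened feature size (same for every layer here)


def log_ratios(x, batch, k):
    """Per-neighbor terms -log(d_i / d_k) for x against the other points in batch."""
    d = sorted(math.dist(x, y) for y in batch if y is not x)[:k]
    return [-math.log(d_i / d[-1]) for d_i in d]


random.seed(0)
# activations[l][n]: flattened feature map of sample n at layer l (random stand-ins).
activations = [
    [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(BATCH)]
    for _ in range(NUM_LAYERS)
]

sample = 0
multilid_vec = []  # concatenation of K terms per layer
lid_vec = []       # one aggregated value per layer
for layer in activations:
    terms = log_ratios(layer[sample], layer, K)
    multilid_vec.extend(terms)
    lid_vec.append(K / sum(terms))

print(len(multilid_vec))  # 80
print(len(lid_vec))       # 8
```

Real feature maps would differ in size from layer to layer; that does not change the vector lengths, since each layer contributes $k$ values to multiLID and one value to LID.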

Evaluation/Experiments

The ResNet is untrained; no difference in the detector's accuracy is observed between untrained and trained weights (Appendix C).

Reading Summary

What is the contribution/novelty?

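The proposed method needs only a small training set. Since the ResNet used to extract features is untrained (see Evaluation/Experiments), the training data is presumably needed for the detector built on the multiLID features rather than for the CNN itself.
It can distinguish images generated by different diffusion models.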

What is the existing issue?

Difficulty in reliably differentiating between GAN-generated and DM-generated images (Fig. 5).
Transferability between different DMs is low, so the detector generalizes poorly to synthetic images from unseen generators.