This is part 5 in a series of tutorials in which we learn how to build a content-based search engine that retrieves multimedia objects based on their content rather than based on keywords, title or meta description.

We have seen how we can compare two multimedia objects with respect to their similarity by computing the Earth Mover’s Distance on their feature signatures. To this date, the Earth Mover’s Distance has been shown experimentally to be the most effective distance measure on feature signatures, i.e. it captures the way that human beings judge the perceptual dissimilarity between the objects most adequately. However, we have also seen that the computational complexity of the Earth Mover’s Distance lies somewhere between $\mathcal{O}(n^3)$ and $\mathcal{O}(n^4)$, where $n$ is the total number of representatives of the two compared signatures. For large-scale multimedia retrieval applications, this becomes unfeasible.

An alternative distance measure, which is almost as effective as the Earth Mover’s Distance but significantly more efficient to compute, is the Signature Quadratic Form Distance (SQFD). It has been proposed in [BUS10b] and can be thought of as an adaption of the Quadratic Form Distance [FBF+94], a distance measure on histograms, to feature signatures.

Let X, Y be two feature signatures and let $s : \mathbb{F} \times \mathbb{F} \rightarrow \mathbb{R}^{\geq 0}$ be a similarity function on features. The Signature Quadratic Form Distance $SQFD_s : \mathbb{S} \times \mathbb{S} \rightarrow \mathbb{R}^{\geq 0}$ is defined as
$$SQFD_s(X, Y) = \sqrt{<X-Y, X-Y>_s}$$
where $<X, Y>_s : \mathbb{S} \times \mathbb{S} \rightarrow \mathbb{R}^{\geq 0}$ is the Similarity Correlation, which is defined as
$$<X, Y>_s = \sum_{f \in R_X} \sum_{g \in R_Y} X(f) Y(g) s(f, g)$$ 

Intuitively, the similarity correlation yields high values if representatives that are similar to each other also have high weights. If, on the other hand, the weight in one signature is distributed in distinct (i.e. dissimilar) regions of the feature space than for the other signature, it assigns low values. Therefore, $<X, Y>_s$ is a measure of the similarity of the two signatures.

$X-Y$ refers to the difference between the two feature signatures, which is defined as a new feature signature such that $(X-Y)(f) = X(f) – Y(f)$. The term $<X-Y, X-Y>_s$ yields the self-similarity of this difference . If $X = Y$, this value is 0. The more dissimilar $X$ and $Y$ are, the higher the distance value will be.

The previous definition pre-supposes some similarity function $s$ on feature vectors. Recall from Part 1: Quantifying Similarity that a similarity function assigns small values to objects that are dissimilar and larger values to objects that are more similar, reaching its maximum when the two compared objects are the same. Given some distance function $\delta$ on feature vectors (an obvious choice being the Euclidean Distance or the Manhattan Distance), we can define a similarity function by means of the Gaussian kernel:

$$s(f, g) = e^{-\frac{\delta(f, g)}{2\sigma^2}}$$

For a distance of 0, this similarity function yields a value of 1. For increasing distances, the similarity exponentially decays to 0. $\sigma$ controls the speed of this exponental decay, with larger values leading to a slower decay.

As we can see from the definition, the run-time complexity for computing the Signature Quadratic Form Distance is $\mathcal{O}(n^2)$, making it significantly more feasible than the Earth Mover’s Distance. Moreover, as opposed to the Earth Mover’s Distance, the computation of the sum can be fully parallelized on a GPU, effectively resulting in constant computation time for most signatures (see [KLBSS11]). Finally, it can be shown that the Signature Quadratic Form Distance is a metric, which will become important for indexing methods, as will be detailed in the next section.

While the Earth Mover’s Distance is slightly more effective than the Signature Quadratic Form Distance (see [BS13] for experimental comparisons), this difference may or may not be worth the increased computational complexity. The choice between the two distance metrics should therefore be based on the requirements of the project at hand.

We are now able to compute the similarity between arbitrary feature signatures. This effectively allows us to retrieve similar multimedia objects to a given query object. However, as of now, we would have to compute the distance between the query object and every single object in the database. If our database is large, this can lead to unacceptable computational demands and query times. For this reason, we are going to introduce indexing methods in the next section. These will allow us to retrieve similar objects to a query object without having to compute all distance values explicitly. Continue with the next section: Efficient Query Processing


[BUS10b] Christian Beecks, Merih Seran Uysal, and Thomas Seidl. Signature quadratic form distance. In Proceedings of the ACM International Conference on Image and Video Retrieval, pages 438–445. ACM, 2010.

[BS13] Christian Beecks and Thomas Seidl. Distance based similarity models for content based multimedia retrieval. PhD thesis, Aachen, 2013. Zsfassung in dt. und engl. Sprache; Aachen, Techn. Hochsch., Diss., 2013.

[KLBSS11] Kruliš, M., Lokoč, J., Beecks, C., Skopal, T., & Seidl, T. (2011, October). Processing the signature quadratic form distance on many-core gpu architectures. In: Proceedings of the 20th ACM international conference on Information and knowledge management (pp. 2373-2376). ACM.