The Google Similarity Distance is a measure proposed by Rudi Cilibrasi and Paul Vitányi that estimates how closely related two search queries are, based on how many pages a search engine returns for each query and for both together.
Using the formula for the Normalized Google Distance (NGD), given below, one can measure the similarity between two terms: 0 means the terms always occur together, and values around 1 or higher mean they are unrelated.
$$
\mathrm{NGD}(x, y) = \frac{\max(\log f(x), \log f(y)) - \log f(x, y)}{\log N - \min(\log f(x), \log f(y))}
$$

where:
x - Query 1
y - Query 2
f(x) - number of search results for Query 1
f(y) - number of search results for Query 2
f(x, y) - number of search results for Query 1 and Query 2 together
N - total number of pages indexed by the search engine
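As a minimal sketch, the formula can be written as a small Python function. The function name `ngd`, its parameter names, and the hit counts used in the example are illustrative assumptions, not live search results; returning infinity when the two queries never co-occur is also a choice made here for convenience.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance from raw result counts.

    fx, fy -- result counts for Query 1 and Query 2
    fxy    -- result count for both queries together
    n      -- total number of pages indexed by the search engine
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each query needs a positive result count")
    if fxy <= 0:
        # Assumption: treat queries that never co-occur as infinitely far apart.
        return math.inf
    log_fx, log_fy = math.log(fx), math.log(fy)
    numerator = max(log_fx, log_fy) - math.log(fxy)
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator


# Illustrative counts only (not fetched from any search engine).
horse, rider, both, index_size = 46_700_000, 12_200_000, 2_630_000, 8_058_044_651
print(round(ngd(horse, rider, both, index_size), 3))
```

The base of the logarithm does not matter, because it cancels between the numerator and the denominator.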
Once the NGD has been calculated for each pair of queries, the resulting distances can be used to draw a similarity graph of the queries.
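One way such a graph might be built is sketched below: compute the NGD for every pair of queries and keep an edge wherever the distance falls under a chosen threshold. The counts, the index size, the threshold of 0.5, and the helper name `similarity_edges` are all hypothetical values picked for illustration.

```python
import math
from itertools import combinations

INDEX_SIZE = 1_000_000  # hypothetical number of indexed pages

# Hypothetical result counts for single queries and for pairs of queries.
single = {"cat": 10_000, "dog": 12_000, "carburetor": 500}
pair = {
    frozenset({"cat", "dog"}): 4_000,
    frozenset({"cat", "carburetor"}): 5,
    frozenset({"dog", "carburetor"}): 6,
}


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance; infinite if the queries never co-occur."""
    if fxy <= 0:
        return math.inf
    log_fx, log_fy = math.log(fx), math.log(fy)
    return (max(log_fx, log_fy) - math.log(fxy)) / (math.log(n) - min(log_fx, log_fy))


def similarity_edges(single, pair, n, threshold):
    """Return (query_a, query_b, distance) for every pair closer than threshold."""
    edges = []
    for a, b in combinations(sorted(single), 2):
        distance = ngd(single[a], single[b], pair.get(frozenset({a, b}), 0), n)
        if distance <= threshold:
            edges.append((a, b, distance))
    return edges


edges = similarity_edges(single, pair, INDEX_SIZE, threshold=0.5)
for a, b, distance in edges:
    print(f"{a} -- {b}: {distance:.3f}")
```

With these made-up counts only "cat" and "dog" end up connected, since "carburetor" co-occurs with either of them about as often as chance would predict. The edge list could then be handed to any graph-drawing or clustering tool.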
Google Similarity Distance is useful in Automated Machine Learning, Pattern Recognition, Clustering of Unknown Objects, etc.
Source : The Google Similarity Distance - IEEE
For full content : The Google Similarity Distance - PASCAL EPrints