The Google Similarity Distance is a concept introduced by Rudi Cilibrasi and Paul Vitanyi that measures how similar two search terms (queries) are, based on the number of results a search engine returns for them.


The Normalized Google Distance (NGD) formula, given below, measures how closely two terms are related: a value of 0 means the terms always occur together, while a value of about 1 or more means they are unrelated.

$$
\mathrm{NGD} = \frac{\max(\log f[x], \log f[y]) - \log f[xy]}{\log N - \min(\log f[x], \log f[y])}
$$

x - Query 1

y - Query 2

f[x] - Search Results Count of [Query 1]

f[y] - Search Results Count of [Query 2]

f[xy] - Search Results Count of [Query 1, Query 2] searched together (pages containing both)

N - Total number of pages indexed by the search engine
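As a rough illustration of the formula and the symbols above, here is a minimal Python sketch. The function name `ngd` and the handling of zero counts are assumptions made for this example, not part of the original paper. The hit counts follow the "horse" / "rider" example as I recall it from the Cilibrasi and Vitanyi paper, so treat them as illustrative only.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from raw hit counts.

    fx, fy : result counts for each query on its own
    fxy    : result count for both queries together
    n      : total number of pages indexed by the search engine
    """
    # Assumption for this sketch: if any count is zero, treat the
    # terms as infinitely far apart instead of taking log(0).
    if min(fx, fy, fxy) <= 0:
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))

# Illustrative counts: "horse", "rider", both together, and index size.
distance = ngd(46_700_000, 12_200_000, 2_630_000, 8_058_044_651)
print(round(distance, 3))  # roughly 0.443
```

The base of the logarithm does not matter here, because it cancels between the numerator and the denominator.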

Once the NGD has been calculated for each pair of queries, the resulting values can be used to draw a similarity graph of the queries.
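One possible way to turn pairwise NGD values into a similarity graph is sketched below. All counts, the index size `N`, the helper name `similarity_edges`, and the threshold of 0.5 are hypothetical values chosen for this example; real counts would come from a search engine.

```python
import math
from itertools import combinations

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from raw (non-zero) hit counts."""
    lx, ly = math.log(fx), math.log(fy)
    return (max(lx, ly) - math.log(fxy)) / (math.log(n) - min(lx, ly))

# Hypothetical data: counts per term, and per alphabetically ordered pair.
N = 1_000_000
single = {"cat": 50_000, "dog": 60_000, "bond": 40_000}
joint = {("cat", "dog"): 20_000, ("bond", "cat"): 200, ("bond", "dog"): 300}

def similarity_edges(single, joint, n, threshold=0.5):
    """Return (term_a, term_b, distance) for pairs closer than threshold."""
    edges = []
    for a, b in combinations(sorted(single), 2):
        d = ngd(single[a], single[b], joint[(a, b)], n)
        if d <= threshold:
            edges.append((a, b, d))
    return edges

edges = similarity_edges(single, joint, N)
for a, b, d in edges:
    print(f"{a} -- {b}: {d:.3f}")
```

With these made-up counts, only the "cat" and "dog" pair falls under the threshold, so the graph would contain a single edge between them.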

Google Similarity Distance is useful in Automated Machine Learning, Pattern Recognition, Clustering of Unknown Objects, etc.

Source : The Google Similarity Distance - IEEE

For full content : The Google Similarity Distance - PASCAL EPrints

Search Keywords: Google, Google Similarity Distance, Normalised Google Distance, Automated Machine Learning
