Thursday, October 21, 2004

More on Google behind the scenes

Jeff Dean from Google gave a great talk today at the University of Washington.

The early part of the talk focused mostly on their use of commodity hardware and their cluster, most of which is covered in the Google Cluster Architecture and Google File System papers.

Jeff spent some time talking about MapReduce -- a custom programming model used at Google for rapid development of robust parallel applications -- and pointed to his upcoming MapReduce paper at OSDI.
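
For anyone who hasn't seen the model, here is a toy, single-machine sketch of the general idea in Python. This is my own illustration, not Google's API: the names run_mapreduce, map_fn, and reduce_fn are hypothetical, and a real system would spread the work across many machines and handle failures. The map step emits key/value pairs, the pairs are grouped by key, and the reduce step combines each group.

```python
from collections import defaultdict

def map_fn(doc_name, text):
    # Map step: emit a (word, 1) pair for every word in the document.
    for word in text.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    # Reduce step: combine all values emitted for one key.
    return word, sum(counts)

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Hypothetical in-memory driver; stands in for the distributed runtime.
    groups = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)  # group values by key
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))

# Made-up input documents, used only for illustration.
docs = {"a.txt": "rolling hash function", "b.txt": "hash function md5"}
word_counts = run_mapreduce(docs, map_fn, reduce_fn)
```

The appeal, as I understood it from the talk, is that the programmer writes only the two small functions and the framework takes care of the parallelism.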

He also demoed Google's new word clustering work, which can find related words given a word or phrase. For example, for "rolling hash", it found words related to pot, but for "rolling hash function", it came up with "MD5" and other one-way hash functions. It was also able to find synonyms like "cuisine" for "cooking", though related words that are not synonyms, like "food", also showed up high in the clusters. Jeff said Google will use this data to improve relevance ranking and to help find search results that are clearly relevant to your search query but don't exactly match your search terms. This demo looked similar to what I heard about Peter Norvig's talk at Web 2.0.

This word clustering is a great example of what you can do with massive amounts of data and processing power. We did a lot of similar things at Amazon.com.
