Published: Aug 06, 2008 - 06:39 pm
Story Found By: DavidWallace 1777 Days ago
Category: Searching
8 Comments
Comments
Thanks for sphinning this, David. The thing I found most interesting about this patent was its description of different ways to compare search results from alternative versions of a query, with and without stopwords, to see whether those results were substantially similar. We know that Google stopped telling us about stopwords it found in queries many months ago, yet it's possible that a similarity analysis like the one described in the patent may still be used in some instances.
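The comparison described above can be sketched roughly as follows. This is a minimal Python illustration under loud assumptions, not the patent's method: the stopword list, the `search` callable, the top-10 cutoff, the Jaccard overlap measure, the 0.7 threshold, and the toy result lists are all hypothetical choices made for the example.

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def strip_stopwords(query):
    """Drop stopwords; if nothing would remain, return the lowercased query unchanged."""
    kept = [t for t in query.lower().split() if t not in STOPWORDS]
    return " ".join(kept) if kept else query.lower()


def jaccard(a, b):
    """Jaccard similarity of two result lists, treated as sets of URLs."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def substantially_similar(with_stops, without_stops, k=10, threshold=0.7):
    """Assumed test: top-k overlap at or above a hypothetical threshold."""
    return jaccard(with_stops[:k], without_stops[:k]) >= threshold


def choose_query(query, search):
    """Return the query version to use; `search` maps a query to ranked URLs."""
    reduced = strip_stopwords(query)
    if reduced == query.lower():
        return query  # no stopwords to drop, or nothing but stopwords
    if substantially_similar(search(query), search(reduced)):
        return reduced  # stopwords barely change the results
    return query  # stopwords change the results, so keep them


# Invented toy index standing in for a real search backend.
TOY_INDEX = {
    "the onion": ["theonion.com", "wikipedia.org/wiki/The_Onion", "twitter.com/TheOnion"],
    "onion": ["wikipedia.org/wiki/Onion", "recipes.example/onion", "wikipedia.org/wiki/The_Onion"],
    "history of rome": ["example.edu/rome", "wikipedia.org/wiki/Rome", "example.org/empire"],
    "history rome": ["wikipedia.org/wiki/Rome", "example.edu/rome", "example.org/empire"],
}


def toy_search(query):
    """Hypothetical search function backed by the toy index."""
    return TOY_INDEX.get(query, [])


if __name__ == "__main__":
    print(choose_query("the onion", toy_search))        # the onion
    print(choose_query("history of rome", toy_search))  # history rome
```

In this sketch a query made entirely of stopwords, such as "the", is returned unchanged without any comparison; that is only one assumption about how such single-word queries might be handled.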
Thanks for the patent update Bill, much appreciated.
How does Google ignoring stop words in its searches reconcile with Google giving authority site status to results for "the" and "in"?
Good find. This is where the really fascinating part of search is, because one of the bigger hurdles to providing relevant results is dealing with the ambiguities of human language. Stuff like this gets right at the core of one of the biggest problems of computing: inferring semantics from syntax.
Hi John, I did try to answer your question on my blog, since you posted it there too and I saw it there first, but I don't know if my answer is what you're looking for. Does your question have to do with the fact that searches for those words by themselves ("the", "in") show results that include sitelinks for the first result? It's odd that they do. Does the presence of sitelinks make those sites authoritative for queries on those terms? A search for "at" also gives us sitelinks for AT&T. It's an interesting question, worth exploring further. Does it make any difference that we are searching for only a single word, regardless of whether it might have been considered a stop word in the past?
Thanks, Nick and Colin. I agree with you that this kind of stuff is fascinating, Colin. How do you infer context and intent from a couple of words?
@billslawski Hi William, "Does your question have to do with the fact that searches for those words by themselves (the, in) show results that include sitelinks for the first results?" Yes. The #1 hit when searching on "the" turns up an authority site where the word actually means "the." Whereas when searching for "in," what turns up as #1 is the abbreviation for a state in the United States. It's SEOs, not Google, who talk about authority site status; Google just refers to sitelinks, on which it has a patent. Authority site status, of which there are several versions, is basically a special form of a double or indented listing. All of these are desirable to have in the SERPs, since it is widely believed that they attract searchers' attention. Yes, the fact that I was searching for only one word might make a difference. But sitelinks, and therefore authority sites, are dynamically generated, so if "the" is a stop word, why is Google dynamically generating sitelinks for it?
Hi John, I tend to shy away from overusing a term to describe things if it might be confusing or misleading. A lot of people have used the term "authority" to describe web pages in many ways. What I think about when I hear the phrase is Jon Kleinberg's authoritative pages (http://www.cs.cornell.edu/home/kleinber/auth.pdf). Using the word "authority" to describe a site that shows up in search results with sitelinks in response to a specific query may cloud things, as does using it to describe a site that has an additional indented result in response to a query. While I'm not sure that Google would go through the whole "substantially similar" comparison analysis that the stopwords patent describes on a search for a single stopword and no other terms, I'm also not sure that a search for "the" or "in" returning a top result with sitelinks tells us anything about the stopwords process described in the patent, or about the newer process with an enhanced compression/decompression approach that makes phrase searching more viable. I think looking more closely at the sitelinks process might yield more reasons for those terms to show specific pages with sitelinks than considering how Google might be treating stop words or phrase matches. If the idea behind sitelinks is to provide a better user experience for navigational queries by giving easier access to deeper pages in the first result, why would Google's algorithm decide to show sitelinks for The Onion's website on a search for "the"? Since it's the generation of sitelinks in response to those query terms that you're concerned about, that may be the first level of inquiry: why does that process get triggered for those words?
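For readers unfamiliar with the Kleinberg paper linked in that comment, its notion of authority comes from the link graph: a page is a good authority if good hub pages link to it, and a good hub if it links to good authorities. Below is a minimal Python sketch of that hub/authority iteration (HITS). The four-node graph, the fixed 50 iterations (instead of a convergence test), and the function name are illustrative assumptions, and nothing here describes how Google decides to show sitelinks.

```python
import math

# Invented toy link graph: each page maps to the pages it links to.
LINKS = {
    "hub1": ["A", "B"],
    "hub2": ["A", "B"],
    "hub3": ["A"],
    "B": ["C"],
}


def hits(links, iterations=50):
    """Kleinberg-style hub and authority scores via power iteration."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub update: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


if __name__ == "__main__":
    hubs, authorities = hits(LINKS)
    for page, score in sorted(authorities.items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

In the toy graph, "A" ends up with the highest authority score because three hub pages link to it. That score measures in-links from hubs, which is a different notion from a site being shown with sitelinks for a query.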