Posted By: steaprok 318 days ago
Topic Type: News Story (Jump to http://www.seobythesea.com)
Category: Google
3 Comments
Comments
This seems like it conflicts with the "Spam Mass" patent that already exists. I believe it's held by Yahoo?
Thanks for sphinning this, steaprok.
I'm not sure that there's a large conflict with Yahoo's paper, but if they use it, the mass estimation approach is likely only part of what they do.
The patent describes a somewhat different approach than the one the "Spam Detection Based on Mass Estimation" paper proposes (http://infolab.stanford.edu/~zoltan/publications/gyongyi2006link.pdf), but there do seem to be some similarities. Instead of treating the web graph as individual nodes (or perhaps specific hosts or domains), like the Yahoo paper suggests, the Google patent groups them into clusters in a few possible ways.
We're not given a lot of details in the patent on how that is done, exactly. But we are provided with a few possible alternatives, including one that determines clusters by "computing a dense bipartite subgraph of articles comprising doorway articles and target articles, wherein the doorway articles contain links to the target articles" (more on bipartite subgraphs below).
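To make the quoted idea a bit more concrete, here is a rough sketch of what looking for a complete bipartite "core" of doorway and target pages might look like. This is not the method from the patent, which doesn't spell out its clustering; it is only a brute-force illustration. The function name, the thresholds, and the toy link data are all made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def find_bipartite_cores(links, min_doorways=3, min_targets=2):
    """Return (doorways, targets) pairs where every doorway links to every target.

    `links` maps a page to the set of pages it links to. The thresholds are
    hypothetical; nothing in the patent says what values would be used.
    """
    by_target_set = defaultdict(set)
    for page, outlinks in links.items():
        # Enumerate every group of `min_targets` pages that this page links to.
        for targets in combinations(sorted(outlinks), min_targets):
            by_target_set[targets].add(page)
    # Keep only the target groups shared by enough linking pages.
    return [
        (sorted(doorways), list(targets))
        for targets, doorways in sorted(by_target_set.items())
        if len(doorways) >= min_doorways
    ]


# Toy data: three doorway-like pages all point at the same two targets.
links = {
    "d1": {"t1", "t2"},
    "d2": {"t1", "t2"},
    "d3": {"t1", "t2", "x"},
    "blog": {"x", "news"},
}
cores = find_bipartite_cores(links)
print(cores)
```

Enumerating combinations like this blows up quickly on pages with many links, so anything run at web scale would need pruning or an approximate notion of "dense" rather than this exhaustive version.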
The Google approach described also seems like a broader overview, and looks at links and at content found on pages, while the Yahoo method is more focused, and seems to limit itself to link analysis. That doesn't mean that Yahoo wouldn't also use some kind of content analysis, and they even suggest that "we conjecture that many false positives could be eliminated by complementary (textual) content analysis."
I'm not sure that the Google patent provides enough detail on their clustering to really compare it to the Mass Estimation method. The Google approach seems a little broader, and may be influenced by the kind of thinking expressed in 1999's "Mining the Link Structure of the World Wide Web" (http://citeseer.ist.psu.edu/213063.html), in a boxed section in the middle of the paper titled "Trawling emerging cybercommunities automatically."
Monika Henzinger (one of the inventors on the Google patent) refers to that section of that paper as providing an example of the bipartite subgraphs in a paper that she wrote - "Challenges in Web Search" (http://www.sigir.org/forum/F2002/henzinger.pdf).
She tells us there that:
(The footnote leads to the "trawling" paper.)
That paper, which she draws her example of bipartite clusters from, had a number of co-authors. It looks like at least three or four of them may presently be at Yahoo. :)
Google and Yahoo may not be doing the same things to combat Web spam, but the universe of search engineers at major search engines who work on spam issues is a pretty small one. While they may be using different approaches, I'd think we have to assume that they probably have some idea of what the folks at other search engines are doing to fight web spam.
@Bill, my pleasure. It is a fantastically in-depth post, chock full of info.