
"A patent granted to Google today explores Web spam and the manipulation of documents and links on the Web. It describes how the rankings of pages may be influenced if they are identified as “manipulative.”"

Comments

from LilRascal 1537 Days ago #
Votes: 0

This seems like it conflicts with the "Spam Mass" patent that already exists. I believe it’s held by Yahoo?

from billslawski 1537 Days ago #
Votes: 2

Thanks for sphinning this, steaprok.

I’m not sure that there’s a large conflict with Yahoo’s paper, but if they use it, the mass estimation approach is likely only part of what they do.

The patent describes a somewhat different approach than the Spam Detection Based on Mass Estimation paper proposes (http://infolab.stanford.edu/~zoltan/publications/gyongyi2006link.pdf). But there do seem to be some similarities. Instead of grouping the web graph into nodes (or perhaps specific hosts or domains), like the Yahoo paper suggests, the Google patent groups them into clusters in a few possible ways. We’re not given a lot of details in the patent on how that is done, exactly. But we are provided with a few possible alternatives, including one that determines clusters by "computing a dense bipartite subgraph of articles comprising doorway articles and target articles, wherein the doorway articles contain links to the target articles" (more on bipartite subgraphs below).

The Google approach described also seems like a broader overview, and looks at links and at content found on pages, while the Yahoo method is more focused, and seems to limit itself to link analysis. That doesn’t mean that Yahoo wouldn’t also use some kind of content analysis, and they even suggest that "we conjecture that many false positives could be eliminated by complementary (textual) content analysis."

I’m not sure that the Google patent provides enough detail on their clustering to really compare it to the Mass Estimation method. The Google approach seems a little broader, and may be influenced by the kind of thinking expressed in 1999’s "Mining the Link Structure of the World Wide Web" (http://citeseer.ist.psu.edu/213063.html) in a section which is in a box in the middle of the paper, and is titled "Trawling emerging cybercommunities automatically."

Monika Henzinger (one of the inventors on the Google patent) refers to that section of that paper as providing an example of the bipartite subgraphs in a paper that she wrote, Challenges in Web Search (http://www.sigir.org/forum/F2002/henzinger.pdf). She tells us there that:

"Typically, link-spam sites have certain patterns of links that are easy to detect, but these patterns can mutate in much the same way as link spam detection techniques. A less heuristic approach to discovering link spam is required. One possibility is, as in the case of text spam, to use a more global analysis of the web instead of merely local page-level or site-level analysis. For example, a cluster of sites that suddenly sprout thousands of new and interlinked webpages is a candidate link-spam site. The work by Ravi Kumar et al. [16] on finding small bipartite clusters in the web is a first step in this direction."

(The footnote leads to the "trawling" paper.)

That paper, which she draws on as her example of bipartite clusters, had a number of co-authors. It looks like at least three or four of them may presently be at Yahoo. :)

Google and Yahoo may not be doing the same things to combat Web spam, but the universe of search engineers at major search engines who work on spam issues is a pretty small one. I’d think that we have to assume that while they may be using different approaches, they probably have some idea of what the folks at other search engines are doing to fight web spam.
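
For anyone who wants a concrete picture of the bipartite idea discussed above, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: the function name `find_bipartite_cores`, the page names, and the thresholds are all hypothetical, and it looks for complete bipartite cores (every "doorway" page links to every "target" page) by brute force, which is a simplification of the "dense" subgraphs the patent mentions. It is not the method from the patent or from either paper.

```python
from itertools import combinations


def find_bipartite_cores(links, min_doorways=3, num_targets=2):
    """Find groups of pages ("doorways") that all link to the same targets.

    links maps each page to the set of pages it links to.
    Names and thresholds are hypothetical, chosen for illustration only.
    """
    # Invert the link graph: target page -> set of pages that link to it.
    inlinks = {}
    for page, outs in links.items():
        for target in outs:
            inlinks.setdefault(target, set()).add(page)

    cores = []
    # Brute force over every combination of num_targets linked-to pages.
    for targets in combinations(sorted(inlinks), num_targets):
        # Pages that link to every one of the chosen targets.
        shared = set.intersection(*(inlinks[t] for t in targets))
        shared -= set(targets)  # a target should not count as its own doorway
        if len(shared) >= min_doorways:
            cores.append((sorted(shared), list(targets)))
    return cores


# Toy link graph: d1, d2 and d3 all point at both t1 and t2.
links = {
    "d1": {"t1", "t2"},
    "d2": {"t1", "t2"},
    "d3": {"t1", "t2", "x"},
    "a": {"x"},
    "b": {"t1"},
}
cores = find_bipartite_cores(links)
print(cores)
```

On this toy graph the only group reported is d1, d2 and d3 as doorways for t1 and t2. Trying every combination of targets would not scale to anything like the real web graph, so treat this as a way to see the shape of the pattern rather than a workable detector.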

from steaprok 1537 Days ago #
Votes: 0

@Bill, my pleasure. It is a fantastically in-depth post, chock-full of info.
