Perhaps if the Googlebots segregated the news feed files and the regular web pages into separate databases, we would see fewer duplicate content issues pushing web pages into the Supplemental Index.
4 Comments     

Comments

from AmyGreer 700 days ago #

A well-reasoned post. Agreed that both pages and feeds serve human audiences, but in slightly different ways. So will it come down to the lesser of two evils: blocking feeds via robots.txt, which will certainly limit overall visibility, versus the potential decrease in visibility from pages going supplemental?
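For readers weighing the robots.txt option mentioned above, here is a minimal sketch of what blocking a feed might look like and how to check the effect with Python's standard `urllib.robotparser`. The `/feed/` path and the example.com URLs are assumptions for illustration only; a real site's feed path may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /feed/ path is an assumption, not from the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
"""

def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Report whether the sample rules above let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

# The feed is blocked while the regular post page stays crawlable.
feed_allowed = is_crawlable("https://example.com/feed/")
post_allowed = is_crawlable("https://example.com/2007/07/some-post.html")
```

Note that a rule like this hides the feed from every compliant crawler, which is exactly the visibility cost being discussed.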

from bwelford 700 days ago #

There certainly is that trade-off analysis to be done. However, if I'm understanding correctly, the RSS feed itself has only recently been included in the regular search. You still have the possibility that the regular post page will appear in the regular SERP. So it would be natural to have XML-type files (RDF, etc.) appear only in Blogsearch, with the full posts (HTML or whatever) as the only ones appearing in the regular search.
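The split suggested here could be sketched as a simple routing rule keyed on the document's content type. This is only an illustration of the idea: the index names `"blogsearch"` and `"web"` and the list of MIME types are hypothetical and say nothing about how Google actually partitions its indexes.

```python
# Hypothetical MIME-type list and index names; neither comes from Google.
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/rdf+xml",
    "application/xml",
    "text/xml",
}

def target_index(content_type: str) -> str:
    """Return the index a document would be routed to under this proposal."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return "blogsearch" if mime in FEED_TYPES else "web"
```

Under this rule `target_index("application/rss+xml; charset=UTF-8")` goes to the feed index, while `target_index("text/html")` goes to the regular one.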

from Sebastian 700 days ago #

I've seen XML files like RSS feeds in Web search for a while now, probably 1-2 years. When an XML file comes up for a search, usually the source had an optimization problem, was not (yet) indexed, or went supplemental. When XML files appeared on the SERPs, they were handled as an "unknown file type"; that is, the indexer just scored the textual contents, ignoring the XML structure and the meaning of the XML tags that refer to the source.

Some XML files have a weird PageRank just from the linked buttons, so it "makes sense" that they rank better than the source. Also, especially with full feeds, the XML's textual contents sometimes *are* more relevant than the single item one wants to see on the SERP, for example when the search query or its context spans more than the actual post.

I don't think that's a crawling problem, and it has nothing to do with the crawling engine's cache. The Web indexing process needs to "learn more" about XML and feed semantics ;) And when a feed pops up in a raw result set, the query engine should be able to look up the source (item) to provide the searcher with an indented result.
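To make the contrast in this comment concrete, here is a rough sketch using Python's standard `xml.etree.ElementTree`. `flat_text` mimics the "unknown file type" treatment by discarding the XML structure, while `source_for_query` uses the RSS `<item>` and `<link>` elements to find the source post for a query. The sample feed, the function names, and the naive all-terms matching are assumptions for illustration, not a description of any search engine's internals.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item RSS 2.0 feed used only for this illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <item>
      <title>Feeds and the supplemental index</title>
      <link>https://example.com/feeds-supplemental.html</link>
      <description>Why duplicate feed content can hurt post pages.</description>
    </item>
    <item>
      <title>Minty fresh indexing</title>
      <link>https://example.com/minty-fresh.html</link>
      <description>Notes on how quickly new posts get indexed.</description>
    </item>
  </channel>
</rss>
"""

def flat_text(feed_xml: str) -> str:
    """'Unknown file type' view: all text nodes joined, structure ignored."""
    root = ET.fromstring(feed_xml)
    return " ".join(t.strip() for t in root.itertext() if t.strip())

def source_for_query(feed_xml: str, query: str):
    """Feed-aware view: link of the first item containing every query term."""
    terms = query.lower().split()
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        # Naive substring match over the item's title, link and description.
        text = " ".join(item.itertext()).lower()
        if all(term in text for term in terms):
            return item.findtext("link")
    return None

post_url = source_for_query(SAMPLE_FEED, "minty indexing")
```

With the flat view, a query can only ever match the feed as a whole; with the feed-aware view, the same match can be traced back to the individual post URL and shown alongside or instead of the feed.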

from bwelford 696 days ago #

I've mentioned this Open Letter to Matt Cutts in the comments to his Minty Fresh Indexing post. Seems to me what's in which index is a critical question in these matters.

