
Perhaps if the Googlebots segregated the news feed files and the regular web pages into separate databases, we would see fewer duplicate content issues pushing web pages into the Supplemental Index.
4 Comments

from AmyGreer, 3760 days ago

A well-reasoned post. Agreed that both pages and feeds serve human audiences, but in slightly different ways. So will it come down to the lesser of two evils: blocking feeds via robots.txt, which will certainly limit overall visibility, versus the potential decrease in visibility from pages going supplemental?
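For anyone weighing the robots.txt side of that trade-off, a minimal sketch of what blocking feeds might look like is below; the feed paths shown are assumptions and would need to match a site's actual URL layout:

  # Sketch only: block common feed URLs from crawling.
  # The paths are illustrative, not taken from any particular site.
  User-agent: *
  Disallow: /feed/
  Disallow: /rss.xml
  Disallow: /atom.xml
  Disallow: /index.rdf

Note that this keeps feeds out of the crawl entirely, so it trades away any visibility the feeds themselves might have earned in Blogsearch or regular search.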

from bwelford, 3760 days ago

There certainly is that trade-off analysis to be done. However, if I’m understanding correctly, the RSS feed itself has only recently been included in regular search. You still have the possibility that the regular post page will appear in the regular SERP. So it would be natural to have XML-type files (RDF, etc.) appear only in Blogsearch, with the full posts (HTML or whatever) being the only ones that appear in regular search.

from Sebastian (Moderator), 3760 days ago

I’ve seen XML files like RSS feeds in Web search for a while now, probably 1-2 years. When an XML file comes up for a search, usually the source had an optimization problem, was not (yet) indexed, or went supplemental. When XML files appeared on the SERPs they were handled as an "unknown file type": the indexer just scored the textual contents, ignoring the XML structure and the meaning of the XML tags that refer to the source. Some XML files have a weird PageRank just from the linked buttons, so it "makes sense" that they rank better than the source.

Also, especially with full feeds, sometimes the XML’s textual contents *are* more relevant than the single item one wants to see on the SERP, for example when the search query, or its context, spans more than the actual post. I don’t think that’s a crawling problem, and it has nothing to do with the crawling engine’s cache. The Web indexing process needs to "learn more" about XML and feed semantics ;) And when a feed pops up in a raw result set, the query engine should be able to look up the source (item) to provide the searcher with an indented result.
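As a rough illustration of the source lookup described above, here is a minimal RSS item sketch (the URL and text are placeholders, not from any real feed): the <link> element is what ties a feed entry back to its source post, and it is what a feed-aware query engine would need in order to resolve a feed hit into the canonical page.

  <!-- Minimal RSS 2.0 item sketch; title, link, and description are illustrative. -->
  <item>
    <title>Example post title</title>
    <link>http://example.com/2007/03/example-post/</link>
    <description>The post text that the indexer actually scores.</description>
  </item>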

from bwelford, 3757 days ago

I’ve mentioned this Open Letter to Matt Cutts in the comments to his Minty Fresh Indexing post. It seems to me that what’s in which index is a critical question in these matters.
