Topic Type: News Story (Jump to http://blog.cre8asite.net)
Category: Google SEO
4 Comments
Comments
A well-reasoned post. Agreed that both pages and feeds serve human audiences, but in slightly different ways. So is it a choice of the lesser of two evils: blocking feeds via robots.txt, which will certainly limit overall visibility, versus the potential decrease in visibility from pages going supplemental?
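For readers weighing that trade-off, here is a minimal sketch of what blocking a feed via robots.txt could look like, checked with Python's standard `urllib.robotparser`. The `/feed/` path, the example.com URLs, and the rule set are assumptions made for illustration; they are not taken from the post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /feed/ path is an assumption for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The feed URL falls under the Disallow rule; the post page does not.
feed_allowed = parser.can_fetch("Googlebot", "http://example.com/feed/")
post_allowed = parser.can_fetch(
    "Googlebot", "http://example.com/2008/07/some-post.html"
)
print("feed crawlable:", feed_allowed)
print("post crawlable:", post_allowed)
```

This only shows that such a rule keeps compliant crawlers off the feed URL while leaving post pages crawlable. Whether that is worth the lost feed visibility is the open question raised above.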
There certainly is that trade-off analysis to be done. However, if I'm understanding correctly, the RSS feed itself has only recently been included in the regular search. You still have the possibility that the regular post page will appear in the regular SERP. So it would be natural to have XML-type files (RDF, etc.) appearing only in Blogsearch and the full posts (HTML or whatever) as the only ones appearing in the regular search.
I've seen XML files like RSS feeds in Web search for a while now, probably 1-2 years. When an XML file comes up for a search, usually the source had an optimization problem, was not (yet) indexed, or went supplemental. When XML files appeared on the SERPs they were handled as an "unknown file type", meaning the indexer just scored the textual contents, ignoring the XML structure and the meaning of XML tags that refer to the source.
Some XML files have a weird PageRank just from the linked buttons, so it "makes sense" that they rank better than the source. Also, especially with full feeds, sometimes the XML's textual contents *are* more relevant than a single item one wants to see on the SERP, for example when the search query, or its context, spans more than the actual post.
I don't think that's a crawling problem, and it has nothing to do with the crawling engine's cache. The Web indexing process needs to "learn more" about XML and feed semantics ;) And when a feed pops up in a raw result set, the query engine should be able to look up the source (item) to provide the searcher with an indented result.
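To make the "look up the source item" idea concrete, here is a rough sketch in Python of how a feed hit could be mapped back to the post it came from. It is not how any search engine actually works. The sample feed, the `best_item_link` helper, and the naive word-overlap scoring are all assumptions made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; titles and URLs are invented for this sketch.
FEED_XML = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <item>
      <title>Feeds in web search</title>
      <link>http://example.com/feeds-in-web-search.html</link>
      <description>Why RSS feeds show up in regular search results.</description>
    </item>
    <item>
      <title>Robots.txt basics</title>
      <link>http://example.com/robots-basics.html</link>
      <description>Blocking crawlers with robots.txt rules.</description>
    </item>
  </channel>
</rss>
"""


def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def best_item_link(feed_xml, query):
    """Return the <link> of the feed item sharing the most words with the
    query, or None if no item shares any word (naive overlap scoring)."""
    root = ET.fromstring(feed_xml)
    terms = tokenize(query)
    best_link, best_score = None, 0
    for item in root.iter("item"):
        # Use the feed's structure: score each item's own title and description.
        text = item.findtext("title", "") + " " + item.findtext("description", "")
        score = len(terms & tokenize(text))
        if score > best_score:
            best_link, best_score = item.findtext("link"), score
    return best_link


print(best_item_link(FEED_XML, "rss feeds search"))
```

The point is only that an indexer aware of `<item>` and `<link>` can resolve a feed match to the individual post, instead of scoring the whole file as undifferentiated text.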
I've mentioned this Open Letter to Matt Cutts in the comments to his Minty Fresh Indexing post. Seems to me what's in which index is a critical question in these matters.