Topic Type: News Story (Jump to http://piloseo.com)
Category: Google SEO
6 Comments
Comments
You say the pages that did not exist were indexed and showed in the SERPs.
Did those entries show as URL-only entries, or as full entries with a title and snippet?
They did not show in the SERPs, but they showed up when I ran a site:domain search. They appeared with exactly the same title, description, and URL path as on the site that actually has those URLs. The only difference was that they carried the root domain of the site that did not have those URLs.
For example:
This is how the data appears for the site that actually has the page:
Title: New Product: Site1.com
Description: This page offers New Product.
URL: www.site1.com/new-product
This is what showed up in the site: search for the site where those pages did not exist:
Title: New Product: Site1.com
Description: This page offers New Product.
URL: www.site2.com/new-product
Everything was the same except the root domain. I have followed up with everyone, and these pages never existed; Google Webmaster Tools (GWT) shows they all returned 404 errors. In fact, only one of these pages still shows up now, and it obviously does not work.
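To audit site: results for this pattern (same title, description, and path, different root domain), a minimal Python sketch could group copied entries and flag any group that spans more than one host. The `IndexEntry` type and the `host_only_mismatches` and `split_host_path` helpers are hypothetical names for illustration, and the sketch assumes the entries have been copied by hand from the search results in the format shown above.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple
from urllib.parse import urlsplit


class IndexEntry(NamedTuple):
    """One result copied from a site: search (hypothetical record type)."""
    title: str
    description: str
    url: str


def split_host_path(url: str) -> tuple[str, str]:
    # Entries like "www.site1.com/new-product" have no scheme; add one so that
    # urlsplit puts the host in netloc instead of treating it all as a path.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    return parts.netloc.lower(), parts.path or "/"


def host_only_mismatches(entries: list[IndexEntry]) -> dict[tuple[str, str, str], list[str]]:
    """Return (title, description, path) groups that appear on more than one host."""
    groups: dict[tuple[str, str, str], set[str]] = defaultdict(set)
    for entry in entries:
        host, path = split_host_path(entry.url)
        groups[(entry.title, entry.description, path)].add(host)
    # Sort host lists so the result is deterministic.
    return {key: sorted(hosts) for key, hosts in groups.items() if len(hosts) > 1}


if __name__ == "__main__":
    entries = [
        IndexEntry("New Product: Site1.com", "This page offers New Product.", "www.site1.com/new-product"),
        IndexEntry("New Product: Site1.com", "This page offers New Product.", "www.site2.com/new-product"),
    ]
    print(host_only_mismatches(entries))
    # {('New Product: Site1.com', 'This page offers New Product.', '/new-product'): ['www.site1.com', 'www.site2.com']}
```

Any group it reports is a candidate for the kind of cross-domain entry described here; whether those URLs actually exist still has to be confirmed, for example against the 404 reports in GWT.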
I have written a few posts on avoiding duplicate content using an .htaccess file. They were targeted specifically at WordPress, but the same concepts apply.
Avoid Duplicate Content with Wordpress
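The linked posts are not reproduced here, so this is an assumption about their approach: a common .htaccess technique is a mod_rewrite rule that 301-redirects an alias host (such as the bare domain) to one canonical host, so each page is reachable at a single URL. A minimal Python sketch of that redirect decision follows; `canonical_redirect` and the example hosts are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url: str, canonical_host: str, alias_hosts: set[str]) -> str | None:
    """Return the 301 target if the URL uses an alias host, else None (hypothetical helper)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # .hostname drops any port and lowercases
    if host not in alias_hosts:
        return None  # already canonical, or a host this rule does not cover
    # Keep scheme, path and query; browsers never send the fragment, so drop it.
    return urlunsplit((parts.scheme, canonical_host, parts.path, parts.query, ""))


if __name__ == "__main__":
    print(canonical_redirect("http://site1.com/new-product", "www.site1.com", {"site1.com"}))
    # http://www.site1.com/new-product
```

Only listed alias hosts are redirected, mirroring a rule conditioned on one specific host. As the next comment points out, this kind of rule would not have helped here, because the pages never existed on the second domain at all.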
Yeah, the thing is, this was not duplicate content. The pages only existed on one site. It looks like Google noticed that the two domains had very similar names and decided to crawl the known pages from one site on the other domain to see if there was duplicate content. Those requests returned 404 errors because the pages did not exist, but for some reason Google indexed them anyway (for at least one day). From my point of view, it seems like Googlebot has some sort of process that directs it to purposefully go out and look for duplicate content on sites whose domain names closely resemble each other.
It seems you get close to an answer over there:
http://groups.google.com/group/Google_Webmaster_Help-Indexing/browse_thread/thread/ae15f7d63c60721a/
;)
I bet this has nothing to do with dupe fishing.
Thanks for the assistance, Sebastian. Like I said over there, this just gives me more ammunition to fight the developer on this front.