Published: Jan 30, 2009 - 04:26 am
Story Found By: Gab 1576 Days ago
Category: SEO
"If your test environment gets crawled and the pages get indexed, you can suddenly have a duplicate of your entire site!"
Great post, easy tips to implement!
11 Comments




Comments
I've had that happen to me as well. Here's what I did for recovery and prevention:
1) Get the correct robots.txt implemented, and make the file read-only so it doesn't accidentally get overwritten.
2) Verify the dev site in Google Webmaster Tools.
3) Request removal of the entire site, via GWT.
4) Go to Code Monitor (https://www.polepositionweb.com/roi/codemonitor/) and have that site monitor the robots.txt file on the dev site so you know right away if it gets changed.
5) Make sure you know about all of the dev servers in use, and do the same for each.
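For anyone who would rather script the robots.txt check in steps 1 and 4 than rely on a third-party monitor, here is a minimal sketch in Python. It assumes the "correct" dev robots.txt is one that blocks every crawler from every path. The function name, the probe paths, and the list of user agents are made up for illustration, and fetching the file from each dev server is left out.

```python
from urllib.robotparser import RobotFileParser


def blocks_all_crawlers(robots_txt,
                        agents=("*", "Googlebot"),
                        probe_paths=("/", "/index.html", "/some/deep/page")):
    """Return True if this robots.txt denies every probed agent on every probed path.

    `agents` and `probe_paths` are illustrative samples, not an exhaustive list.
    """
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; no network access needed.
    parser.parse(robots_txt.splitlines())
    return not any(
        parser.can_fetch(agent, path)
        for agent in agents
        for path in probe_paths
    )


if __name__ == "__main__":
    dev_robots = "User-agent: *\nDisallow: /\n"
    if blocks_all_crawlers(dev_robots):
        print("dev robots.txt still blocks crawlers")
    else:
        print("WARNING: dev robots.txt lets crawlers in")
```

Run on a schedule against the robots.txt of each dev server, a check like this would flag the file being overwritten with the live site's version.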
One of those tiny little simple things that can make all the difference!
This problem is quite common, particularly with sites that release often.
Haha! I included that in my list of 10 ways to piss off SEOs. Classic. http://www.alhankeser.com/10-ways-piss-off-seo/
Love using the entire paragraph as the anchor text.
Been there, done that, got gray hairs. We did all the things mogret suggests. It was a big site on two domains with multiple subdomains, so we had to go through this for every subdomain. Several times. Google kept unrecognizing the removal requests. What made it weird is that the original problem happened when access to the dev site was behind a password wall.
Lol, yeah, I think it is something everyone must have done at least once. The one time I did do it, the stupid bloody dev server outranked the site I was developing for!
It pays to have a good robots.txt file up!
@amabaie Another trick is to do site:domain.com -inurl:www.domain.com to see what subdomains show up that you're not expecting.
@amabaie - that's weird! Can you share the details with me? Google claims not to crawl through password fields. @Keri - nice tip!
The best way to do this is _not_ with robots.txt, I think. I don't do that because I'm afraid it will get uploaded to the live site somehow. "Stuff" happens. We just put an HTTP password via .htaccess on the directory above all our development projects, in this case "www/." That way we can't forget (or upload it by accident).
You might say "just don't do that," but if that adage actually worked, you wouldn't forget to upload robots.txt to the development server in the first place, either. Mistakes happen, and that one could be *even more costly* than getting it dupe-indexed.
This makes it so you can't forget to set it up. Google will not "hack" your web site, so unless you accidentally disable it in a particular directory, it will not spider anything with an HTTP password on it.
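To confirm the password wall has not been accidentally disabled for some directory, a small check along these lines could be run against each dev URL. This is only a sketch: it assumes the server answers unauthenticated requests with status 401 and a WWW-Authenticate header, which is how HTTP authentication normally behaves. The function names are made up for illustration.

```python
import urllib.error
import urllib.request
from typing import Optional


def looks_password_protected(status: int, www_authenticate: Optional[str]) -> bool:
    """True if the response looks like an HTTP auth challenge (401 plus a challenge header)."""
    return status == 401 and bool(www_authenticate)


def check_url(url: str, timeout: float = 10.0) -> bool:
    """Request `url` without credentials and report whether the server asked for a password."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # A successful response means the page was served without a password.
            return looks_password_protected(
                response.status, response.headers.get("WWW-Authenticate")
            )
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, including the 401 we hope to see.
        return looks_password_protected(err.code, err.headers.get("WWW-Authenticate"))
```

Anything for which check_url returns False is reachable without a password, so a crawler could reach it too.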