
"Nothing is more humiliating than leaving your development server accessible to the robots.

If your test environment gets crawled and the pages get indexed you can suddenly have a duplicate of your entire site !!"

Great post, easy tips to implement!

11 Comments

kerimorgret (Moderator), 1907 days ago:

I've had that happen to me as well. Here's what I did for recovery and prevention:

1) Get the correct robots.txt implemented (see the sketch below), and make the file read-only so it doesn't accidentally get overwritten.
2) Verify the dev site in Google Webmaster Tools.
3) Request removal of the entire site via GWT.
4) Go to Code Monitor (https://www.polepositionweb.com/roi/codemonitor/) and have that site monitor the robots.txt file on the dev site so you know right away if it gets changed.
5) Make sure you know about all of the dev servers in use, and do the same for each.
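For step 1, the catch-all robots.txt is two lines, and one chmod makes it read-only. A minimal sketch, assuming a Unix host and a hypothetical web root of /var/www/dev:

    # robots.txt on the dev server: block all compliant crawlers from everything
    User-agent: *
    Disallow: /

    # make the file read-only so a deploy can't silently overwrite it
    chmod 444 /var/www/dev/robots.txt

Note that robots.txt only keeps well-behaved crawlers out; it does not remove pages that are already in the index, which is what steps 2 and 3 handle.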

hjortur, 1907 days ago:

One of those tiny little simple things that can make all the difference!

andrewsho, 1907 days ago:

This problem is quite common, particularly with sites that release often. 

AlhanKeser, 1907 days ago:

Haha! I included that in my list of 10 ways to piss off SEOs. Classic. http://www.alhankeser.com/10-ways-piss-off-seo/

andrewsho, 1907 days ago:

Love using the entire paragraph as the anchor text.

amabaie, 1907 days ago:

Been there, done that, got gray hairs. We did all the things kerimorgret suggests. It was a big site on two domains with multiple subdomains, so we had to go through this for every subdomain. Several times. Google kept failing to recognize the removal requests. What made it weird was that the original problem happened while access to the dev site was behind a password wall.

Yossarian, 1906 days ago:

Lol, yeah, I think it is something everyone must have done at least once. The one time I did it, the stupid bloody dev server outranked the site I was developing for!

HeadlandDigital, 1906 days ago:

It pays to have a good robots.txt file up!

kerimorgret (Moderator), 1906 days ago:

@amabaie Another trick is to do site:domain.com -inurl:www.domain.com to see what subdomains show up that you're not expecting.

Gab, 1904 days ago:

@amabaie - that's weird! Can you share the details with me? Google claims not to crawl through password fields. @Keri - nice tip!

seoegghead, 1903 days ago:

The best way to do this is _not_ with robots.txt, I think. I don't do that because I'm afraid it will get uploaded somehow. "Stuff" happens. We just put an HTTP password, via .htaccess, on the directory above all our development projects -- in this case "www/" (see the sketch below). That way we can't forget (or upload it by accident).

You might say "just don't do that," but if that adage actually worked, you wouldn't forget to upload robots.txt to the development server in the first place, either. Mistakes happen, and that one could be *even more costly* than getting it dupe-indexed.

This makes it so you can't forget to set it up. Google will not "hack" your web site, so unless you accidentally disable it in a particular directory, it will not spider anything with an HTTP password on it.
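A minimal sketch of that setup, assuming Apache with the basic-auth modules enabled, AllowOverride permitting auth directives, and a hypothetical password file at /etc/apache2/.htpasswd-dev:

    # .htaccess in www/ -- everything below this directory requires a login
    AuthType Basic
    AuthName "Development -- keep out"
    AuthUserFile /etc/apache2/.htpasswd-dev
    Require valid-user

The password file is created once with Apache's htpasswd tool (htpasswd -c /etc/apache2/.htpasswd-dev devuser); crawlers then get a 401 on every URL and have nothing to index.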
