Posted By: aimClear 114 days ago
Topic Type: News Story (Jump to http://www.capecodseo.com)
Category: Google Other
6 Comments
Comments
Does the post you blocked have more than one URL that can reach it? Did a different URL for it get indexed?
Did you block it in robots.txt a day or two before publishing it? If not, Google can find a page and index it, and only later, perhaps next day, get the newer version of robots.txt that was supposed to block it, and then take a day or two to drop it from the results.
That happens a lot.
Check the Google cache date, and then check your site logs for the date and time that Google fetched the robots.txt file.
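For the log check, a rough sketch in Python might help. It assumes the server writes Apache/Nginx "combined" log format and that matching on the string "Googlebot" in the user agent is good enough (user agents can be spoofed, so treat the result as a hint, not proof). The function name `googlebot_robots_fetches` is made up for this example.

```python
import re
from datetime import datetime

# Assumption: combined log format. Adjust the pattern if your server logs differently.
LOG_LINE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def googlebot_robots_fetches(lines):
    """Return the datetimes at which a Googlebot user agent requested /robots.txt."""
    hits = []
    for line in lines:
        m = LOG_LINE.search(line)
        if not m:
            continue  # not a line in the expected format
        if m.group("path").split("?")[0] != "/robots.txt":
            continue  # some other URL
        if "Googlebot" not in m.group("ua"):
            continue  # some other crawler or a browser
        hits.append(datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z"))
    return hits
```

Feed it the lines of the access log (for example `googlebot_robots_fetches(open("access.log"))`, where the file name is a placeholder) and compare the timestamps with the Google cache date of the page that was supposed to be blocked.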
g1smd - thanks for taking the time to comment and make suggestions. It's completely appreciated. I am pretty certain I uploaded the new robots.txt file in coordination with the post - and have since simply removed the block.
I also tweaked a couple other things and we'll see what happens.
Your comments did get me thinking about checking Google's index of the site in greater detail and it does appear that I need to spend a bit more time on this.
Interestingly enough, more pages have now been removed (since this morning - including the sitemap page), although Google Blog Search has indexed all posts as appropriate.
The funny thing is - I don't even receive a substantial amount of Google traffic (nor have I ever) but can't really NOT rank this blog for a select group of keywords. Sigh.
Thanks again for the time.
Google doesn't always fetch the robots.txt file before fetching new content, and even when they do fetch it they can take several days to act on it, and even longer to deindex stuff they have already listed. Nothing is instantaneous, and stuff that you thought secret can be easily leaked by doing things in the wrong order. Different search engines also do things in slightly different ways. There's no easy answer. :-)
g1smd, that's what I was thinking too. Any time I hear "I blocked Google from one post, and then the root page disappeared" it often turns out to be related. I did a three-second check to verify that the webspam team had nothing to do with it.
@MattCutts, g1smd - thanks for the replies and the quick check. I didn't really think there was/is some form of conspiracy going on here - but it was fun to write about (sorry).
I also did not consider the relationship between search engine crawl rates, robots.txt files and timing. That is a really interesting point of reference and something to pay attention to in the future.
Thanks again!
I have learnt to upload a new robots.txt about a week before the URLs that need to be blocked go live.
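Before relying on that lead time, it can be worth confirming that the new rules really cover the URL. Here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_blocked`, the example.com URLs and the sample rules are all invented for illustration, and the standard-library parser does simple prefix matching, so it may not agree with Google on wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

def is_blocked(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text disallows `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the rules from a string
    return not parser.can_fetch(agent, url)

# Hypothetical rules and URLs, for illustration only.
rules = "User-agent: *\nDisallow: /secret-post/\n"
print(is_blocked(rules, "http://www.example.com/secret-post/"))
print(is_blocked(rules, "http://www.example.com/public-post/"))
```

This only checks the rules themselves. It says nothing about when a crawler will next fetch the file, which is the timing problem discussed above.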