
It appears that within the past two to three days the popular social bookmarking site del.icio.us has started blocking the major search engine spiders from crawling its site.

This isn’t a simple robots.txt exclusion either, but rather a 404 response that is now being served based on the requesting User-Agent.
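A quick way to check for this kind of User-Agent cloaking is to request the same URL with a browser User-Agent and with the major crawler User-Agents, then compare the status codes. A minimal sketch — the User-Agent strings and the sample statuses below are illustrative, not captured from del.icio.us:

```python
import urllib.request
import urllib.error

# Illustrative User-Agent strings for a browser and the major crawlers.
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 6.1)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "slurp": "Mozilla/5.0 (compatible; Yahoo! Slurp)",
    "msnbot": "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
}

def fetch_status(url: str, user_agent: str) -> int:
    """Return the HTTP status code served to this User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. the 404 the post reports for crawler UAs

def blocked_agents(statuses: dict) -> list:
    """Given {agent name: status code}, list the agents that got an error."""
    return [name for name, code in statuses.items() if code >= 400]

# The cloaking pattern the post describes (statuses are made up for illustration):
sample = {"browser": 200, "googlebot": 404, "slurp": 404, "msnbot": 404}
print(blocked_agents(sample))  # ['googlebot', 'slurp', 'msnbot']
```

In practice you would call `fetch_status` once per entry in `USER_AGENTS` against the same del.icio.us URL; a 200 for the browser alongside 404s for the crawler strings is the behavior described above.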

This raises some questions about the intentions of del.icio.us, and perhaps of Yahoo!. With Yahoo! recently integrating del.icio.us bookmarks into its search results, this could be an attempt to strengthen that new feature by preventing competing search engines from indexing content from del.icio.us.

Comments

from sajal 2357 Days ago #
Votes: 0

hmm.. del.icio.us already nofollows the external links... shows that Yahoo is seriously considering Microsoft's bid. To keep the buyer happy, Yahoo needs to prove that they can also be evil!

Moderator
from Sebastian 2357 Days ago #
Votes: 0

That’s not new. Surfing del.icio.us with a crawler user agent was a PITA for ages. Did you check it from a SE’s IP? I found a few pretty recent cached page copies at Google.

from ColinCochrane 2356 Days ago #
Votes: 0

Sebastian, I didn’t check from a SE IP, no. I did hunt through a few hundred pages from a site: command on del.icio.us and didn’t find any pages with a cache date newer than February 13th. I don’t see why they would disallow the User-Agents via robots.txt, and send 404 responses to requests from those User-Agents, but allow those spiders based on IP, though.

from Eavesy 2356 Days ago #
Votes: 0

I wondered why my recently created profile was not showing up in the engines; profiles used to rank very high.

from mvandemar 2355 Days ago #
Votes: 1

I’m pretty sure Sebastian is right. If it were blocking them, then wouldn’t that also disallow the Google translation bot? http://del.icio.us/mvandemar (in Russian)

The robots.txt isn’t actually blocking the spiders. It is in the same format it has been in since at least December: robots.txt (from December 24th)

"I don’t see why they would disallow the User-Agents via robots.txt, and send 404 responses to requests from that User-Agent, but allow those spiders based on IP, though."

If you look, there is an "Allow: /" after each of the bot identifiers, which overrides the Disallow at the top of the file. The stuff that they are disallowing is stuff they have always disallowed, things that don’t need to get indexed, like inboxes and searches.
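That override behavior can be checked with Python’s stock robots.txt parser. The robots.txt below is a hypothetical reconstruction of the structure described here — a general Disallow block at the top, then a per-bot group with "Allow: /" — not the actual del.icio.us file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt mirroring the structure described in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /inbox
Disallow: /search

User-agent: Googlebot
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Googlebot matches its own group, so "Allow: /" wins for everything:
print(rp.can_fetch("Googlebot", "/mvandemar"))   # True
print(rp.can_fetch("Googlebot", "/inbox"))       # True
# A bot without its own group falls back to the default Disallow rules:
print(rp.can_fetch("SomeOtherBot", "/inbox"))    # False
```

This illustrates the point above: a named user-agent group supersedes the `User-agent: *` rules entirely, so the top-of-file Disallows never apply to the listed bots.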

from ColinCochrane 2355 Days ago #
Votes: 0

As far as I know the Google translation bot isn’t used for indexing. If the goal here is simply to prevent competitors from indexing content on del.icio.us, then blocking the translation bot wouldn’t seem to be a priority.

from mvandemar 2355 Days ago #
Votes: 1

They still didn’t block anything important with robots.txt... all of the stuff that matters is available. They aren’t blocking anything new.

from ColinCochrane 2355 Days ago #
Votes: 0

The issue here isn’t the robots.txt. To quote my initial post: "This isn’t a simple robots.txt exclusion either, but rather a 404 response that is now being served based on the requesting User-Agent."

from mvandemar 2355 Days ago #
Votes: 0

Colin, Sebastian suggested that they were only blocking people from non-search-engine IP blocks who spoofed the user agents; you replied that it made no sense to allow from those IPs if they were blocked by robots.txt. I pointed out that the robots.txt isn’t blocking the pages you were concerned with, and you say it’s not about the robots.txt.

You’re going in circles now. I mean, why would they allow them by robots.txt, but then block them just by user agent?

from IncrediBILL 2355 Days ago #
Votes: 0

Did you see http://del.icio.us/robots.txt? Just trying to access their site with user agents for those top 4 bots is MEANINGLESS, because people serious about their web crawling security, which they appear to be, block based on IP range and/or full round-trip DNS validation.

Therefore, you can never tell with any test, no matter how hard you try, what that site is actually telling the real Googlebot, unless you have an actual Googlebot IP address in the range of an actual Googlebot crawler that has a reverse DNS that responds with "crawl-xx-xxx-xx-xxx.googlebot.com".

However, you are correct that the latest cache date is a few days old, but it’s way too early to tell what’s really going on at this point.
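The validation described here is forward-confirmed reverse DNS, the method Google itself recommends for verifying Googlebot. A minimal sketch — the live-DNS function needs network access, and the sample hostnames are illustrative:

```python
import socket

def hostname_is_googlebot(host: str) -> bool:
    # Real crawler hosts look like "crawl-66-249-66-1.googlebot.com".
    # endswith(".googlebot.com") rejects spoofs like "googlebot.com.evil.example".
    return host == "googlebot.com" or host.endswith(".googlebot.com")

def is_real_googlebot(ip: str) -> bool:
    """Forward-confirmed reverse DNS: reverse-resolve the IP, check the
    hostname is under googlebot.com, then forward-resolve that hostname
    and confirm it maps back to the same IP. Requires live DNS."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not hostname_is_googlebot(host):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False

print(hostname_is_googlebot("crawl-66-249-66-1.googlebot.com"))  # True
print(hostname_is_googlebot("googlebot.com.spoofed.example"))    # False
```

The round trip matters: anyone can point reverse DNS at a googlebot.com-looking name, but only Google controls the forward records that map the name back to the IP, which is exactly why spoofing a User-Agent string tells you nothing.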

from ColinCochrane 2355 Days ago #
Votes: 0

My mistake Michael, I got a bit mixed up. The discrepancy between the robots.txt and the User-Agent block is what I was trying to highlight.

Sebastian made a comment on my blog that bears repeating: "If in a week or so we can’t find crawler fetches after Feb/13 that’s worth further investigation."

It will be interesting to see if there are any updated caches over the next few days.

from DanThies 2355 Days ago #
Votes: 0

They’ve been doing some different stuff to deal with bad bots for quite a while: http://seoblog.intrapromote.com/2006/08/delicious_cloak.html

I doubt that this is anything but a different approach to what is essentially the same set of problems.

from DanThies 2355 Days ago #
Votes: 0

del.icio.us
A social bookmarks manager. Using bookmarklets, you can add bookmarks to your list and categorize them.
del.icio.us/ - 22k - 8 hours ago - Cached - Similar pages

from AboutBruyns 2355 Days ago #
Votes: 0

Too bad if so, because it really helped with ranking sites.

from AccuraCast 1845 Days ago #
Votes: 0

As already pointed out, del.icio.us is NOT blocking the Googlebot from indexing any user pages / pages worth being indexed. They’re only blocking spider spoofers and admin type pages.
