Robots.txt has become a widely used method of controlling how your site is crawled, so it is one of the first things I check when diagnosing on-site issues. While nowadays almost every webmaster knows the basics, some things still cause misunderstandings.
11 Comments     

Comments

from tomcritchlow 356 days ago

Good post - I didn't know that they only looked at the more specific part. Handy to know.

from annie7 356 days ago

@tomcritchlow Wow, did I mention something you hadn't known? I am now proud of myself ;)

from g1smd 356 days ago

Two errors in item 4:

*** 4. To be on the safe side, you’d better have Robots.txt file even if you do not want to include any (specific) directions - let it be empty or default (user-agent: * allow:) in this case. ***


-- The two items need to be on separate lines, with at least one blank line after the last item.


-- Don't use "Allow:", use "Disallow:"

User-agent: *
Disallow:

That will disallow nothing (i.e. "allow" everything).



If anything, disallow your /robots.txt file, so that it isn't *indexed* and cannot appear in the SERPs.
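As a quick offline check of g1smd's point, Python's standard-library `urllib.robotparser` also treats an empty `Disallow:` value as "allow everything". The `can_crawl` helper and the example.com URLs are illustrative assumptions; this shows how one generic parser reads the file, not how any particular search engine does.

```python
from urllib.robotparser import RobotFileParser

# The "allow everything" record: an empty Disallow value blocks nothing.
# Note the blank line after the last record.
ALLOW_ALL = """\
User-agent: *
Disallow:

"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Hypothetical helper: parse robots.txt text and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for path in ("/", "/private/page.html", "/robots.txt"):
        print(path, can_crawl(ALLOW_ALL, "Googlebot", "https://www.example.com" + path))
```

Every path comes back allowed; changing the value to `Disallow: /` would block the whole site instead.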

from anthonyverre 356 days ago

Great post. I don't do extensive work with my robots.txt files, but this is certainly basic knowledge that every SEO should read.

from Halfdeck 356 days ago

Googlebot's robots.txt handling is pretty intricate. As an exercise, every SEO might try writing code that mimics Googlebot's parsing of a robots.txt file; until I had done that, I never really understood how complicated things can get. When two directives conflict, for example, the directive Googlebot goes with may sometimes surprise you.

Anyway, if you want to make sure robots.txt is doing what it should, double-check it with the robots.txt checker in Webmaster Tools.
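Halfdeck's exercise can be started small. The sketch below resolves Allow/Disallow conflicts the way Google's current robots.txt documentation and RFC 9309 describe it: the matching rule with the longest path wins, and on a tie Allow (the less restrictive rule) wins. `Rule` and `is_allowed` are hypothetical names; the sketch assumes a single user-agent group and plain path prefixes, and ignores the `*` and `$` wildcards, so it approximates Googlebot's behaviour rather than reproducing its parser.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    allow: bool  # True for "Allow:", False for "Disallow:"
    path: str    # path prefix from the directive, e.g. "/folder/"


def is_allowed(rules: list[Rule], path: str) -> bool:
    """Longest-match precedence: the matching rule with the longest path wins;
    on a tie between Allow and Disallow, Allow wins. Wildcards are not handled."""
    best = None  # (path length, allow flag) of the best match so far
    for rule in rules:
        if rule.path == "":
            continue  # an empty value is treated as no rule at all
        if path.startswith(rule.path):
            candidate = (len(rule.path), rule.allow)
            # Tuple comparison: longer path first, then True (Allow) beats False
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the URL is not restricted
    return True if best is None else best[1]


if __name__ == "__main__":
    rules = [Rule(False, "/folder/"), Rule(True, "/folder/page.html")]
    for p in ("/folder/page.html", "/folder/other.html", "/elsewhere.html"):
        print(p, is_allowed(rules, p))
```

Here `Disallow: /folder/` is listed before `Allow: /folder/page.html`. A simple first-match parser, which reads rules in file order, would block /folder/page.html; under longest-match the Allow rule is more specific, so the page stays crawlable regardless of the order of the lines.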


from annie7 356 days ago

@g1smd: corrected, thank you!

from oddseo 356 days ago

@g1smd >> Is there even such a thing as "allow"? lol

from yourseomentor 356 days ago

Wonderful find, Jordan. Loved the post, full of great and useful content!

from yojpotter2 356 days ago

I haven't done this before, so I had no idea about it. ^^ Thank you for the new info. ^^

from g1smd 355 days ago


There is an "allow:" directive, but it isn't really part of the official spec, and it isn't recognised by many search engines (see http://www.robotstxt.org/robotstxt.html).

As for which part Google uses, if there is a section for "Googlebot" then that is the *only* section that they will consider. Make sure that everything Google needs to see is in that section.

If there isn't a section for "Googlebot", then they will read the "User-agent: *" section instead.

The suggestion to use WMT is most excellent. Check your file there.

Make sure there is a blank line between each block, and at least one blank line after the last record too.
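The record-selection rule g1smd describes (a named "Googlebot" record replaces the "*" record entirely, rather than adding to it) can be sanity-checked offline with Python's standard-library `urllib.robotparser`, which also falls back to "*" only when no named record matches. The file contents and example.com URLs below are illustrative assumptions, and this parser is no substitute for the checker in Webmaster Tools.

```python
from urllib.robotparser import RobotFileParser

# Blank line between records, and one after the last record.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /tmp/

User-agent: *
Disallow: /private/
Disallow: /tmp/

"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

site = "https://www.example.com"
# Googlebot has its own record, so the "*" record is ignored for it:
print(parser.can_fetch("Googlebot", site + "/private/report.html"))  # True
print(parser.can_fetch("Googlebot", site + "/tmp/cache.html"))       # False
# Any other crawler falls back to the "*" record:
print(parser.can_fetch("OtherBot", site + "/private/report.html"))   # False
```

Because /private/ is only listed in the "*" record, Googlebot may crawl it here; anything Googlebot must avoid has to be repeated in its own record.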

from erik 355 days ago

In June, the big 3 "co-announced" an expansion of what they'll honor in robots.txt files, beyond the traditional (but weak) robotstxt.org boundaries. One of those is the "Allow" directive. See the announcements from Google, Yahoo, and Microsoft.

