Posted By: UtahSEOpro 46 days ago
Topic Type: News Story (Jump to http://www.searchenginejournal.com)
Category: SEO
11 Comments
Comments
Good post - I didn't know that they only looked at the more specific part. Handy to know.
@tomcritchlow Wow, did I mention something you hadn't known? I am now proud of myself ;)
Two errors in item 4:
*** 4. To be on the safe side, you’d better have Robots.txt file even if you do not want to include any (specific) directions - let it be empty or default (user-agent: * allow:) in this case. ***
-- The two items need to be on separate lines, with at least one blank line after the last item.
-- Don't use "Allow:", use "Disallow:"
User-agent: *
Disallow:
That will disallow nothing (i.e. "allow" everything).
If anything, disallow your /robots.txt file, so that it isn't *indexed* and cannot appear in the SERPs.
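You can check the "empty Disallow means allow everything" point with Python's standard-library urllib.robotparser, which parses robots.txt text without any network access. This is a minimal sketch. The helper can_fetch and the ALLOW_ALL / BLOCK_ALL strings are hypothetical names made up for the example, and example.com is a placeholder domain. Python's parser evaluates rules in file order (first match wins), so it is not a model of Googlebot's conflict handling.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text (no network access) and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as "read"
    return parser.can_fetch(agent, url)


# An empty Disallow value blocks nothing, i.e. everything is allowed.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# A lone slash blocks the whole site, for contrast.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

if __name__ == "__main__":
    print(can_fetch(ALLOW_ALL, "Googlebot", "https://www.example.com/any/page.html"))  # True
    print(can_fetch(BLOCK_ALL, "Googlebot", "https://www.example.com/any/page.html"))  # False
```

The only difference between the two files is the "/" after Disallow, which is why "Disallow:" with nothing after it is the safe way to say "allow everything".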
Great post. I don't do extensive work with my robots.txt files, but this is certainly a great knowledge base that every SEO should read.
Googlebot's robots.txt handling is pretty intricate. As an exercise, every SEO might try writing code that mimics how Googlebot parses a robots.txt file; until I did that, I never really understood how complicated things can get. When two directives conflict, for example, the one Googlebot goes with may sometimes surprise you.
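Here is a small version of that exercise in Python. It is a minimal sketch that assumes the longest-match rule described in Google's current documentation and later standardized in RFC 9309. Among the Allow/Disallow rules whose pattern matches the URL path, the longest pattern wins, and on a tie Allow wins. "*" matches any run of characters, and a trailing "$" anchors the end of the path. The names is_allowed, Rule and _pattern_to_regex are hypothetical. The sketch also skips percent-encoding normalization and other details a real crawler handles.

```python
import re
from typing import List, Tuple

# A rule is ("allow" | "disallow", path_pattern). Hypothetical representation.
Rule = Tuple[str, str]


def _pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex matched from the start of the path.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    Everything else is matched literally, so a plain pattern acts as a prefix.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Longest-match resolution (assumed Google/RFC 9309 behaviour).

    The matching rule with the longest pattern wins; on a tie, Allow beats
    Disallow. If no rule matches, the path is allowed.
    """
    best_len = -1
    best_allow = True
    for kind, pattern in rules:
        if not pattern:
            # An empty "Disallow:" (or "Allow:") matches nothing, so it never wins.
            continue
        if _pattern_to_regex(pattern).match(path) is None:
            continue
        allow = kind == "allow"
        length = len(pattern)
        if length > best_len or (length == best_len and allow):
            best_len = length
            best_allow = allow
    return best_allow
```

For example, with Disallow: /private/ listed before Allow: /private/public-page.html, is_allowed reports the page as allowed because the Allow pattern is longer. A parser that simply takes the first matching line would block it, and that kind of difference is where the surprises come from.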
Anyway, if you want to make sure robots.txt is doing what it should, double-check it with the robots.txt checker tool in Webmaster Tools.
@g1smd: corrected, thank you!
@g1smd >> Is there even anything like "allow"?! lol
Wonderful find, Jordan. Loved the post, full of great and useful content!
I haven't tried this before, so I had no idea about it. ^^ Thank you for the new info! ^^
There is an "allow:", but it isn't really part of the official spec, and it isn't recognised by many search engines: http://www.robotstxt.org/robotstxt.html
As for which part Google uses, if there is a section for "Googlebot" then that is the *only* section that they will consider. Make sure that everything Google needs to see is in that section.
If there isn't a section for "Googlebot" then they will read the "User-agent: *" section instead. The suggestion to use WMT is most excellent. Check your file there.
Make sure there is a blank line between each block, and at least one blank line after the last record too.
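The "Googlebot section or else User-agent: *" behaviour can be sketched in Python as follows. This is a minimal sketch under stated assumptions. parse_groups and select_group are hypothetical helpers. A group is assumed to start at a run of User-agent lines followed by Allow/Disallow lines, matching is by exact, case-insensitive user-agent token, and comments and unknown fields are ignored. Because of this grouping rule the sketch does not depend on blank lines. Parsers that follow the original robotstxt.org record format do rely on them, so the blank-line advice still stands.

```python
from typing import Dict, List, Tuple

# A rule is ("allow" | "disallow", path_pattern). Hypothetical representation.
Rule = Tuple[str, str]


def parse_groups(text: str) -> Dict[str, List[Rule]]:
    """Split robots.txt text into groups keyed by lower-cased user-agent token."""
    groups: Dict[str, List[Rule]] = {}
    current_agents: List[str] = []
    in_rules = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A User-agent line that follows rules starts a new group.
                current_agents = []
                in_rules = False
            agent = value.lower()
            current_agents.append(agent)
            groups.setdefault(agent, [])
        elif field in ("allow", "disallow") and current_agents:
            in_rules = True
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


def select_group(groups: Dict[str, List[Rule]], crawler: str) -> List[Rule]:
    """Use the crawler's own group if present; otherwise fall back to '*'."""
    return groups.get(crawler.lower(), groups.get("*", []))
```

With a "User-agent: *" group that disallows /tmp/ and a "User-agent: Googlebot" group that disallows /private/, select_group returns only the Googlebot rules for Googlebot, so /tmp/ stays crawlable for it. That is why everything Google needs must be repeated in its own section.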
In June, the big 3 (Google, Yahoo, and Microsoft) "co-announced" an expansion of what they'll honor in robots.txt files, beyond the traditional (but weak) robotstxt.org boundaries. One of those is the "Allow" directive. See the announcements from Google, Yahoo, and Microsoft.