
Today (11th April 2009) I followed a very interesting and informative discussion on Twitter between Graywolf and Matt Cutts. During the discussion, g1smd and Halfdeck popped up, and Graywolf had enough energy to discuss the subject with them too. Enjoy!

Graywolf Vs. Matt Cutts

Graywolf: umm @mattcutts care to offer an opinion on why a "diggbar" page is indexed despite the new noindex tag http://tinyurl.com/dku9nq

Matt Cutts: @graywolf, noindex *will* keep a url out of Google as long as we crawl the page to see the noindex tag. But can’t honor a tag we didn’t see.

Graywolf: of course real SEO experts who do stuff know that a noindex tag won’t keep the URL out of google, just contents of the page

Matt Cutts: @graywolf, IIRC, noindex will *not* keep a url out of Yahoo/MSFT. More info: http://tinyurl.com/2evfo8

Graywolf: @mattcutts cmon I’ve seen URL’s with no index, nofollow and robots.txt blocking just show up as URL’s

Matt Cutts: @graywolf, if a url is robots.txt’ed out, then we would never crawl it to see the noindex tag. Thus uncrawled url can show up in that case.

Graywolf: @mattcutts we spoke about the problem recently and U suggested the experimental google robots code as an option to keep dev sites out

Matt Cutts: @graywolf, enough folks at Google are worried that people will shoot themselves in the foot that we don’t espouse that solution recently.

Graywolf: @mattcutts I think thats a wrong solution but I do see google’s point

Matt Cutts: @graywolf, if you robot.txt out a page, you can use url removal tool to remove all trace of a url, even the uncrawled urls.

Matt Cutts: @graywolf, or slap a password on the content via something like .htaccess. There’s definitely ways to avoid leaking urls.

Graywolf: @mattcutts but u gotta agree letting them crawl pages to keep them out of the index is even a teensy bit illogical

Graywolf: @mattcutts but that takes it out for 6 months not what I want to happen on a new domain

Matt Cutts: @graywolf, you can revoke a url removal and that takes place in ~24 hours. See "re-include content" at http://bit.ly/1094iw

Graywolf: @mattcutts but the 24H re-inclusion only works if you are verified in webmastercentral ... not everyone is ... or wants to be

Matt Cutts: @graywolf, fair enough, but that tool will work for vast majority of site owners. Rest can use e.g. a password with .htaccess.

Graywolf: @mattcutts but comon isn’t it illogical that we have let you crawl URL’s for them NOT to show at all?

Graywolf: @mattcutts it’s like you hopping the concertina wire to go check if the sign really says "no trespassing" ;-)

Matt Cutts: @graywolf, but we can’t tell if a page has a noindex meta tag unless we fetch it. Off to finish laundry now.

Graywolf: @mattcutts but if you paid attention to the disallow tag you wouldn’t be fetching it anyway ;-) off to make lasagna now
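The point both sides keep circling can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Google's crawler is actually built: the robots.txt rules, the page paths, the `examplebot` user agent and the `sees_noindex` helper are all made up for the example. It uses the standard library's `urllib.robotparser` and `html.parser` to show that a crawler which honors a disallow rule never fetches the page, and so never gets to read the noindex meta tag on it.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (not taken from the discussion).
ROBOTS_TXT = """\
User-agent: *
Disallow: /dev/
"""

# Hypothetical pages keyed by path; both carry a noindex meta tag.
PAGES = {
    "/dev/index.html": '<html><head><meta name="robots" content="noindex"></head></html>',
    "/beta/index.html": '<html><head><meta name="robots" content="noindex, nofollow"></head></html>',
}


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag contains 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
            if "noindex" in directives:
                self.noindex = True


def sees_noindex(path, user_agent="examplebot"):
    """Return True only if a polite crawler may fetch the page AND finds noindex."""
    rules = RobotFileParser()
    rules.parse(ROBOTS_TXT.splitlines())
    if not rules.can_fetch(user_agent, path):
        return False  # never fetched, so the meta tag is never read
    parser = RobotsMetaParser()
    parser.feed(PAGES[path])
    return parser.noindex


if __name__ == "__main__":
    for path in PAGES:
        print(path, "-> noindex seen:", sees_noindex(path))
```

In this sketch the page under `/dev/` has a noindex tag too, but the crawler is told not to fetch it, so the tag might as well not exist.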

=========================

Graywolf Vs. g1smd

g1smd: @graywolf The robots.txt exclusion is ’do not spider URL’; but URL may still appear as URL-only entry in SERPs.

g1smd: @graywolf The meta noindex exclusion is ’do not index this content’ and the page will not even appear as a URL-only entry in SERPs.

g1smd: @graywolf Using both robots.txt and meta will mean that page isn’t accessed at all. The meta tag will not be seen, will get URL-only entry.

Graywolf: @g1smd yes thats exactly what happens ... and it’s wrong ;-)

g1smd: @graywolf @mattcutts I always use .htpasswd on dev server to keep everything out that I want to keep out. Both bots and nosey parkers alike.

g1smd: @graywolf @mattcutts For robots.txt disallow, Yahoo will list URL in SERP and create Title for it using anchor text from an incoming link.

Graywolf: @g1smd agreed but at some point you want to open it up for non public beta and remove the PWD that’s when Goog starts w URL only
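The three cases g1smd lists can be written down as a small decision function. This is a simplified model of the behavior described in the tweets above for Google, not a statement of how any search engine is implemented; the function name `listing_outcome` and the outcome labels are hypothetical.

```python
def listing_outcome(robots_disallowed: bool, meta_noindex: bool) -> str:
    """Model the listing outcome described in the thread (assumed, simplified)."""
    if robots_disallowed:
        # The page is never fetched, so any meta tag on it goes unseen;
        # the URL can still surface as a URL-only entry.
        return "url-only entry possible"
    if meta_noindex:
        # The page is crawled, the tag is seen, and the URL is kept out.
        return "not listed"
    return "normal listing"


if __name__ == "__main__":
    for disallowed in (False, True):
        for noindex in (False, True):
            print(f"disallow={disallowed!s:5} noindex={noindex!s:5} ->",
                  listing_outcome(disallowed, noindex))
```

Note that the disallow check comes first: combining robots.txt with the meta tag gives the same result as robots.txt alone, which is exactly the case Graywolf objects to.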

==========================

Graywolf Vs. Halfdeck

Halfdeck: @graywolf robots disallow doesn’t say "don’t index this." It tells Google "don’t crawl."

Graywolf: @Halfdeck SE used to not index URL’s they didn’t crawl but google changed it to compensate for webmaster shooting self in foot

Halfdeck: @graywolf its also good biz. like matt always says, if wsj.com disallowed their home page Google’d still wanna show it in the SERPs.

Graywolf: @Halfdeck I disagree no is no ... not you try to figure out if really meant no or yes #microsoftclippylogic

Halfdeck: @graywolf Google does listen to both disallow/NOINDEX directives. Just their definition and ours don’t jive, mostly cuz we’re bad listeners.

Graywolf: @Halfdeck it’s more cause some webmaster shot themselves in the foot and google tries to compensate to help them

Graywolf: @Halfdeck but yes definitions do not jive

Comments

from Halfdeck 1027 Days ago #
Votes: 4

You missed my best tweet http://twitter.com/Halfdeck/status/1498218245 :)

from sza 1025 Days ago #
Votes: 0

It’s clear that robots can’t read a noindex directive if they are not allowed to access the page, but the robots.txt disallow itself implies that the webmaster does not want the site (or parts of it) to appear in the database/SERPs, not even as a URL. So showing a URL in the results regardless is counter-intuitive for webmasters "... and it’s wrong".

from hendricius 1024 Days ago #
Votes: 1

Pretty interesting discussion. Lol @halfdeck

from g1smd 1021 Days ago #
Votes: 0

Related discussions: http://sphinn.com/story/109580
