Published: Apr 13, 2009 - 01:28 am
Story Found By: Harith 1027 Days ago
Category: SEO
Graywolf Vs. Matt Cutts
Graywolf: umm @mattcutts care to offer an opinion on why a "diggbar" page is indexed despite the new noindex tag http://tinyurl.com/dku9nq
Matt Cutts: @graywolf, noindex *will* keep a url out of Google as long as we crawl the page to see the noindex tag. But can't honor a tag we didn't see.
Graywolf: of course real SEO experts who do stuff know that a noindex tag won't keep the URL out of google, just contents of the page
Matt Cutts: @graywolf, IIRC, noindex will *not* keep a url out of Yahoo/MSFT. More info: http://tinyurl.com/2evfo8
Graywolf: @mattcutts c'mon I've seen URLs with no index, nofollow and robots.txt blocking just show up as URLs
Matt Cutts: @graywolf, if a url is robots.txted out, then we would never crawl it to see the noindex tag. Thus uncrawled url can show up in that case.
Graywolf: @mattcutts we spoke about the problem recently and U suggested the experimental google robots code as an option to keep dev sites out
Matt Cutts: @graywolf, enough folks at Google are worried that people will shoot themselves in the foot that we don't espouse that solution recently.
Graywolf: @mattcutts I think that's a wrong solution but I do see google's point
Matt Cutts: @graywolf, if you robots.txt out a page, you can use url removal tool to remove all trace of a url, even the uncrawled urls.
Matt Cutts: @graywolf, or slap a password on the content via something like .htaccess. There's definitely ways to avoid leaking urls.
Graywolf: @mattcutts but u gotta agree letting them crawl pages to keep them out of the index is even a teensy bit illogical
Graywolf: @mattcutts but that takes it out for 6 months not what I want to happen on a new domain
Matt Cutts: @graywolf, you can revoke a url removal and that takes place in ~24 hours. See "re-include content" at http://bit.ly/1094iw
Graywolf: @mattcutts but the 24H re-inclusion only works if you are verified in webmastercentral ... not everyone is ... or wants to be
Matt Cutts: @graywolf, fair enough, but that tool will work for vast majority of site owners. Rest can use e.g. a password with .htaccess.
Graywolf: @mattcutts but c'mon isn't it illogical that we have let you crawl URLs for them NOT to show at all?
Graywolf: @mattcutts it's like you hopping the concertina wire to go check if the sign really says "no trespassing" ;-)
Matt Cutts: @graywolf, but we can't tell if a page has a noindex meta tag unless we fetch it. Off to finish laundry now.
Graywolf: @mattcutts but if you paid attention to the disallow tag you wouldn't be fetching it anyway ;-) off to make lasagna now
=========================
Graywolf Vs. g1smd
g1smd: @graywolf The robots.txt exclusion is do not spider URL; but URL may still appear as URL-only entry in SERPs.
g1smd: @graywolf The meta noindex exclusion is do not index this content and the page will not even appear as a URL-only entry in SERPs.
g1smd: @graywolf Using both robots.txt and meta will mean that page isn't accessed at all. The meta tag will not be seen, will get URL-only entry.
Graywolf: @g1smd yes that's exactly what happens ... and it's wrong ;-)
g1smd: @graywolf @mattcutts I always use .htpasswd on dev server to keep everything out that I want to keep out. Both bots and nosey parkers alike.
g1smd: @graywolf @mattcutts For robots.txt disallow, Yahoo will list URL in SERP and create Title for it using anchor text from an incoming link.
Graywolf: @g1smd agreed but at some point you want to open it up for non public beta and remove the PWD that's when Goog starts w URL only
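
The three cases g1smd lists above (robots.txt only, meta noindex only, both) can be sketched in a few lines of Python. This is only a toy model of the behaviour described in the thread, not how any search engine is actually implemented. The function name listing_outcome, the user agent "examplebot", the example.com URLs and the three outcome labels are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def listing_outcome(robots_txt, url, page_html, user_agent="examplebot"):
    """Hypothetical helper: returns 'url-only', 'not listed' or 'indexed'."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    # Disallowed by robots.txt: the page is never fetched, so any noindex
    # tag on it is never seen. The bare URL may still be listed.
    if not rules.can_fetch(user_agent, url):
        return "url-only"

    # Crawlable: the page is fetched and its meta robots tag can be honored.
    meta = RobotsMetaParser()
    meta.feed(page_html)
    if "noindex" in meta.directives:
        return "not listed"
    return "indexed"


BLOCKING_ROBOTS = "User-agent: *\nDisallow: /dev/\n"
NOINDEX_PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'

# robots.txt block plus noindex: the tag is never read.
print(listing_outcome(BLOCKING_ROBOTS, "http://example.com/dev/page.html", NOINDEX_PAGE))
# No robots.txt block, noindex present: the tag is read and honored.
print(listing_outcome("", "http://example.com/dev/page.html", NOINDEX_PAGE))
```

In this model the only way to get "not listed" is to leave the page crawlable, which is the point Graywolf finds illogical and Matt Cutts says is unavoidable.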
==========================
Graywolf Vs. Halfdeck
Halfdeck: @graywolf robots disallow doesn't say "don't index this." It tells Google "don't crawl."
Graywolf: @Halfdeck SE used to not index URLs they didn't crawl but google changed it to compensate for webmaster shooting self in foot
Halfdeck: @graywolf it's also good biz. like matt always says, if wsj.com disallowed their home page Google'd still wanna show it in the SERPs.
Graywolf: @Halfdeck I disagree no is no ... not you try to figure out if really meant no or yes #microsoftclippylogic
Halfdeck: @graywolf Google does listen to both disallow/NOINDEX directives. Just their definition and ours don't jive, mostly cuz we're bad listeners.
Graywolf: @Halfdeck its more cause some webmaster shot themselves in the foot and google tries to compensate to help them
Graywolf: @Halfdeck but yes definitions do not jive
4 Comments
You missed my best tweet http://twitter.com/Halfdeck/status/1498218245 :)
It's clear that robots can't read a noindex directive if they are disallowed from accessing the page, but the robots.txt disallow itself implies that the webmaster does not want the site (or parts of it) to appear in the database/SERPs, not even as a URL. So showing a URL in the results regardless is counter-intuitive for webmasters "... and it's wrong".
Pretty interesting discussion. Lol @halfdeck
Related discussions: http://sphinn.com/story/109580