
The interview appears to have taken place on September 24th, but was published today.

Covers the following of links (mentions JavaScript links being followed)
Robots.txt (pages blocked by it can still be returned in search results, and can accrue PageRank)
Noindex pages accrue and pass PageRank

Plus lots more (yep going back to reading it now, much more interesting than paid links and PageRank drops)
31 Comments

from randfish 2603 Days ago #
Votes: 2

This is one of the very best interviews with Matt that I’ve ever seen. Bravo, Eric! This should be a must-read for everyone in search marketing.

from planetc1 2603 Days ago #
Votes: 0

And a nice photo of Matt in a collared shirt, don’t think I’ve ever seen that before.

from Harith 2603 Days ago #
Votes: 0

IMO, here is the most interesting part of the interview :-)

Eric Enge: Can a NoIndex page accumulate PageRank?

Matt Cutts: A NoIndex page can accumulate PageRank, because the links are still followed outwards from a NoIndex page.

Eric Enge: So, it can accumulate and pass PageRank.

Matt Cutts: Right, and it will still accumulate PageRank, but it won’t be showing in our Index. So, I wouldn’t make a NoIndex page that itself is a dead end. You can make a NoIndex page that has links to lots of other pages. For example you might want to have a master Sitemap page and for whatever reason NoIndex that, but then have links to all your sub Sitemaps.

I might ask Matt to be kind enough to elaborate more on the above. And who knows, he might do that even without me asking him ;-)

from Wiep 2602 Days ago #
Votes: 0

Danny, how about a double Sphinn function for stuff like this? :)

from AndyBeard 2602 Days ago #
Votes: 0

One of the things that isn’t covered about Noindex pages: whilst they are followed, do they pass anchor text or proximity to keywords along with their juice?

I keep duplicate content pages indexed, because they often form a bridge in topical relationships, and I would hate to lose that relevance, if it makes a difference.

from MattCutts 2602 Days ago #
Votes: 1

planetc1, sharp eyes! I’ve been around Google long enough that we lost the photograph of me that we used for PR purposes. So a little while ago they took a new PR photo of me. 10 minutes after the photo was taken, I was back in a T-shirt. :)

Harith, I’ll try to elaborate. I believe NoIndex pages (and robots.txt’ed out pages) can still accrue PageRank just fine. So even though Google doesn’t show NoIndex pages, they can still pass PageRank along their outgoing links.

Now in the interview when I mentioned "a master Sitemap page," I was talking about an HTML sitemap page. For example, suppose you’ve been blogging for four years, 400 posts a year, and now you’ve got 1600 posts. Suppose you want to make an HTML sitemap that leads to all your posts. 1600 links on one page might not look great, so suppose you decided to make one master HTML sitemap file that links to four other sitemaps -- one for each year you blogged. If you wanted to, I believe you could make the master HTML sitemap page a NoIndex page. The master HTML sitemap page wouldn’t show up at all in Google’s search results, but PageRank would still flow to the four sitemap pages (one for each year).

It was a bit of an arbitrary example, but it gives you a flavor of how you could sculpt the flow of PageRank within your site. Definitely feel free to check me; it’s not a hard experiment to verify this.

Does that help?
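The master-sitemap setup described above can be sketched as a plain HTML page. This is an illustrative mock-up (the filenames are invented), not something taken from the interview:

```html
<!-- /sitemap.html: master HTML sitemap, crawled but kept out of the index -->
<html>
<head>
  <meta name="robots" content="noindex">
  <title>Site Map</title>
</head>
<body>
  <!-- this page itself never appears in search results, but
       PageRank can still flow out through these links -->
  <a href="/sitemap-2004.html">Posts from 2004</a>
  <a href="/sitemap-2005.html">Posts from 2005</a>
  <a href="/sitemap-2006.html">Posts from 2006</a>
  <a href="/sitemap-2007.html">Posts from 2007</a>
</body>
</html>
```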

from Harith 2602 Days ago #
Votes: 0

Matt,

Thank you for your time. And sorry for keeping you awake. Wish you a great week and a nice trip to Kirkland on Tuesday ;-)

from Wiep 2602 Days ago #
Votes: 0

Matt,

"I believe NoIndex pages (and robots.txt’ed out pages) can still accrue PageRank just fine. So even though Google doesn’t show NoIndex pages, they can still pass PageRank along their outgoing links."

Just to be sure; you’re saying that both noindexed pages and robots.txt’ed out pages are able to pass PR?

from g1smd 2602 Days ago #
Votes: 0

It has been apparent for years that pages with [meta name="robots" content="noindex"] are still spidered and cached, but not displayed in search results. If they are cached, then the link data would still be used for something.

This is, however, somewhat counter-intuitive in fighting some types of Duplicate Content issues, where you would want to "hide" some URL formats from being spidered and cached. I believe that it is also a problem where you might want to completely hide parts of a site (such as all the pages on a forum that return "error: you are not logged in" and nothing else). Imagine a bug where Google assigned more PR to URLs that were not being shown in the index because they had been blocked than to the URLs for the site that did show in the SERPs.

There is more to this Canonical URL business than meets the eye. This latest disclosure changes a few things. The dated searches provide a useful insight into how Google sees a site, and how it picks up lots of URL formats for the same content: http://www.google.com/search?num=100&filter=0&q=site:resource-zone.com&as_qdr=d4

In some cases the content has already been found under the URL format that is not wanted to show in the SERPs (hence the URL-only entry, as the format is disallowed by robots.txt) and has not yet been found under the format that is supposed to be used as the canonical URL.

It seems that even with robots.txt it still isn’t possible to herd the bot to the right URLs. It wants to look at everything and then make its own mind up. Redirects obviously work well, but there are issues with using the robots "disallow" directive in some cases.

from Halfdeck 2602 Days ago #
Votes: 1

If robots.txt disallow and META robots noindex don’t "block" PageRank to disallowed/noindexed pages (which is what I think Matt is saying), then the only way I see to prevent PageRank from passing to an internal URL is via rel=nofollow or not linking to the URL at all.

BTW, thanks to both Eric and Matt for an awesome interview.

from g1smd 2602 Days ago #
Votes: 2

*** you’re saying that both noindexed pages and robots.txt’ed out pages are able to pass PR? ***

No.

Pages excluded by meta robots noindex are spidered and cached, and can pass PR. The meta robots noindex simply means that the page will not appear anywhere in the SERPs.

Pages excluded by meta robots noindex are spidered and cached, but if there is also a meta robots nofollow tag or a rel=nofollow attribute present, then they do not pass PR to other pages.

Pages disallowed by robots.txt aren’t spidered at all. They can still appear as URL-only entries in the SERPs. They can accumulate PR. They cannot pass any PR, because the search engine will never see what the page links out to, because the page will never be spidered.
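For reference, the mechanisms distinguished above look like this in practice (the paths and URLs are purely illustrative):

```
# robots.txt -- a disallowed page is never fetched: it can accumulate
# PR from inbound links, but cannot pass any on
User-agent: *
Disallow: /private/
```

```html
<!-- crawled; can pass PR; never shown in the SERPs -->
<meta name="robots" content="noindex">

<!-- crawled; never shown; passes no PR through any of its links -->
<meta name="robots" content="noindex,nofollow">

<!-- per-link version: only this one link passes no PR -->
<a href="/page.html" rel="nofollow">anchor text</a>
```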

from AndyBeard 2602 Days ago #
Votes: 0

Just a little something in practice:

I have a whole load of affiliate links that redirect through pages in a directory that is blocked by robots.txt. Those are effectively a black hole accumulating PR.

What I should do is use Noindex,Follow, have a link back to the originating domain, plus a 302 redirect.

from g1smd 2602 Days ago #
Votes: 0

Hmm, the 302 redirect will potentially index their content at your URL, using your accumulated PageRank to boost your URL.

A Duplicate Content nightmare in the making.

from SamIWas 2602 Days ago #
Votes: 0

So, even though a page is robots.txt’d out, it could be accumulating PR if you are not nofollowing the actual link to that page? That’s pretty counter-intuitive and thus definitely an interesting read.

Moderator
from graywolf 2602 Days ago #
Votes: 1

Matt Cutts: "Well, I have made a promise that my Webspam team wouldn’t go to the Google Analytics group and get their data and use it. Search quality or other parts of Google might use it, but certainly my group does not."

Man, more double talk. Or to represent it another way: it doesn’t matter when you add salt to the food you are cooking, it’s always going to end up in the final dish.

from JeremyLuebke 2602 Days ago #
Votes: 0

Robots.txt’ed out files can accrue PR. I guess using robots.txt is now out for LinkJuice molding. Good to know.

from g1smd 2602 Days ago #
Votes: 0

Pages disallowed using robots.txt can accumulate PR but cannot pass it on. The page will not be spidered, cached, or indexed.

from lucia 2602 Days ago #
Votes: 0

As the non-SEO always trying to catch up, I have a question (I think for Andy, but possibly for Matt Cutts). As some know, I like to write plugins. Currently, I’m fiddling with one to create sort-of-text-looking ads, and I need to prevent it from causing problems. Currently, the plugin creates sort-of text links. (You can view it here: http://money.bigbucksblogger.com/this-is-a-test-it-is-only-a-test-if-it-were-not-a-test-you-could-download-the-plugin/ There are a number of things I want to change, and any recommendations are welcome.)

Other than Amazon, the affiliate links on the page go to addresses that end in /ads=?keyword. When clicked, the user passes through a script that creates the real affiliate link ($link) for that keyword and redirects them. I do this:

    header("X-Robots-Tag: nofollow");
    @header('Location: '.$link);

And the person who clicks zings off to the affiliate.

Will this create the black hole? Do I need to do something else specifically? Thanks.

from ericenge 2602 Days ago #
Votes: 0

g1smd - I think you have it right. Pages that are named in Robots.txt can accumulate PR, but since these pages are not crawled, they can’t pass it.

from g1smd 2602 Days ago #
Votes: 0

Yeah. I wanted to repeat and clarify what was said because I already see it being misquoted in a bunch of other places, and turned into misinformation. 

from Harith 2602 Days ago #
Votes: 0

g1smd,

Just to say: it’s always informative and educational to have you on a thread. Thanks ;-)

from corey 2602 Days ago #
Votes: 2

"Matt Cutts: Right. You could take it even further and help people get the answer directly from a snippet on the search engine results page, and so they didn’t click on the link at all."

And then delete your website, because you just gave Google your content. All we’ll need is a non-expiring cache and we can all delete our websites!

from billse 2602 Days ago #
Votes: 0

Matt’s answers are more specific here than in many other interviews - that’s always worth a read.

from MattCutts 2602 Days ago #
Votes: 0

"Just to be sure; you’re saying that both noindexed pages and robots.txt’ed out pages are able to pass PR?"

Wiep, both can accrue PageRank. NoIndex pages can pass PageRank, but robots.txt’ed out pages can’t. Why not? Because we didn’t crawl the robots.txt’ed page, we don’t know its outlinks, so we can’t flow PageRank out from robots.txt’ed pages.

corey, I was just making a point about why things like clickthrough might seem like a good signal, but in fact would be noisy/biased.

from Halfdeck 2601 Days ago #
Votes: 0

Ok, so let me get this right - no matter what’s in a robots.txt or META robots tag, if there are links pointing to a URL, the URL can be picked up by Google and it can accumulate PageRank.

As for PageRank passage:

robots.txt disallow: prevents PageRank passing, because Google doesn’t retrieve the content of a disallowed URL, so links on a disallowed URL never get picked up.

META noindex: can pass PageRank, because 1) Google needs to retrieve the content of a META noindex URL to recognize the META tag, and 2) if Google finds links on those pages, they will be followed, and thus PageRank flows through those links.

META noindex,nofollow: can’t pass PageRank, even though the content is fetched, because Google doesn’t follow any of the links on a META nofollowed page.

from g1smd 2601 Days ago #
Votes: 0

Yep. That’s it, except that the nofollow meta tag is for all links on the whole page, and the nofollow attribute on a link is just for the URL in that link.

from Harith 2601 Days ago #
Votes: 0

g1smd,

Back to duplicate content. Let’s assume that we have page-A containing 10 internal links, and another page which is a 100% duplicate of it: page-B. Then we meta-noindex page-B to avoid duplicates, so Google wouldn’t "penalize" us for duplicates. We end up with two pages passing PageRank:

- page-A passes PageRank to the 10 internal links.
- page-B (meta-noindexed) passes PageRank to the same 10 internal links.

Now how about the case of creating 8 duplicates and meta-noindexing 7 of them. Ending up with 8 pages passing PageRank? That’s Spam-deluxe ;-)

So in ethical SEO, if we have 100% duplicates we MUST robots.txt-disallow one of them, right?

from g1smd 2601 Days ago #
Votes: 0

Ah, but there’s another factor to take into account here. Just because a page exists doesn’t mean that Google will keep a copy of it. So the duplicate page may get dropped from their storage system and not get counted at all.

(Remember that URLs returning a page of content you have tagged as meta robots noindex are still crawled and cached by Google. The meta robots noindex tag simply tells Google not to show the page in public results. However, they don’t have to keep a copy of it. It could be ruled out completely.)

from g1smd 2601 Days ago #
Votes: 0

As for your question, I think it depends on what sort of "100% duplicate" you are talking about.

If it is www and non-www stuff, or perhaps alternative domains or TLDs accessing the same content, then I still think that the site-wide 301 redirect is the best way to fix it. It ensures that the site is seen under only one URL format, rather than several formats with some of them being "hidden".

If it is some other sort of "duplicate", then a careful analysis may be required to decide the best course of action. There are many possible ways to tackle the problem, with various benefits and disadvantages to each.
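For the www/non-www case, the site-wide 301 is usually done at the server. A minimal Apache sketch (assumes mod_rewrite is available; the domain is a placeholder):

```apache
# .htaccess -- send every non-www request to the www hostname
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```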

from Harith 2601 Days ago #
Votes: 0

Thanks. But I was talking in theory about whether PageRank spammers would/can exploit what Matt said.

from iBrian 2599 Days ago #
Votes: 1

That was a very refreshing - and mature - interview. :)

All too often recently we’ve seen Google’s motives second-guessed, and people languish in debating Google policy. It was nice just to read Matt lay out the finer points on the use of tags, as well as Google’s overall approach to webspam.

One question I would love to ask Matt is how on earth he manages to keep such a thick skin, when his name and image are used so much, and, as a first point of communication with Google, his name can be directly associated with and attacked by irate webmasters over any element of Google policy. That certainly takes a lot of strength of character.

Maybe better asked over a cuppa some time, if he feels it’s safe to answer, rather than directly in the context of discussion on Google policy. :)
