
Exciting new features at SEOmoz, including an Index of the World Wide Web (30 Billion pages and growing!), the Linkscape Tool (extensive link data based on its own index), a fresh design and a comic book.
66 Comments

Avatar
from Rob 2021 Days ago #
Votes: 5

They’ve got a link graph of the internet? I need to go and lie down in a darkened room for a few minutes.

Avatar
from MarkeD 2021 Days ago #
Votes: 1

I can see this being the next trend, private indexes for each SEO agency.  How will the search engines respond?

Avatar
from leadegroot 2021 Days ago #
Votes: 2

If we do start seeing that sort of bot action (’private indexes for each SEO agency’) it will be interesting to see if the response of the savvy webmaster is ’block’ or ’pass’.

After all, we pay for the bandwidth of the search engine bots crawling us because they return something - traffic (yes, I won’t argue that the engines get something out of it!)

But corporations crawling for their own uses, then offering to sell our data back to us.

Hmmm...

Avatar
from MattC 2021 Days ago #
Votes: -1

This is a glorious day!

Avatar
from MarcoVittigni 2021 Days ago #
Votes: 2

this could be a game changer

Avatar
from HeadlandDigital 2021 Days ago #
Votes: 1

This data could prove incredibly valuable to webmasters and internet marketers. Hats off to SEOMoz!

Avatar
from JohnHGohde 2021 Days ago #
Votes: -16

There are a million different ways to design a search engine. The only index that counts is Google.  The only thing that counts is how Google is indexing the Web.  Now, if you were claiming to have exactly duplicated Google’s search engine then you might have something to crow about.  But, if some SEM wants to pay these people $800 a year in order to play some type of irrelevant academic game then they simply don’t know what they are doing.  Or, they in turn are planning to scam their customers with these totally useless search engine data statistics.

Avatar
from MattC 2021 Days ago #
Votes: 0

Well it could be very interesting. Even if it is only a sample of what Google may have. It still may be the best data we have seen in a long time. It did seem like the link back count was much smaller.

Avatar
from RobBothan 2021 Days ago #
Votes: 1

A really interesting and big thinking move on SEOmoz’s part - I’m inclined to say this is a positive move as it’ll help marketers examine what they have. As to @Leadegroot - seomoz published their financial stats on their blog - usually public bodies have this level of transparency, so I don’t see this changing their attitude, and you’re only paying if you want to use it!

And JohnHGohde - it’s not a search engine! If you look at it, it’s a pretty clever way of doing link analysis, oh and I wouldn’t say it was all about google if I lived in china, pretty wide statement you made!

Avatar
from JoshuaSciarrino 2021 Days ago #
Votes: -1

It’s about time us SEOs have a search engine FOR US by US. Seriously.

Google can be secretive while we innovate. :)

Avatar
from g1smd 2021 Days ago #
Votes: 3

I’ll bet they now have some "interesting" insight into some very scummy tactics that other sites are using.

Who’ll be first to build a tool that hooks their API to batch load URLs to report to Google for spamming?

Avatar
from johnandrews 2021 Days ago #
Votes: 21

Certainly interesting, but raises some sticky issues for a company claiming to be "transparent". What user agent do they crawl with? Are they stealth crawling web sites and then selling that data back to webmasters for $800/year? Certainly looks like that’s the plan. And even at that price level, the results are limited to a small number of links.. why? Google could eliminate much of the perceived value of this with a keystroke.... is it time for Google to give us more link data via webmaster console? Also hasn’t majestic SEO been doing this same project for a while?

RewriteCond %{HTTP_USER_AGENT} MJ12bot
RewriteRule .* - [F]

Ahh.. sad to see the negative votes already... I guess I should have posted "wow, great tool.. awesome work.. can’t wait to use it.. you guys rock!"

Avatar
from davidmihm 2021 Days ago #
Votes: 3

John, I agree that if Google were to share more data, the value of this would be diminished somewhat. But as far as competitive intelligence, i.e. for sites that a particular webmaster DOESN’T "own" in WMC, I think this will stand as the best and only player for a LONG time. Also, have you used the tool yet? I was pretty impressed with both the depth and breadth of links it was pulling up for the few reports that I ran.

For the record, I don’t think there’s anything wrong with posting negative comments when they have some credibility behind them, as yours did. That’s what discussion is all about!

Avatar
from IncrediBILL 2021 Days ago #
Votes: 5

I saw something back in March from IP 209.40.116.200 calling itself "SEOmoz-bot" and that IP is on the same host as seomoz.org. If in fact that was it, it didn’t get a single page so nothing my sites link out to was counted in this map.

John mentioned MajesticSEO and I was thinking that maybe they just partnered up with Majestic for this crawl data.

However, if it’s stealth, it will be found.

Avatar Moderator
from hugoguzman 2021 Days ago #
Votes: 0

@johnandrews - well said.

Google recently reacted to MSN’s attempts at robust search query analytics (via their excel plugin) so I wouldn’t be surprised if Google opened up their backlink data in response to both this tool and Yahoo’s efforts to refine its "Yahoo Site Explorer" tool.

I actually think that this tool is a small fish in a large sea (compared to the heavyweight search engines) but it could be the straw that breaks Google’s back and forces them to open up the flood gates.

Or maybe I’m just offering up wishful thinking...

Avatar
from Dugdale 2021 Days ago #
Votes: 3

Cool tool, but I want to know the user agent so I can block it - no sense giving my competitors my data if the tool ends up working well.

Avatar
from Alysson 2021 Days ago #
Votes: 3

The bottom line is that any tool that provides useful and relevant information is worth taking a closer look at.  What SEOmoz has done is attempt to provide its users with information not previously available in such detail.  I’m not an SEOmoz Pro member, so the information gleaned from the free reports is limited.  On the other hand, limited or not, I still find the information useful.

I believe it’s far too early on to discount this tool as "pointless" (as some above appear to believe) or to hail it as "the best thing since sliced bread".  Only time will tell how accurate and useful this information is in the grand scheme of things, but it’s also important to recognize the fact that each tool like this that is developed forces Google one step closer to being "exposed"...perhaps giving them reason to be a bit less secretive than they have been in the past.  Good, bad or indifferent in terms of how the data is ultimately used, tools like Linkscape help webmasters better determine the impact of the links to their site and identify possible patterns.  That, in and of itself, is a big step in the right direction and can provide very useful information.

Avatar
from NickWilsdon 2021 Days ago #
Votes: 6

An interesting tool for SEOs but John, Bill and Dugdale have good points. If companies can buy this information then you should think seriously about blocking this bot from your client sites. Competitive webmastering, people.

I really like the Moz crew but let’s be honest here, if this was any other company we’d see a lot more complaints about our privacy being abused. Case in point, check the comments on the tool MajesticSEO put on Sphinn a few weeks back. It would be good if Rand can let us know what user agent they are using so those of us who want to block can do so.

Avatar
from johnandrews 2021 Days ago #
Votes: 6

@seoaly there are HUNDREDS of companies whose business model is to crawl the web and monetize the information by selling it to clients.  If you watch the logs of highly-specific niche web sites that are meaningful for their markets (but not wildly popular in the pop-culture sense), you will see many cases where 80% or more of the traffic (bandwidth - which is direct expense overhead for the web publisher) is consumed by those crawlers. The last 3 startups I visited had people practically dedicated to banning bots because the costs are so real and the benefit to the web publisher practically nil.

As a cooperative web of sites we need common courtesies such as robots.txt compliance... regardless of the potential "value" of allowing service X or Y to crawl. To paint it any other way is naive. If Google didn’t deliver value to us webmasters we’d block it, too. As you note, the free reports are not likely to deliver much value.

I don’t know what the crawling policies of SEOMoz are.. which is why I asked. I’m curious as to why it hasn’t been put out there front and center.

Avatar
from DanThies 2021 Days ago #
Votes: 6

We looked into doing a similar (smaller) crawl a few years ago, and the informal consensus at SES was that if we exposed our user-agent, webmasters with half a brain would block our bot. We didn’t pursue it. I don’t see any reason why that would change.

As far as whether the data is useful, of course it is. The open question is how much could be acquired through the tools that we already use. It’ll take some time to run reports and get a sense of that, and for smaller sites the likely answer is "nothing terribly important." For sites with a large link profile (>1000 inbounds on key URLs) there’s a good chance that the SEOMoz crawl will be able to add something to the data set.

Avatar
from DanThies 2021 Days ago #
Votes: 6

What’s up with all the negative votes against people suggesting that SEOMoz comply with the robot exclusion standard?

Avatar
from evilgreenmonkey 2021 Days ago #
Votes: 3

Rand wouldn’t disclose the UA, so I’m guessing it’s a faked IE7 or Fx3 header. It does check your robots.txt file and obey it though. My suggestion would be to detect any requests for the robots.txt file which aren’t made by Google, MSN or Yahoo and then block those IPs from your site. A little overkill, although how many of your target visitors check your robots.txt before surfing your site? It doesn’t stop competitors from looking at your backlinks though, just stops them crawling your site. Another option is to serve a "Disallow: /" robots.txt to unwanted UAs, although that’s probably a little more risky.
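The first suggestion in this comment (flag anything that fetches robots.txt without identifying itself as a major engine) could be sketched roughly as below. This is only an illustration: it assumes Apache "combined" format access logs, and the allow-list of crawler tokens is a hypothetical starting point. User agents are trivially spoofed, so a real setup would also want to verify the engines by reverse DNS before trusting them.

```python
import re

# Hypothetical allow-list of crawler tokens; adjust to the engines you trust.
ALLOWED_TOKENS = ("googlebot", "msnbot", "slurp")

# Assumes the Apache "combined" log format; other formats need another pattern.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def suspicious_robots_fetchers(lines):
    """Return the set of IPs that requested /robots.txt with a UA not on the allow-list."""
    flagged = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group("path") != "/robots.txt":
            continue
        user_agent = match.group("ua").lower()
        if not any(token in user_agent for token in ALLOWED_TOKENS):
            flagged.add(match.group("ip"))
    return flagged
```

The returned IPs are candidates for a block list, not proof of anything; as the comment says, this is a little overkill and will also catch curious humans.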

Avatar
from DanThies 2021 Days ago #
Votes: 3

How can it possibly obey robots.txt if the UA isn’t disclosed?

Avatar
from evilgreenmonkey 2021 Days ago #
Votes: 3

I’m guessing that it obeys * and common SE UA Disallow’s. Makes sense for them as it can be used to avoid dupes and infinite URLs. I doubt they’ll disclose any more information to clarify this.

Avatar
from IncrediBILL 2021 Days ago #
Votes: 2

@danthies - most webmasters don’t have half a brain nor know what’s crawling, nor have the tools to stop it if they did, so if you want to scour the net for data it will be mostly uninhibited.

I’m still thinking they bought this data...

Avatar
from NickWilsdon 2021 Days ago #
Votes: 1

Other people are also asking for Rand to disclose the crawler UA here. I’ve added my comment there. As I wrote before, totally love the Moz crew but this information should be given out to webmasters.

Avatar
from NickWilsdon 2021 Days ago #
Votes: 2

@IncrediBill Possibly, but Dan blocks all spiders and still has his data in there. Looks like it ignores robots meta tags with noindex also, since we’ve got sites that serve noindex by default unless it’s a validated SE spider that we’ve chosen to allow. Rand, you’ve got some explainin’ to do here.

If it was from MajesticSEO, they claim to obey robots.txt. Could be from another source of course.

Avatar
from g1smd 2021 Days ago #
Votes: 1

I have found them in the logs.

In July and August, almost simultaneous requests from two different IP addresses, and then (in August only) one further request a few minutes later:

2008-July-xx - Almost Simultaneous Requests:
209.40.100.248  - 209.40.100.248      - HopOne Internet Corporation
209.160.24.62   - seomoz.org          - HopOne Internet Corporation

2008-August-xx - Almost Simultaneous Requests:
209.103.165.202 - client.covesoft.net - HopOne Internet Corporation
209.160.24.62   - seomoz.org          - HopOne Internet Corporation

2008-August-xx - A few minutes after the previous request:
209.40.112.202  - 209.40.112.202      - HopOne Internet Corporation

These logs don’t record the UA that was presented.

Avatar
from g1smd 2021 Days ago #
Votes: 0

No crawling found in June or July... and I didn’t look any earlier than that.

I didn’t find anything from 209.xxx.xxx.xxx in September, and it is too early to be thinking about October.

It is possible that not all of those are SEOmoz; some are unidentified.

Avatar
from IncrediBILL 2021 Days ago #
Votes: 0

@NickWilsdon, blocking all spiders won’t stop IBLs from being indexed, but will stop OBLs.

Besides, blocking visible spiders doesn’t block stealth spiders, which is an art form all of its own.

I’m still betting this data was purchased from someone like MajesticSEO, which even obeying robots.txt would still have all your IBLs or all your data prior to the installation of a block in robots.txt.

Hard to say, if they crawl, we will find them.

Just ask Picscout...

Avatar
from Statsfreak96 2021 Days ago #
Votes: 1

Looks like SEOmoz did a good job on the design. I would only like to see one full LinkScape preview report so I can compare their result set with the MajesticSEO back links tool.

The other thing I miss is their "robot" agent string. I checked all the logs of the many domains I or my customers own, but none refer to their site, name or tool.

After a more intensive search I found a bot that regularly visits all the domains, on average once a month. It never requested the robots.txt and has no robot agent string, but uses IE6 (from 10/8/2007) or IE7 (from 12/13/2007) as its agent string.

HTTP/1.0 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1)
HTTP/1.0 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1)

Hosts:
ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com

IP ranges:
72.44.32.0 - 72.44.63.255 AMAZON-EC2-2
67.202.0.0 - 67.202.63.255 AMAZON-EC2-3
75.101.128.0 - 75.101.255.255 AMAZON-EC2-4
174.129.0.0 - 174.129.255.255 AMAZON-EC2-5

For people who want to block these IP ranges: there are also bots that do request robots.txt and use a robot agent string while also using the Amazon EC2 service, such as d1g, find mobi, netseer, accelobot, archive, alexa, enabal, etc.
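For anyone who wants to test logged addresses against the four ranges listed in this comment, a minimal sketch using Python's standard ipaddress module might look like this. The range boundaries are copied from the comment as posted and are not verified against Amazon's own records; the function name is just illustrative.

```python
import ipaddress

# Inclusive (start, end) pairs exactly as listed in the comment; not independently verified.
LISTED_RANGES = [
    ("72.44.32.0", "72.44.63.255"),
    ("67.202.0.0", "67.202.63.255"),
    ("75.101.128.0", "75.101.255.255"),
    ("174.129.0.0", "174.129.255.255"),
]


def in_listed_ranges(ip):
    """Return True if the IPv4 address falls inside any of the listed ranges."""
    address = ipaddress.ip_address(ip)
    if address.version != 4:
        return False  # The listed ranges are all IPv4.
    return any(
        ipaddress.ip_address(start) <= address <= ipaddress.ip_address(end)
        for start, end in LISTED_RANGES
    )
```

As the comment warns, blocking whole hosting ranges also shuts out the well-behaved bots that happen to run from the same service.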

Avatar
from g1smd 2021 Days ago #
Votes: 0

Hmm. On further thought, it may be that the IPs I found in the logs are from some other tool used at SEOmoz, rather than used for this new link data.

I have no way of verifying anything much more, especially with the limited data that I can currently access. Treat my initial IP list as speculation.

Avatar
from DanThies 2021 Days ago #
Votes: 1

Some notes on the value of the data.... So far, in comparing other data sources available to us, SEOMoz is reporting a little less than 1/5 of the total links available through Google WMT for validated sites and about 1/8 the number that Yahoo reports. Depending on how SEOMoz is setting priorities for their crawl, this would mean their result set is either of higher quality, or it could mean that it’s just smaller.

The numbers vary but SEOMoz consistently shows far fewer links. In terms of reporting quality links I think Y! does it better. SEOMoz indexes a lot of duplicate (identical) pages that a real search engine would likely drop - most likely because they are ignoring the content.

Avatar
from IncrediBILL 2021 Days ago #
Votes: 2

Teaser news:

I think I’ve identified the source of the crawls and the user agent.

Film @ 11

Avatar
from JohnHGohde 2020 Days ago #
Votes: -6

@RobBothan

"its not a search engine! if you look at it, its pretty clever way of doing link analysis, oh and i wouldnt say it was all about google if i lived in china, pretty wide statement you made!"

Same difference!

Unless SEOmoz’s Index and Google’s Index are looking at exactly the same data then any statistics that SEOmoz is offering for sale have absolutely no validity, whatsoever.  For example, if, as I suspect, SEOmoz is including tons of personal websites hosted by AT&T, AOL and other Free hosting services while Google is not, then anything that SEOmoz is claiming to conclude has absolutely no practical significance.

All this involves pretty basic thinking skills, RobBothan.  From what I have seen on Sphinn, most SEMs suffer from an extreme lack of technical knowledge about the most basic stuff.

As far as I am concerned, all bots are a waste of bandwidth.  They are only acceptable if I can get something out of it, which would namely mean a MAJOR search engine.  So, yes, SEOmoz crawling my site scraping my content looking for their stupid nonsense is certainly a waste of my bandwidth and should be blocked.

Avatar
from Mert 2020 Days ago #
Votes: 3

I find it an absolute shame from the SEOMoz employees who many SEOs considered as friends to remain so silent for 24 hours straight both here and in SEOMoz even though they tend to be the most active participants of the net. When Rand first came out with the VC funding for Moz news, he said we will never give in to our VC partners and not be "corporate". Then can someone explain, why the most active bunch of employees of the SEOMoz company (including Rand) remain so very silent about this issue. Welcome to the corporate world, Rand. It is not too much fun.

Avatar
from NickWilsdon 2020 Days ago #
Votes: 2

@Bill Looking forward to your info. I have complete faith in the skills of our resident arachnologist ;)

We’re getting the run around in the thread over at MOZ, which is getting rather frustrating:

"Just like search engines we’re crawling publicly available material. We think our tool is revolutionary because it collects this free information and offers a lot data about links for free (check out the Linkscape basic report). As with any crawler you should check out when bots visit your site. If you don’t like what they’re doing you should certainly check out the Robots Exclusion Policy. Robots.txt is a great way to limit what (good behaving) robots crawl. On a personal note, I think you should focus on generating great, link-worthy content and share that with as many people as you can. Drive search engine performance, and drive quality traffic. Maybe you can use some great tools to help :)"

That’s a pretty patronising answer. Especially when the UA/IP info would be kind of helpful when setting up our robots.txt rules. What’s the issue here Rand?

Avatar
from DanThies 2020 Days ago #
Votes: 0

Made a suggestion to Rand this morning - keep the UAs (plural as it turns out) secret, but give webmasters a UA they can use in robots.txt, which would be honored.

Avatar
from lindop 2020 Days ago #
Votes: 4

This is probably the first bad word I’ve had against SEOmoz, a shame really because I owe so much to them. This will spiral out of proportion if it is allowed to continue, resulting in much wider publication (and blocking) of the UA when it is discovered or released.

Avatar
from Mert 2020 Days ago #
Votes: 2

@Nick, I am personally insulted and outraged at the moment. SEOMoz’s core client base are professional SEO agencies and they take us to be fools. I never ever expected such a response (or lack of response, period) from the Moz Bunch (the most outspoken group that go over and over about the Google hypocrisy). I can see Matt Cutts laughing from his seat now.

@DanThies Can you give me one single reason why a company who keeps their "Plural" UAs secret would honor a third party UA, or how we could actually know 100% that they do honor it, after they already said they honor robots.txt as quoted above? For an agency, the golden rule is client privacy as much as exposure. Actually I would like Rand to share his email conversation with you here. This is beyond patronising at this moment, it is absolutely demeaning and insulting. I am done talking about it in public. Going to write my blog post now on the subject.

@Incredibill You are my hero.

Avatar
from DanThies 2020 Days ago #
Votes: 1

Mert,

Rand just told me that they’ll work on implementing a UA that we can add to robots.txt when they get back from SMX. So we will be able to block this bot in robots.txt. It’s easy to verify whether or not they’re indexing a site by using the tool - that’s how I was able to verify that they’re ignoring NOINDEX.

They’ve been honoring Disallow: / but obviously that isn’t good enough. They’ve been indexing pages with NOINDEX in the robots meta tag, and obviously that’s not acceptable to webmasters either. Reporting links on NOINDEX pages is actually a *feature* of the tool, that I think they need to roll back.

If Rand wants to share the emails that’s up to him, but I’ve already said what the substance of the messages was. I didn’t like what they were doing and offered some suggestions on how to fix it.

Their entire team, or close to it, is traveling, so we may not hear a lot from them until they get back.

Avatar
from g1smd 2020 Days ago #
Votes: 2

Google follows links out from NOINDEX (meta tag) pages; they treat NOINDEX purely as an instruction to not show the content of that page in the public index.

They still keep a copy of that page in their internal databases, assign PR to the page, and follow links in and out.

Avatar
from mbeijk 2020 Days ago #
Votes: -1

I believe the SEOmoz team is too busy fixing the first few bugs ;-)

Avatar
from Mert 2020 Days ago #
Votes: -1

@DanThies, You are the man as the voice of sensibility. @Rand thanks for the reasonable future solution. But honestly, being quiet helped no one. It is not like you are not liked by most people who critiqued it here.

Avatar
from g1smd 2020 Days ago #
Votes: 0

One reason that the Mozzers have been quiet online is that most of their staff are attending the SMX East Conference - on the opposite side of the USA (and four time zones different) to where their offices are.

I suspect they have zero time to be online and answering blog posts, forum posts, private messages, and emails - especially when they have at least several panels to speak on, and an expo booth to staff.

I am surprised that all this stuff wasn’t a bit better thought through though, before launch - but now they are aware of the concerns, give them a week or two to now get on with whatever needs to be done.

Avatar
from IncrediBILL 2020 Days ago #
Votes: 2

@Dan - UA may be found!

I posted my research:
http://www.webmasterworld.com/search_engine_spiders/3759661.htm

Also, check out this connection:
http://www.robtex.com/ip/208.115.111.242.html

See that Seomoz-Crawl1?

Hmmm...

Avatar
from g1smd 2020 Days ago #
Votes: 3

Bill, I don’t see it in that URL, but this shows me what I was looking for:

http://www.robtex.com/dns/x?q=seomoz

Hmm, new entries have appeared in the last few minutes. Looks like they add stuff that people searched for - so still not conclusive.

Avatar
from mvandemar 2020 Days ago #
Votes: 1

"Made a suggestion to Rand this morning - keep the UAs (plural as it turns out) secret, but give webmasters a UA they can use in robots.txt, which would be honored."

There is no valid reason for them to not disclose this information, and faking a way for webmasters to tell them not to crawl given pages is not sufficient by any means. They need full disclosure on this one, period.

Avatar
from DanThies 2020 Days ago #
Votes: 3

I agree that’s what we all want, mvandemar. But they aren’t required to do that, and honoring a Disallow directive in robots.txt that’s unambiguously intended for their bot is sufficient to comply with the robot exclusion standard. As Bill Atchison has pointed out, we’ll figure them out anyway, so they may as well just disclose it. I think it’s silly to try to hide.
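To make the point about a dedicated Disallow rule concrete, here is a small sketch using Python's standard urllib.robotparser. The token "examplebot" is purely hypothetical (no real crawler name had been published at this point in the thread), and the robots.txt content is invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# "examplebot" is a hypothetical token; it stands in for whatever name a crawler publishes.
ROBOTS_TXT = """\
User-agent: examplebot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, allowed("examplebot", "http://example.com/page.html") is False while any other agent is only kept out of /private/. A crawler that performs this check against its published token complies with the exclusion standard even if the User-Agent header it actually sends stays undisclosed, which is the distinction being argued here.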

Avatar
from mvandemar 2020 Days ago #
Votes: 2

"But they aren’t required to do that..."

Required by who? No, there is no governing body on Non-Shady Internet Crawling Tactics... but you saw the kind of uproar Microsoft caused when they started doing it. Do you really think that there is any valid reason whatsoever for them to cloak their bot?

You’re right though, it is silly for them to try.

Avatar
from randfish 2020 Days ago #
Votes: 4

Hey - sorry I haven’t been in here. Totally hear you on the bot stuff. We are going to set up a way to block as soon as Nick and I are back in Seattle and will share how to do that. I would have been here sooner, just couldn’t find five seconds between expo booth, sessions, meetings, etc.

Thanks for the many kind comments above, and please do accept my apologies for the delay. I understand it’s an important issue and promise we’ll take care of it responsibly.

Avatar
from Halfdeck 2020 Days ago #
Votes: 5

"They’ve been indexing pages with NOINDEX in the robots meta tag, and obviously that’s not acceptable to webmasters either."

Dan, just to clear this up for others, like g1smd said, noindex does not mean don’t index (at least for Google); it just means crawl+fish links but do not display the URL in search results. For SEOMoz to create an accurate copy of Google’s link graph, noindexed pages should be crawled.

Avatar
from eKstreme 2020 Days ago #
Votes: 4

The only acceptable method is to fully disclose the UA. Not a string to be used in robots.txt but the full user agent.

Anything else is just scraping on the same level as spammers. Think about that.

Avatar
from Skitzzo 2019 Days ago #
Votes: 1

Rand, I really think you guys need to have someone on your team dedicated to putting out fires. Call em a corporate fire fighter or something but it seems like every time something hits the fan in regards to SEOmoz, it takes quite a while to hear from you guys and you’re always extremely busy.

I can’t imagine how hectic your schedule is, so I think it would be worthwhile for you to have someone that doesn’t keep quite that kind of schedule available to give feedback on stuff like this. I mean, you knew you guys were going to be busy and you had to have expected a ton of feedback. How is it that you guys didn’t plan on being able to engage that feedback or address any issues that sprang up?

Avatar
from DanThies 2019 Days ago #
Votes: 0

@Halfdeck - the standard is vague, but roughly so: "NOINDEX" allows the subsidiary links to be explored, even though the page is not indexed. SEOMoz is indexing text, which does appear in the tool’s search results, including the title & anchor text, not just following links. At the very least they are pushing the envelope.

Avatar
from johnandrews 2019 Days ago #
Votes: 4

This may be the token event that brought blocking into the mainstream for webmasters. There is so little benefit to allowing everyone and his cousin to spider your site, and much to be lost. My guess is Randy used a lot of publicly available data to build up his billions of pages (such as the dotnetdotcom.org dataset) and so the UAs must be kept a trade secret - if they knew the UAs, just about anyone could offer a similar service with low barrier to entry (tool development). If this is the monetization model for the new SEOMoz (charge $799/year for a tool that massages publicly available data) it’s just for SEO firms and Agencies and I can continue to ignore it, but I will certainly block it from costing me money (bandwidth and server capacity, to start).

I’ll block all spiders except those that trade value with me (Google, Yahoo, MSN, and a few others including industry-specific aggregators), and I encourage others to do the same. For the spider producers it’s real simple - offer value to webmasters and let them choose to trade with you.

Avatar
from IncrediBILL 2019 Days ago #
Votes: 1

@johnandrews - if a crawler did something designed to bypass reasonable site security (and mine is unreasonably tight), then it’s possible it’s not legal, at least in California where my site resides.

http://incredibill.blogspot.com/2006/11/legality-of-stealth-robots_116474846285292987.html

If it were me, I would cough up a user agent and IP addresses real fast just to avoid potential issues such as these.

Avatar
from Halfdeck 2018 Days ago #
Votes: 2

@dan "the standard is vague, but roughly so: "NOINDEX" allows the subsidiary links to be explored, even though the page is not indexed"

You’re right; the page text doesn’t get indexed, though NOINDEXed HTML is scraped and processed (otherwise Google cannot read META NOINDEX data).

I agree SEOmoz tool should not be displaying noindexed pages in search results.

Avatar
from DanThies 2018 Days ago #
Votes: -1

@Bill - if it were me, I’d cough it up just so I didn’t have to deal with reverse-denial-of-service defenses from people who know how to respond in kind. :D

@John - I don’t know if *this* is the trigger, but sooner or later, it will happen. Bill probably has better data but bad bots and the related issue of DOS attacks merit more attention.

@Halfdeck - Exactly... they have to grab and process the page to find out that it has noindex, but once they see that, SEOMoz is at least a little over the line with what they do.

Avatar
from Curtis 2018 Days ago #
Votes: 0

First of all, in the interest of full disclosure (something I would like to see more of here just so I can figure out which agenda is being served), I will start with saying SEOmoz is a) a friend, b) a client, and c) a partner of mine. I was with them at SMX East in NY and it was interesting to watch the response to Linkscape closely from the outside.

I also want to second SEOaly’s comment that "any tool that provides useful and relevant information is worth taking a closer look at." As someone who pays a lot of money to both inside and outside SEOs (and has been an SEO as long as anyone), any tool that helps save some time or make them more efficient or effective has my attention. As to the $79/m – I am not sure what people here charge for their services but this is the equivalent of less than an hour’s work, and I am sure at the very least there is more than an hour’s worth of value here, especially if you have more than 1 client.

I do get why people are concerned; however, I know Rand and Gillian and I can say if anyone was ever going to listen, it will be them. I know because I cannot say that I have ever given them any advice they have not taken seriously, and I know they are taking every concern here seriously.

Finally, for anyone who has not tried it, I highly recommend doing so; then make your own decision. Do not get caught up with personal agendas or whatever else might fuel the negativity. Make up your own mind. Personally I think they should be commended as they took on a very large undertaking and with its release on Tues, they have definitely shaken things up. Great job guys.

Avatar Administrator
from dannysullivan 2018 Days ago #
Votes: 4

I was surprised that blocking information wasn’t up at the time the tool was released. Even stealthy Cuil/Cuill put up blocking information *before* they released their product.

Bottom line, if you’re going to crawl the web and to be a good web citizen, then you obey robots.txt blocking. If you don’t, then you’re not a good citizen in my books.

Rand says blocking will be added, so great if that happens in short order. I know what it’s like coming off of a trip. What’s not clear is how quickly the existing data will be dropped. If they don’t recrawl that often, then site owners who don’t want to be in the database have to wait for a revisit, then wait for removal.

Possibly they could add an instant remove tool, but that’s a lot of work (you have to verify who owns a site, etc.).

Avatar
from johnandrews 2011 Days ago #
Votes: 0

"First of all, in the interest of full disclosure (something I would like to see more of here just so I can figure out which agenda is being served), I will start with saying SEOmoz is a) a friend, b) a client, and c) a partner of mine....[snip] ...  As to the $79/m – I am not sure what people here charge for their services but this is the equivalent of less than an hour’s work...[snip]...Finally for anyone who has not tried it, I highly recommend doing so; then make your own decision. Do not get caught up with personal agendas or whatever else might fuel the negativity. Make up your own mind. Personally I think they should be commended.."

Come on Curtis, that "personal agenda" thing is a stretch... you sound like a true fanboy: if anyone is negative it must be driven by "personal agendas". If you read anything on this issue, you would see there has been one very clear concern repeated over and over -- even by people who claim to love SEOMoz: this Linkscrape tool violates net etiquette and it does so for a profit at webmaster cost. No need to slant the issue here.. we’re all marketers. Reputation management 101 -- address the issue directly and honestly. Fix the problem. Everything else just makes it worse.

Avatar
from randfish 2011 Days ago #
Votes: 0

Sorry for the long delay on the update. Lots of internal talk and work had to be done to get this into place, but we are now disclosing our sources for data (and if you want, you can directly block those bots). If we start using new sources in the future, we’ll try to list all the major ones, so folks can choose to block those as well.

We also will start obeying a meta tag to remove your page from our index, so even if there are bots whose data we use that you want crawling your site, we’ll still pull out the specific pages you mark with seomoz and noindex. Details are here - http://www.seomoz.org/linkscape/help/sources
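For site owners who want to confirm their pages actually carry the opt-out, a rough check could look like the sketch below. The exact tag syntax is documented on the linked page and is not reproduced in this thread, so the meta name "seomoz" used here is an assumption based only on the wording "mark with seomoz and noindex"; the class and function names are illustrative.

```python
from html.parser import HTMLParser


class NoindexFinder(HTMLParser):
    """Records whether any <meta> tag with a watched name carries a noindex directive."""

    def __init__(self, names):
        super().__init__()
        self.names = {name.lower() for name in names}
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {key: (value or "") for key, value in attrs}
        name = attributes.get("name", "").lower()
        directives = {d.strip().lower() for d in attributes.get("content", "").split(",")}
        if name in self.names and "noindex" in directives:
            self.noindex = True


def is_noindexed(html, names=("robots", "seomoz")):
    """Return True if the HTML marks the page noindex for any of the given meta names.

    "seomoz" as a meta name is an assumption, not confirmed syntax.
    """
    finder = NoindexFinder(names)
    finder.feed(html)
    return finder.noindex
```

Passing names=("seomoz",) checks only the tool-specific tag, which matches the case Rand describes: a page that stays open to search engines but is marked for removal from this one index.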

Avatar
from roymitchell75 2010 Days ago #
Votes: 5

Randfish, it looks like you’re not actually crawling any sites on your own so what took so long in disclosing this?

Also, can you explain why you led everyone to believe you had built your own crawler when in fact you’re just compiling the data of others?

Avatar Moderator
from Jill 2009 Days ago #
Votes: 0

As an FYI, Rand’s link above doesn’t work because it has a period at the end. Just remove that to see the source page he’s pointing to.

[Mod: Thanks Jill, corrected the link above]

Avatar
from RyanU 1689 Days ago #
Votes: 0

Hate to resurrect an old thread but it’s been nearly a year and nary a word from Rand.  Am I missing something?  Or was that just a line of BS to placate us?


