Posted By: barnaseo 283 Days ago
Source: http://www.seomoz.org
Category: SEO
65 Comments
Comments
They've got a link graph of the internet? I need to go and lie down in a darkened room for a few minutes.
I can see this being the next trend, private indexes for each SEO agency. How will the search engines respond?
If we do start seeing that sort of bot action (private indexes for each SEO agency), it will be interesting to see whether the savvy webmaster's response is to block or to pass. After all, we pay for the bandwidth of the search engine bots crawling us because they return something: traffic (yes, I won't argue that the engines get something out of it!). But corporations crawling for their own uses, then offering to sell our data back to us? Hmmm...
This is a glorious day!
this could be a game changer
This data could prove incredibly valuable to webmasters and internet marketers. Hats off to SEOMoz!
There are a million different ways to design a search engine. The only index that counts is Google's. The only thing that counts is how Google is indexing the Web. Now, if you were claiming to have exactly duplicated Google's search engine then you might have something to crow about. But if some SEM wants to pay these people $800 a year in order to play some type of irrelevant academic game, then they simply don't know what they are doing. Or they in turn are planning to scam their customers with these totally useless search engine data statistics.
Well it could be very interesting. Even if it is only a sample of what Google may have. It still may be the best data we have seen in a long time. It did seem like the link back count was much smaller.
A really interesting and big-thinking move on SEOmoz's part - I'm inclined to say this is a positive move as it'll help marketers examine what they have. As to @Leadegroot - SEOmoz published their financial stats on their blog - usually only public bodies have this level of transparency, so I don't see this changing their attitude, and you're only paying if you want to use it! And JohnHGohde - it's not a search engine! If you look at it, it's a pretty clever way of doing link analysis. Oh, and I wouldn't say it was all about Google if I lived in China. Pretty wide statement you made!
It's about time us SEOs had a search engine FOR US by US. Seriously. Google can be secretive while we innovate. :)
I'll bet they now have some "interesting" insight into some very scummy tactics that other sites are using. Who'll be first to build a tool that hooks their API to batch load URLs to report to Google for spamming?
Certainly interesting, but it raises some sticky issues for a company claiming to be "transparent". What user agent do they crawl with? Are they stealth crawling web sites and then selling that data back to webmasters for $800/year? Certainly looks like that's the plan. And even at that price level, the results are limited to a small number of links.. why? Google could eliminate much of the perceived value of this with a keystroke.... is it time for Google to give us more link data via webmaster console? Also, hasn't Majestic SEO been doing this same project for a while?
RewriteCond %{HTTP_USER_AGENT} MJ12bot
RewriteRule .* - [F]
Ahh.. sad to see the negative votes already... I guess I should have posted "wow, great tool.. awesome work.. can't wait to use it.. you guys rock!"
John, I agree that if Google were to share more data, the value of this would be diminished somewhat. But as far as competitive intelligence, i.e. for sites that a particular webmaster DOESN'T "own" in WMC, I think this will stand as the best and only player for a LONG time. Also, have you used the tool yet? I was pretty impressed with both the depth and breadth of links it was pulling up for the few reports that I ran. For the record, I don't think there's anything wrong with posting negative comments when they have some credibility behind them, as yours did. That's what discussion is all about!
I saw something back in March from IP 209.40.116.200 calling itself "SEOmoz-bot", and that IP is on the same host as seomoz.org. If in fact that was it, it didn't get a single page, so nothing my sites link out to was counted in this map. John mentioned MajesticSEO and I was thinking that maybe they just partnered up with Majestic for this crawl data. However, if it's stealth, it will be found.
@johnandrews - well said. Google recently reacted to MSN's attempts at robust search query analytics (via their Excel plugin), so I wouldn't be surprised if Google opened up their backlink data in response to both this tool and Yahoo's efforts to refine its "Yahoo Site Explorer" tool. I actually think that this tool is a small fish in a large sea (compared to the heavyweight search engines), but it could be the straw that breaks Google's back and forces them to open up the flood gates. Or maybe I'm just offering up wishful thinking...
Cool tool, but I want to know the user agent so I can block it - no sense giving my competitors my data if the tool ends up working well.
The bottom line is that any tool that provides useful and relevant information is worth taking a closer look at. What SEOmoz has done is attempt to provide its users with information not previously available in such detail. I'm not an SEOmoz Pro member, so the information gleaned from the free reports is limited. On the other hand, limited or not, I still find the information useful. I believe it's far too early to discount this tool as "pointless" (as some above appear to believe) or to call it "the best thing since sliced bread". Only time will tell how accurate and useful this information is in the grand scheme of things, but it's also important to recognize that each tool like this forces Google one step closer to being "exposed"... perhaps giving them reason to be a bit less secretive than they have been in the past. Good, bad or indifferent in terms of how the data is ultimately used, tools like Linkscape help webmasters better determine the impact of the links to their site and identify possible patterns. That, in and of itself, is a big step in the right direction and can provide very useful information.
An interesting tool for SEOs, but John, Bill and Dugdale have good points. If companies can buy this information then you should think seriously about blocking this bot from your client sites. Competitive webmastering, people. I really like the Moz crew but let's be honest here, if this was any other company we'd see a lot more complaints about our privacy being abused. Case in point, check the comments on the tool MajesticSEO put on Sphinn a few weeks back. It would be good if Rand could let us know what user agent they are using so those of us who want to block can do so.
@seoaly there are HUNDREDS of companies whose business model is to crawl the web and monetize the information by selling it to clients. If you watch the logs of highly specific niche web sites that are meaningful for their markets (but not wildly popular in the pop-culture sense), you will see many cases where 80% or more of the traffic (bandwidth - which is direct expense overhead for the web publisher) is consumed by those crawlers. The last 3 startups I visited had people practically dedicated to banning bots because the costs are so real and the benefit to the web publisher practically nil. As a cooperative web of sites we need common courtesies such as robots.txt compliance... regardless of the potential "value" of allowing service X or Y to crawl. To paint it any other way is naive. If Google didn't deliver value to us webmasters we'd block it, too. As you note, the free reports are not likely to deliver much value. I don't know what the crawling policies of SEOMoz are.. which is why I asked. I'm curious as to why it hasn't been put out there front and center.
We looked into doing a similar (smaller) crawl a few years ago, and the informal consensus at SES was that if we exposed our user-agent, webmasters with half a brain would block our bot. We didn't pursue it. I don't see any reason why that would change. As far as whether the data is useful, of course it is. The open question is how much could be acquired through the tools that we already use. It'll take some time to run reports and get a sense of that, and for smaller sites the likely answer is "nothing terribly important." For sites with a large link profile (>1000 inbounds on key URLs) there's a good chance that the SEOMoz crawl will be able to add something to the data set.
What's up with all the negative votes against people suggesting that SEOMoz comply with the robot exclusion standard?
Rand wouldn't disclose the UA, so I'm guessing it's a faked IE7 or Fx3 header. It does check your robots.txt file and obey it though. My suggestion would be to detect any requests for the robots.txt file which aren't made by Google, MSN or Yahoo and then block those IPs from your site. A little overkill, although how many of your target visitors check your robots.txt before surfing your site? It doesn't stop competitors from looking at your backlinks though, it just stops them crawling your site. Another option is to serve a "Disallow: /" robots.txt to unwanted UAs, although that's probably a little more risky.
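The first suggestion above (flag anything that fetches robots.txt without being one of the engines you trust) could be sketched roughly like this. It assumes an Apache-style combined access log; the allowlist tokens and the sample IP addresses are purely illustrative, and since a user agent string can be faked, the result is a list of candidates to review rather than proof of anything.

```python
import re

# Hypothetical allowlist: substrings of the crawler user agents we choose to trust.
ALLOWED_UA_TOKENS = ("googlebot", "msnbot", "slurp")

# Matches one line of a combined-format access log and captures
# the client IP, the requested path and the user agent string.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def robots_txt_suspects(lines):
    """Return the IPs that requested /robots.txt with a UA not on the allowlist."""
    suspects = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group("path") != "/robots.txt":
            continue
        ua = match.group("ua").lower()
        if not any(token in ua for token in ALLOWED_UA_TOKENS):
            suspects.add(match.group("ip"))
    return suspects

SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2008:13:55:36 -0700] "GET /robots.txt HTTP/1.0" 200 120 "-" '
    '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"',
    '198.51.100.7 - - [10/Oct/2008:13:56:01 -0700] "GET /robots.txt HTTP/1.1" 200 120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.20 - - [10/Oct/2008:13:57:12 -0700] "GET /index.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

print(robots_txt_suspects(SAMPLE_LOG))
```

Only the first sample line is flagged: it asks for robots.txt while presenting a browser-style user agent. What you then do with the flagged IPs (block, rate-limit, or just watch) is a separate decision.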
How can it possibly obey robots.txt if the UA isn't disclosed?
I'm guessing that it obeys * and common SE UA Disallows. Makes sense for them as it can be used to avoid dupes and infinite URLs. I doubt they'll disclose any more information to clarify this.
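For anyone wondering how a crawler can obey robots.txt without a disclosed name: any client that matches no named group falls through to the `User-agent: *` group. A minimal sketch using Python's standard `urllib.robotparser` follows; the robots.txt content and the example.com URL are made up for illustration, and this says nothing about how the SEOmoz data sources actually behave.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: Googlebot may crawl everything,
# every other (including unnamed) crawler is shut out.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# A named engine matches its own group and is allowed in.
print(may_fetch("Googlebot", "http://example.com/page"))

# A crawler posing as a browser matches no named group,
# so a compliant implementation applies the "*" rules to it.
print(may_fetch("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
                "http://example.com/page"))
```

So a blanket `*` Disallow is enough to keep out a compliant but anonymous crawler. What you cannot do without a disclosed token is block that one crawler while leaving the `*` group open for everyone else.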
@danthies - most webmasters don't have half a brain, nor know what's crawling, nor have the tools to stop it if they did, so if you want to scour the net for data it will be mostly uninhibited. I'm still thinking they bought this data...
Other people are also asking for Rand to disclose the crawler UA here. I've added my comment there. As I wrote before, I totally love the Moz crew but this information should be given out to webmasters.
@IncrediBill Possibly, but Dan blocks all spiders and still has his data in there. Looks like it ignores robots meta tags with noindex also, since we've got sites that serve noindex by default unless it's a validated SE spider that we've chosen to allow. Rand, you've got some explainin' to do here. If it was from MajesticSEO, they claim to obey robots.txt. Could be from another source of course.
I have found them in the logs.
In July and August, almost simultaneous requests from two different IP addresses, and then (in August only) one further request a few minutes later:
2008-July-xx - Almost Simultaneous Requests:
209.40.100.248 - 209.40.100.248 - HopOne Internet Corporation
209.160.24.62 - seomoz.org - HopOne Internet Corporation
2008-August-xx - Almost Simultaneous Requests:
209.103.165.202 - client.covesoft.net - HopOne Internet Corporation
209.160.24.62 - seomoz.org - HopOne Internet Corporation
2008-August-xx - A few minutes after the previous request:
209.40.112.202 - 209.40.112.202 - HopOne Internet Corporation
These logs don't record the UA that was presented.
No crawling found in June or July... and I didn't look any earlier than that. I didn't find anything from 209.xxx.xxx.xxx in September, and it is too early to be thinking about October. It is possible that not all of those are SEOmoz; some are unidentified.
@NickWilsdon, blocking all spiders won't stop IBLs from being indexed, but will stop OBLs. Besides, blocking visible spiders doesn't block stealth spiders, which is an art form all of its own. I'm still betting this data was purchased from someone like MajesticSEO, which even obeying robots.txt would still have all your IBLs, or all your data prior to the installation of a block in robots.txt. Hard to say. If they crawl, we will find them. Just ask Picscout...
Looks like SEOmoz did a good job on the design. I would only like to see one full LinkScape preview report so I can compare their result set with the MajesticSEO backlinks tool. The other thing I miss is their “robot” agent string. I checked all the logs of the many domains my customers and I own, but none refer to their site, name or tool. After a more intensive search I found a bot that regularly visits all the domains, on average once a month. It never requested robots.txt and has no robot agent string, but uses IE6 (from 10/8/2007) or IE7 (from 12/13/2007) as its agent string:
HTTP/1.0 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1)
HTTP/1.0 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1)
Hosts: ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com
IP ranges:
72.44.32.0 - 72.44.63.255 AMAZON-EC2-2
67.202.0.0 - 67.202.63.255 AMAZON-EC2-3
75.101.128.0 - 75.101.255.255 AMAZON-EC2-4
174.129.0.0 - 174.129.255.255 AMAZON-EC2-5
For people who want to block these IP ranges: there are also bots that request robots.txt, use a robot agent string and also run on the Amazon EC2 service, such as d1g, find mobi, netseer, accelobot, archive, alexa, enabal, etc.
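If you want to act on ranges like the ones listed above, it helps to turn each start/end pair into CIDR blocks (which most firewalls and server configs expect) and to be able to test a single address. A small sketch with Python's standard `ipaddress` module follows; the ranges and labels are copied from the comment as I read it, and the function names are my own.

```python
import ipaddress

# (start, end, label) as listed in the comment above.
EC2_RANGES = [
    ("72.44.32.0", "72.44.63.255", "AMAZON-EC2-2"),
    ("67.202.0.0", "67.202.63.255", "AMAZON-EC2-3"),
    ("75.101.128.0", "75.101.255.255", "AMAZON-EC2-4"),
    ("174.129.0.0", "174.129.255.255", "AMAZON-EC2-5"),
]

def range_to_cidrs(start, end):
    """Convert an inclusive start/end address pair into a list of CIDR networks."""
    return list(ipaddress.summarize_address_range(
        ipaddress.IPv4Address(start), ipaddress.IPv4Address(end)))

def ec2_label(ip):
    """Return the label of the listed range containing ip, or None if it is in none."""
    addr = ipaddress.IPv4Address(ip)
    for start, end, label in EC2_RANGES:
        if ipaddress.IPv4Address(start) <= addr <= ipaddress.IPv4Address(end):
            return label
    return None

for start, end, label in EC2_RANGES:
    print(label, [str(net) for net in range_to_cidrs(start, end)])
```

As the comment itself warns, well-behaved bots share these ranges, so a blanket block on the CIDRs also shuts out crawlers that do request robots.txt and identify themselves.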
Hmm. On further thought, it may be that the IPs I found in the logs are from some other tool in use at SEOmoz, rather than used for this new link data. I have no way of verifying anything much more, especially with the limited data that I can currently access. Treat my initial IP list as speculation.
Some notes on the value of the data.... So far, in comparing other data sources available to us, SEOMoz is reporting a little less than 1/5 of the total links available through Google WMT for validated sites and about 1/8 the number that Yahoo reports. Depending on how SEOMoz is setting priorities for their crawl, this would mean their result set is either of higher quality, or it could mean that it's just smaller. The numbers vary but SEOMoz consistently shows far fewer links. In terms of reporting quality links I think Y! does it better. SEOMoz indexes a lot of duplicate (identical) pages that a real search engine would likely drop - most likely because they are ignoring the content.
Teaser news: I think I've identified the source of the crawls and the user agent. Film @ 11
@RobBothan "its not a search engine! if you look at it, its pretty clever way of doing link analysis, oh and i wouldnt say it was all about google if i lived in china, pretty wide statement you made!" Same difference! Unless SEOmoz's index and Google's index are looking at exactly the same data, any statistics that SEOmoz is offering for sale have absolutely no validity whatsoever. For example, if, as I suspect, SEOmoz is including tons of personal websites hosted by AT&T, AOL and other free hosting services while Google is not, then anything that SEOmoz is claiming to conclude has absolutely no practical significance. All this involves pretty basic thinking skills, RobBothan. From what I have seen on Sphinn, most SEMs suffer from an extreme lack of technical knowledge about the most basic stuff. As far as I am concerned, all bots are a waste of bandwidth. They are only acceptable if I can get something out of it, which would namely mean a MAJOR search engine. So yes, SEOmoz crawling my site, scraping my content, looking for their stupid nonsense is certainly a waste of my bandwidth and should be blocked.
I find it an absolute shame that the SEOMoz employees, whom many SEOs considered friends, have remained so silent for 24 hours straight, both here and on SEOMoz, even though they tend to be the most active participants on the net. When Rand first came out with the news of VC funding for Moz, he said they would never give in to their VC partners and would not be "corporate". Then can someone explain why the most active bunch of employees of the SEOMoz company (including Rand) remain so very silent about this issue? Welcome to the corporate world, Rand. It is not too much fun.
@Bill Looking forward to your info. I have complete faith in the skills of our resident arachnologist ;) We're getting the runaround in the thread over at MOZ, which is getting rather frustrating:
"Just like search engines we're crawling publicly available material. We think our tool is revolutionary because it collects this free information and offers a lot of data about links for free (check out the Linkscape basic report). As with any crawler you should check out when bots visit your site. If you don't like what they're doing you should certainly check out the Robots Exclusion Policy. Robots.txt is a great way to limit what (good behaving) robots crawl. On a personal note, I think you should focus on generating great, link-worthy content and share that with as many people as you can. Drive search engine performance, and drive quality traffic. Maybe you can use some great tools to help :)"
That's a pretty patronising answer. Especially when the UA/IP info would be kind of helpful when setting up our robots.txt rules. What's the issue here, Rand?
Made a suggestion to Rand this morning - keep the UAs (plural as it turns out) secret, but give webmasters a UA they can use in robots.txt, which would be honored.
This is probably the first bad word I've had against SEOmoz, a shame really because I owe so much to them. This will spiral out of proportion if it is allowed to continue, resulting in much wider publication (and blocking) of the UA when it is discovered or released.
@Nick, I am personally insulted and outraged at the moment. SEOMoz's core client base is professional SEO agencies and they take us to be fools. I never ever expected such a response (or lack of response, period) from the Moz Bunch (the most outspoken group, who go on and on about Google's hypocrisy). I can see Matt Cutts laughing from his seat now. @DanThies Can you give me one single reason why a company that keeps its "plural" UAs secret would honor a third-party UA, or how we would actually know 100% that they do honor it, after they already said "we already honor it" as quoted above? For an agency, the golden rule is client privacy as much as exposure. Actually I would like Rand to share his email conversation with you here. This is beyond patronising at this moment, it is absolutely demeaning and insulting. I am done talking about it in public. Going to write my blog post now on the subject. @Incredibill You are my hero.
Mert, Rand just told me that they'll work on implementing a UA that we can add to robots.txt when they get back from SMX. So we will be able to block this bot in robots.txt. It's easy to verify whether or not they're indexing a site by using the tool - that's how I was able to verify that they're ignoring NOINDEX. They've been honoring Disallow: / but obviously that isn't good enough. They've been indexing pages with NOINDEX in the robots meta tag, and obviously that's not acceptable to webmasters either. Reporting links on NOINDEX pages is actually a *feature* of the tool, one that I think they need to roll back. If Rand wants to share the emails that's up to him, but I've already said what the substance of the messages was. I didn't like what they were doing and offered some suggestions on how to fix it. Their entire team, or close to it, is traveling, so we may not hear a lot from them until they get back.
Google follows links out from NOINDEX (meta tag) pages; they treat NOINDEX purely as an instruction to not show the content of that page in the public index. They still keep a copy of that page in their internal databases, assign PR to the page, and follow links in and out.
I believe the SEOmoz team is too busy fixing the first few bugs ;-)
@DanThies, you are the man, as the voice of sensibility. @Rand thanks for the reasonable future solution. But honestly, being quiet helped no one. It is not like you are not liked by most people who critiqued it here.
One reason that the Mozzers have been quiet online is that most of their staff are attending the SMX East Conference - on the opposite side of the USA (and four time zones away) from where their offices are. I suspect they have zero time to be online and answering blog posts, forum posts, private messages, and emails - especially when they have at least several panels to speak on, and an expo booth to staff. I am surprised that all this stuff wasn't a bit better thought through before launch - but now that they are aware of the concerns, give them a week or two to get on with whatever needs to be done.
@Dan - UA may be found! I posted my research: http://www.webmasterworld.com/search_engine_spiders/3759661.htm Also, check out this connection: http://www.robtex.com/ip/208.115.111.242.html See that Seomoz-Crawl1? Hmmm...
Bill, I don't see it in that URL, but this shows me what I was looking for: http://www.robtex.com/dns/x?q=seomoz Hmm, new entries have appeared in the last few minutes. Looks like they add stuff that people searched for - so still not conclusive.
"Made a suggestion to Rand this morning - keep the UAs (plural as it turns out) secret, but give webmasters a UA they can use in robots.txt, which would be honored." There is no valid reason for them to not disclose this information, and faking a way for webmasters to tell them not to crawl given pages is not sufficient by any means. They need full disclosure on this one, period.
I agree that's what we all want, mvandemar. But they aren't required to do that, and honoring a Disallow directive in robots.txt that's unambiguously intended for their bot is sufficient to comply with the robot exclusion standard. As Bill Atchison has pointed out, we'll figure them out anyway, so they may as well just disclose it. I think it's silly to try to hide.
"But they aren't required to do that..." Required by who? No, there is no governing body on Non-Shady Internet Crawling Tactics... but you saw the kind of uproar Microsoft caused when they started doing it. Do you really think that there is any valid reason whatsoever for them to cloak their bot? You're right though, it is silly for them to try.
Hey - sorry I haven't been in here. Totally hear you on the bot stuff. We are going to set up a way to block as soon as Nick and I are back in Seattle and will share how to do that. I would have been here sooner, just couldn't find five seconds between expo booth, sessions, meetings, etc. Thanks for the many kind comments above, and please do accept my apologies for the delay. I understand it's an important issue and promise we'll take care of it responsibly.
"They've been indexing pages with NOINDEX in the robots meta tag, and obviously that's not acceptable to webmasters either." Dan, just to clear this up for others, like g1smd said, noindex does not mean don't index (at least for Google); it just means crawl+fish links but do not display the URL in search results. For SEOMoz to create an accurate copy of Google's link graph, noindexed pages should be crawled.
The only acceptable method is to fully disclose the UA. Not a string to be used in robots.txt but the full user agent. Anything else is just scraping on the same level as spammers. Think about that.
Rand, I really think you guys need to have someone on your team dedicated to putting out fires. Call 'em a corporate fire fighter or something, but it seems like every time something hits the fan in regards to SEOmoz, it takes quite a while to hear from you guys and you're always extremely busy. I can't imagine how hectic your schedule is, so I think it would be worthwhile for you to have someone who doesn't keep quite that kind of schedule available to give feedback on stuff like this. I mean, you knew you guys were going to be busy and you had to have expected a ton of feedback. How is it that you guys didn't plan on being able to engage with that feedback or address any issues that sprang up?
@Halfdeck - the standard is vague, but roughly so: "NOINDEX" allows the subsidiary links to be explored, even though the page is not indexed. SEOMoz is indexing text, which does appear in the tool's search results, including the title & anchor text, not just following links. At the very least they are pushing the envelope.
This may be the token event that brought blocking into the mainstream for webmasters. There is so little benefit to allowing everyone and his cousin to spider your site, and much to be lost. My guess is Randy used a lot of publicly available data to build up his billions of pages (such as the dotnetdotcom.org dataset) and so the UAs must be kept a trade secret - if they knew the UAs, just about anyone could offer a similar service with a low barrier to entry (tool development). If this is the monetization model for the new SEOMoz (charge $799/year for a tool that massages publicly available data), it's just for SEO firms and agencies and I can continue to ignore it, but I will certainly block it from costing me money (bandwidth and server capacity, to start). I'll block all spiders except those that trade value with me (Google, Yahoo, MSN, and a few others including industry-specific aggregators), and I encourage others to do the same. For the spider producers it's real simple - offer value to webmasters and let them choose to trade with you.
@johnandrews - if a crawler did something designed to bypass reasonable site security (and mine is unreasonably tight), then it's possible it's not legal, at least in California where my site resides. http://incredibill.blogspot.com/2006/11/legality-of-stealth-robots_116474846285292987.html If it were me, I would cough up a user agent and IP addresses real fast just to avoid potential issues such as these.
@dan "the standard is vague, but roughly so: "NOINDEX" allows the subsidiary links to be explored, even though the page is not indexed" You're right; the page text doesn't get indexed, though NOINDEXed HTML is scraped and processed (otherwise Google cannot read META NOINDEX data). I agree the SEOmoz tool should not be displaying noindexed pages in search results.
@Bill - if it were me, I'd cough it up just so I didn't have to deal with reverse-denial-of-service defenses from people who know how to respond in kind. :D @John - I don't know if *this* is the trigger, but sooner or later, it will happen. Bill probably has better data, but bad bots and the related issue of DOS attacks merit more attention. @Halfdeck - Exactly... they have to grab and process the page to find out that it has noindex, but once they see that, SEOMoz is at least a little over the line with what they do.
First of all, in the interest of full disclosure (something I would like to see more of here just so I can figure out which agenda is being served), I will start by saying SEOmoz is a) a friend, b) a client, and c) a partner of mine. I was with them at SMX East in NY and it was interesting to watch the response to Linkscape closely from the outside. I also want to second SEOaly's comment that "any tool that provides useful and relevant information is worth taking a closer look at." As someone who pays a lot of money to both inside and outside SEOs (and has been an SEO as long as anyone), any tool that helps save some time or makes them more efficient or effective has my attention. As to the $79/m - I am not sure what people here charge for their services but this is the equivalent of less than an hour's work, and I am sure at the very least there is more than an hour's worth of value here, especially if you have more than 1 client. I do get why people are concerned; however, I know Rand and Gillian and I can say if anyone was ever going to listen, it will be them. I know because I cannot say that I have ever given them any advice they have not taken seriously, and I know they are taking every concern here seriously. Finally, for anyone who has not tried it, I highly recommend doing so; then make your own decision. Do not get caught up with personal agendas or whatever else might fuel the negativity. Make up your own mind. Personally I think they should be commended, as they took on a very large undertaking and with its release on Tues, they have definitely shaken things up. Great job guys.
I was surprised that blocking information wasn't up at the time the tool was released. Even stealthy Cuil/Cuill put up blocking information *before* they released their product. Bottom line, if you're going to crawl the web and be a good web citizen, then you obey robots.txt blocking. If you don't, then you're not a good citizen in my book. Rand says blocking will be added, so great if that happens in short order. I know what it's like coming off of a trip. What's not clear is how quickly the existing data will be dropped. If they don't recrawl that often, then site owners who don't want to be in the database have to wait for a revisit, then wait for removal. Possibly they could add an instant remove tool, but that's a lot of work (you have to verify who owns a site, etc.).
"First of all, in the interest of full disclosure (something I would like to see more of here just so I can figure out which agenda is being served), I will start with saying SEOmoz is a) a friend, b) a client, and c) a partner of mine....[snip] ... As to the $79/m - I am not sure what people here charge for their services but this is the equivalent of less than an hour's work...[snip]...Finally for anyone who has not tried it, I highly recommend doing so; then make your own decision. Do not get caught up with personal agendas or whatever else might fuel the negativity. Make up your own mind. Personally I think they should be commended.." Come on Curtis, that "personal agenda" thing is a stretch... you sound like a true fanboy: if anyone is negative it must be driven by "personal agendas". If you read anything on this issue, you would see there has been one very clear concern repeated over and over -- even by people who claim to love SEOMoz: this Linkscrape tool violates net etiquette and it does so for a profit at webmaster cost. No need to slant the issue here.. we're all marketers. Reputation management 101 -- address the issue directly and honestly. Fix the problem. Everything else just makes it worse.
Sorry for the long delay on the update. Lots of internal talk and work had to be done to get this into place, but we are now disclosing our sources for data (and if you want, you can directly block those bots). If we start using new sources in the future, we'll try to list all the major ones, so folks can choose to block those as well. We also will start obeying a meta tag to remove your page from our index, so even if there are bots whose data we use that you want crawling your site, we'll still pull out the specific pages you mark with seomoz and noindex. Details are here - http://www.seomoz.org/linkscape/help/sources
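To check your own pages for a bot-specific noindex marker of the kind described above, something like the following would do. This is only a sketch: the exact tag syntax is documented on the linked help page, and the form used here (a meta tag whose name is "seomoz" and whose content includes "noindex") is an assumption based on the wording of the comment. The class and function names are mine.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directive tokens of every named <meta> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # meta name -> set of lowercase directive tokens

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        if not name:
            return
        tokens = {t.strip().lower() for t in attr.get("content", "").split(",") if t.strip()}
        self.directives.setdefault(name, set()).update(tokens)

def is_noindexed_for(html, bot_name):
    """Return True if the page carries a noindex directive addressed to bot_name."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives.get(bot_name.lower(), set())

# Assumed form of the bot-specific tag; see the linked help page for the real syntax.
PAGE_FOR_ONE_BOT = '<html><head><meta name="seomoz" content="noindex"></head><body></body></html>'
PAGE_FOR_ALL_BOTS = '<html><head><meta name="robots" content="noindex, follow" /></head></html>'

print(is_noindexed_for(PAGE_FOR_ONE_BOT, "seomoz"))
print(is_noindexed_for(PAGE_FOR_ALL_BOTS, "seomoz"))
```

The sketch deliberately treats the generic robots tag and the bot-specific tag as separate names. Whether a given tool also honors the generic `robots` noindex is exactly the point argued over earlier in this thread, so it is left to the caller to check both.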
Randfish, it looks like you're not actually crawling any sites on your own, so what took so long in disclosing this? Also, can you explain why you led everyone to believe you had built your own crawler when in fact you're just compiling the data of others?
As an FYI, Rand's link above doesn't work because it has a period at the end. Just remove that to see the source page he's pointing to. [Mod: Thanks Jill, corrected the link above]