
After Linkscape launched, some people didn’t want to be included in it. Eventually, SEOmoz posted information on how to be excluded. But then a debate started over whether following it actually excludes you. So, here is a discussion about how it works.
110 Comments

Avatar Administrator
from dannysullivan 2253 Days ago #
Votes: 6

SEOmoz has a new page up on blocking its spider:

http://www.seomoz.org/linkscape/help/sources

That page has a specific instruction for the meta robots tag. There is no specific instruction for a robots.txt file equivalent, unless "seomoz" is the agent name that’s supposed to be used there. This should be said explicitly, if so.

Complicating matters is that a number of other "data sources" are listed, including pulling information out of APIs from Google. This gives the impression that to be out of Linkscape, you’d have to block Google and other search engines.

Add to this a debate over whether SEOmoz is even running a crawler:

http://sphinn.com/story/79700

And a debate on whether that debate should have been closed to new comments:

http://sphinn.com/story/79980

I don’t want to reignite the debate on whether there is no crawler and SEOmoz has marketing issues, etc. about that. I want to specifically get answers about inclusion in the index.

Some site owners don’t want to be in the index. They don’t want competitors to look up data about their link structure. That raises these issues with me:

1) If SEOmoz is crawling pages on its own to gather some data, can you block those?
2) If SEOmoz is listing pages based on its own and third party data, can you choose to opt-out from being listed?
3) Should you be able to opt-out of being listed if Linkscape only uses data from third party sources?

OK, for me:

1) You should always be able to opt-out of being spidered. As I commented earlier:

http://sphinn.com/story/77000#c55253

"Bottom line, if you’re going to crawl the web and to be a good web citizen, then you obey robots.txt blocking. If you don’t, then you’re not a good citizen in my books."

So as SEOmoz does seem to be spidering to some degree, they should be providing both commands for meta robots and robots.txt plus explaining what happens if you choose to be excluded, which at minimum I think means they don’t spider you.

2) If you agree with 1, then that keeps some data they might gather from showing up. But there are third party sources that are used. I can’t see that you can expect to be not listed (listed, separately from being indexed), if that’s the case. There are several tools, for example, that draw on Yahoo Site Explorer. Do we say those tools should opt you out, if Yahoo Site Explorer still lists you? But to the degree Linkscape goes beyond those tools with its own spidering, then the exclusion, I suppose, may reassure some people.

3) If the tool entirely uses third party data, I can’t see how people can ask to be excluded. Heck, I have no ability to request that Hitwise exclude selling competitive data about what people are doing on my site to other people, since they’re gathering that through ISP information. I kind of wish I could, but I guess I’ve come to accept lots of this data is out there in various ways.

So for me, bottom line -- if you’re getting spidered, yes, you get an opt-out. No ifs, ands or buts. If SEOmoz wanted to go the extra mile, they could say opting-out would prevent being listed at all, even when using third party data.

(sorry for all the bold. something’s gone wonky with our comment system putting that in there!)

Avatar
from Skitzzo 2253 Days ago #
Votes: 5

Also of note, it appears from Rand’s statements that even if you use the META tag, it could take 30-60 days for your site to be removed.

That seems like an extremely long time to have your data out there if you’ve tried to opt out of being included. Shouldn’t SEOmoz offer a quicker method of removing your sites from their index? Removing it a month or two down the road won’t do much good if everyone has already had access to your data for an extended period of time.

Avatar Administrator
from dannysullivan 2253 Days ago #
Votes: 3

30-60 days wouldn’t be uncommon from some of the major search engines. I think only Google currently offers a faster turnaround if you specifically request it, and that took them nearly 8 years to put into place. From a PR standpoint, sure, removing it faster would be better.

Avatar
from eKstreme 2253 Days ago #
Votes: 8

I have a simple set of questions for SEOmoz to cut through the chatter and get to the bottom of the technical details of the Linkscape bots and index.

1. (Yes/No) Does SEOmoz control computers that contain a web robot that retrieves data for Linkscape? Computers are defined as physical hardware that SEOmoz owns or virtual services that SEOmoz has access to, such as Amazon Web Services or hosting accounts.

If yes:
1a. Does the web robot retrieve and obey the robots.txt?
1b. If it obeys robots.txt, which User-Agent string does it respond to?
1c. What is the HTTP USER AGENT of the robot? By HTTP USER AGENT, I mean the HTTP header that is sent with each request according to the HTTP 1.0 or 1.1 protocol specifications.

2. (Yes/No) Does SEOmoz get hold of data that robots outside its control retrieve? By outside of its control, I mean robots that are built, run, and maintained by other companies. Getting hold of data can involve using an API, buying the data on disks or retrieving it online.

If yes:
2a. Which robots’ data does Linkscape currently use to build its index? I don’t care about the potential for future use, I care about now.
2b. Which robots’ data has Linkscape used in the past since its inception?

So, straightforward questions that need clear-cut, simple answers.

Thanks,
Pierre

Avatar
from Skitzzo 2253 Days ago #
Votes: 0

Danny, if Linkscape is only offering the same type of information that Google or other search engines offer then I guess that’s fine, but they purport to be offering much greater insight into link structure etc. If it’s a more powerful tool, doesn’t that carry with it the responsibility to allow people to remove themselves quicker rather than allowing their competition a 1-2 month long window to examine it?

Besides, this is an issue SEOmoz should have (and from Rand’s comments did in fact) thought through prior to release. If there’s no quicker way to remove yourself, and there was no pre-launch opt out option, it seems like removing yourself now is probably not going to do much good.

Avatar Administrator
from dannysullivan 2253 Days ago #
Votes: 0

Mike had asked if I went through what Rand said in a post on his site:

http://smackdown.blogsblogsblogs.com/2008/10/17/how-to-block-the-bots-seomoz-isnt-telling-you-about/

I had, but I’ll highlight points here:

+ SEOmoz will treat noindex also as nofollow -- it won’t follow links on pages that you’ve blocked from being indexed.

"And Andy - we treat meta nofollow the same way the engines do - they don’t appear in our link graph or any of the calculations. I’ll ask the guys to add that to the information page."

+ Blocking Linkscape appears to make it not list your page at all, even using data from third party sources.

"If you read our sources pages, you can see exactly how to block Linkscape from listing your sites/pages without blocking any of the engines. We obey the robots meta noindex tag, but also an seomoz noindex if you just want to target our index."

Looking back at Linkscape’s sources page to understand more about this, it says:

"The best way to restrict data from all of Linkscape’s data sources is with the Robots tag. Linkscape obeys either “ROBOTS” or “SEOMOZ” in the meta tag’s “name” attribute."

I read that as saying if you block a page from being spidered, it won’t appear at all -- not even using third party data. But I think it could be more clearly said, if that’s the case. And if that’s the case, I wouldn’t get into listing all these other data sources at all. You don’t need to know about blocking them if, by blocking Linkscape, you’re not going to show up there at all. It just confuses matters.

Avatar
from DaveN 2253 Days ago #
Votes: 11

hmmm linkscape or linkscrape that is the question

Avatar Moderator
from Jill 2253 Days ago #
Votes: 9

"The best way to restrict data from all of Linkscape’s data sources is with the Robots tag. Linkscape obeys either “ROBOTS” or “SEOMOZ” in the meta tag’s “name” attribute."

Which means every page of a site would need to add a special robots tag, I take it. (Kinda like when Microsoft made us add one for their stupid smart tags.) I think most would rather be able to add a simple line in a website’s robots.txt file like this:

User-agent: SEOMOZ
Disallow: /

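To make the site-wide rule Jill proposes concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. It only shows how a well-behaved crawler would interpret such a rule. The `SEOMOZ` user-agent token is an assumption taken from her suggestion (nothing in this thread confirms that any crawler answers to that name), and `www.example.com` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, as suggested in the comment above. The "SEOMOZ"
# token is an assumption: the thread never confirms a crawler uses that name.
ROBOTS_TXT = """\
User-agent: SEOMOZ
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/page.html"
    print("seomoz allowed:", is_allowed(ROBOTS_TXT, "seomoz", page))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

The point of the sketch is that one rule covers every URL on the host while leaving other crawlers untouched, which is what a per-page meta tag cannot do.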
Avatar
from randfish 2253 Days ago #
Votes: -4

OK - Long list of questions to answer and I’ll try to address them all.

@Danny

1) If SEOmoz is crawling pages on its own to gather some data, can you block those?

You can block the crawlers that power Linkscape’s index. They are all listed on the sources page - http://www.seomoz.org/linkscape/help/sources. In order to be protective of our competitive intelligence and to dissuade folks from blocking our bots, we will pull from multiple sources. I know this is frustrating to some, but in order to build the best product possible, we need to have an index as close in approximation to the major engines as possible.

2) If SEOmoz is listing pages based on its own and third party data, can you choose to opt-out from being listed?

Yes you can - no matter where we get data from, you can use the meta seomoz noindex tag to say "don’t show my URL in your results" and we will respect that. It does take 30-60 days to update, as we need to re-crawl and re-integrate with our index (just as the major search engines do, though they’re typically faster).

3) Should you be able to opt-out of being listed if Linkscape only uses data from third party sources?

Yes, even if we did only pull your data or your URL from third-parties, you can still prevent being listed by using the meta seomoz noindex tag mentioned.

@Pierre

1) Yes, SEOmoz controls machines that host our data, crawl and process to calculate the link metrics.

1a - Yes - any robot we use or any third party source we pull from respects and obeys robots.txt

1b - The UAs are listed on the individual sources’ websites and all sources we currently use are on our sources page - http://www.seomoz.org/linkscape/help/sources - you can block these individually or en masse.

1c - Again, these are listed on the individual pages for the sources, so you can see them publicly.

2) Yes - we may, now or in the future, pull from third party data sources for data that either becomes inaccessible to us in other ways or is more economical to gather from third parties. Again, all the third parties we might use are listed on the sources page.

2a - We’re not revealing this.

2b - Also not revealing this. They’re both competitive intel. Sorry.

Any other questions, just let me know and I’ll be happy to answer.

@Jill - currently, we’re supporting just the meta robots tag for a variety of reasons, including that if we get any data from third parties that have crawled, we need to know on an individual basis whether those sources should/shouldn’t be listed. It’s also hard for many site owners with subdomains or sub-hosted blog/CMS accounts to add a robots.txt, so the page level makes good sense there, too. We might revisit this decision in the future, though.

Avatar Administrator
from dannysullivan 2253 Days ago #
Votes: 4

Thanks, Rand. The answer to 2 pretty much solves it. Block you, and you’re out of Linkscape, period. If that’s the case, like I said, saying to block all these other spiders is confusing.

But... How can people block you specifically through robots.txt? It’s a pain for some people to be tagging each individual page. Is "seomoz" the user agent they can use as Jill suggests?

And... Are you sending out any spiders of your own? It has sounded like you are. And aside from listing issues, some people don’t want to be spidered at all just for bandwidth reasons. Typically, this means they can request being blocked through robots.txt (a single request rather than hitting each page).

Avatar
from eKstreme 2253 Days ago #
Votes: 6

No no, Rand. Question 1 was about the robots that you control, not third party sources. And for a robot, it’s a piece of code that uses your bandwidth to download data from websites onto your computers. Not third party robots, but SEOmoz robots.

If you have a piece of code that goes through pages that you already have (an index, a set of files, cached copies, whatever you want to call them), whether you downloaded them or got them from 3rd party sources, then that is NOT a crawler. It’s a parser. That’s not a semantic difference but an important technical fact that needs to be stated clearly. A crawler is only one component of the system that retrieves (crawls) pages from the internet, stores them, analyzes them, and calculates link metrics.

As for 1b, the link you provide does not show any UA for a robot that SEOmoz owns and so you do not, according to the page, own a crawler. This is in contradiction to the Yes answer you gave to question 1.

And I find it very ironic (if also a touch rude) that a tool sold as a competitive intel tool is being secretive for competitive intel reasons.

Pierre

Avatar
from randfish 2253 Days ago #
Votes: -6

@Danny

Yes, block us with the meta robots or seomoz noindex and you’re out of Linkscape. We will still show links to that URL, but we won’t list that URL in our results.

People can block the UAs that collect data for Linkscape, which are listed on the sources page. We’re not revealing which ones are currently active, but you should expect that any of those could be used to acquire crawl data (and not just by us).

Spiders of our own - there are no spiders crawling under the name "SEOmoz" or "Linkscape" and all of the spiders that do crawl for us are listed in those sources. They can all be blocked through robots.txt.

@Pierre

Robots we control - we do control robots, so the answer remains yes.

We have both crawlers that fetch data for us (again, those are listed on the sources page) and parsers and processors to aggregate the data, build the index and calculate the link metrics.

1b - I think I have to say no comment on this. As I said, there’s certain information we’re not revealing. However, I don’t believe there’s any contradiction.

Irony - Well, pretty much every pay-to-use competitive tool on the web or off does not disclose sources. Have you ever asked Hitwise where they buy their ISP data? Or where Spyfu or KeyCompete get their data? Or where Wordtracker pulls its ISP data? None of them will answer you either. I’m sorry that it’s rude, but it’s how the world works and we’re certainly not substantively different from any of these others.

Avatar
from Feydakin 2253 Days ago #
Votes: 4

So, in other words, you’ve built a way to make money off of basically other people’s content and made it exceptionally difficult to not be a part of it.. At least you don’t have "don’t be evil" as a motto..

Avatar
from randfish 2253 Days ago #
Votes: -1

@Feydakin - I think technically, yes, that’s accurate. We, obviously, don’t think it’s evil. We think it’s valuable data to have and is technically all publicly accessible. We don’t scrape and re-purpose content or sell ads alongside the work you’ve created. We built a tool that we felt was important and useful and we give away some valuable data for free and hold the rest back for paying customers. We’re a for-profit business with investors and employees and payrolls. I think it’s very challenging to make the case that what we’re doing is more "evil" than others selling paid services off competitive intelligence or web crawls.

Avatar Administrator
from dannysullivan 2253 Days ago #
Votes: 7

"We will still show links to that URL, but we won’t list that URL in our results."

So if I enter a URL into the report box -- and that URL has been blocked -- will you report nothing for it?

But if other pages are linking to that page, and you run a report on some of those, you’ll see they’re linking to a page even if it is blocked. Is that correct? It would make sense -- links on the other pages can be seen.

As for "the spiders that do crawl for us are listed in those sources." Google’s not crawling for you. You can get some data from the Google API, but it’s probably overkill for people to think they need to block Google to stay out of Linkscape.

There’s an implication that you’ve got one of these sources in particular pulling in specialty data for you, working under license but without saying its name. If so, it would be nice to know which one, I suppose. But since blocking you keeps you out of Linkscape period, it’s less of a worry, I’d say.

I think it’s still an issue that you’re effectively telling people they have to insert a meta tag on each and every page they don’t want listed in Linkscape. Since you’ve talked so much about having crawled and built an index of the web -- regardless of who you’ve leveraged to do this -- I’d hope you’d offer an easier way to opt-out an entire site.

I’m guessing that if you’re not really crawling, then you don’t have access to the contents of robots.txt files, so you can’t see the blocking this way. And lacking that, you then need a way to verify that a particular domain really wants to be blocked -- which is yet another new system to set up. It would certainly be easier to spider the robots.txt files themselves.

Avatar
from eKstreme 2253 Days ago #
Votes: 2

Here is another yes/no question:

Would using a meta tag to state noarchive, as specified at http://www.google.com/support/webmasters/bin/answer.py?answer=35306 , also remove our pages from Linkscape?

Pierre

Avatar
from Feydakin 2252 Days ago #
Votes: 1

@rand, I guess that my whole impression of this is that you have just created an entirely new generation of script kiddies and are using the excuse of ’we need to turn a profit’ to do it.. We’ve seen this in several iterations over the last couple of decades where people with skill and knowledge work hard, for good or evil, at what they do.. Then a couple of people come along and offer up a script based on that hard work and knowledge that ’anyone’ can use for just a few bucks.. Site Explorer does this to some extent, but I think that you have created the ultimate spam generator for people who link.. I suspect that we will start seeing more spammers using this tool than marketers as they dig for those ’juicy links’ to try to grab.. I don’t see a way to keep the service from being abused by spammers, but expecting people to add a meta tag to every single page that they control seems a lot excessive to me.. I can see a simple verify session like Google WMT uses as far more effective and easier to do for the people who don’t want their sites in Linkscape.. But, if important sites opt out it would certainly affect the value of the service..

Avatar
from DazzlinDonna 2252 Days ago #
Votes: 24

My interpretation: The list of bots provided is really just a list to hide something within it.  i.e. The bot that SEOmoz "controls" is one of those listed, but they won’t disclose which one (*cough* dotbot *cough*).  The rest are there because they also use their data to create the index, but they don’t control the googlebot (obviously) or the yahoo bot (obviously), although the impression is given that they do.  So, the end result is a data set created by combining the seomoz/dotbot crawl data with that obtained (free or paid) from all the other sources.  By using the page-level meta tags that Rand has provided, we can opt out of having our data served up to competitors.  In order to prevent the bot from actually spidering us in order to save bandwidth, etc., we’d either have to block all the ones in the list (yah, right, you gonna block them all? i think not), or you have to know which one is the one "controlled" by seomoz (*cough* dotbot *cough*).   That’s my interpretation of the story, whether or not anyone wants to refute it.  My mind is fairly made up that this is darn close to the truth, so I doubt any more fancy-pants talking would change my mind.  It’s up to you all to make up your own minds.  Now, as for that whole dotbot number of sites spidered subterfuge, that’s a nuther ball of wax clogging up my ears - http://sphinn.com/story/80051

Avatar
from annie7 2252 Days ago #
Votes: 9

"We will still show links to that URL, but we won’t list that URL in our results."

"So if I enter a URL into the report box -- and that URL has been blocked -- will you report nothing for it?"

I am sorry, I have to get back to basics to clarify this as I have a feeling I am going mad :)

Blocking Linkscape - what exactly does this mean? Does it make Linkscape (no matter what the crawler is) disregard links going from this URL (which I feel would be logical)? Or does it prevent the tool from listing backlinks to this URL?

Avatar
from robwatts 2252 Days ago #
Votes: 17

Rand, whatever way you cut it, you’ve lost out on a whole bunch of trust here. I won’t rehash what anyone has said already as it’s all been said. Edited faqs, cached versions that support a contrary opinion and more.

People in this game aren’t mugs, yet it would appear, from what has been pushed back and forth in the whole ’he said she said’ ding dong, that a few people have formed a view that you or SEOmoz took them for such and tried to paint a picture that a thing was something that it wasn’t.

Hindsight and all that, but if you ever recrunch anything else out there and offer insights based on this or that, then maybe you should just say that next time. I can’t help but feel that no one would have been up in arms or cared even (excepting dotbot and a few other data providers perhaps). Right now you guys look bad and will probably be tarnished by this for some time to come (at least in this community).

Avatar
from netmeg 2252 Days ago #
Votes: 10

It looks like the bottom line here is that in order to keep the 400 or so sites (some with tens of thousands of urls) belonging to me and my company and my clients out of Linkscape, it’s going to cost me one hell of a lot of time, and therefore, money (as time equals money in our business). Adding that many meta robots tags is just not practical. All I want is to be left alone to do my business and my clients’ business, and suddenly this gets dumped in our laps. I object to this on so many levels I don’t even know where to start (and most of them have been covered in various places). Thanks a lot.

Avatar
from Vingold 2252 Days ago #
Votes: 8

I was trying to stay out of this, but as Annie pointed out there seems to be a bit of ambiguity going on. And I just want to be clear.

If I put the following at the top of every html file on www.vinnygoldsmith.com:

<META NAME="SEOMOZ" CONTENT="NOINDEX">

What, if anything, will people be able to see about my site? Links in? Links out? MozRank? Nothing at all?

I really thought it would remove it altogether and people wouldn’t be able to see anything about it, but some of the words you’re using ("we will still show links to that URL") are giving me pause.
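For readers wondering what honoring such a tag involves on the indexing side, here is a rough sketch using Python's standard `html.parser`. It collects the directives from meta tags whose name is "robots" or "seomoz", the two names the sources page is quoted as accepting. This is an illustration only, not SEOmoz's actual code; the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots"> / <meta name="seomoz"> tags.

    Hypothetical helper for illustration; not taken from any real crawler.
    """

    def __init__(self, agent_names=("robots", "seomoz")):
        super().__init__()
        self.agent_names = {name.lower() for name in agent_names}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() not in self.agent_names:
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def page_directives(html: str) -> set:
    """Return the set of robots directives (e.g. {'noindex'}) found in html."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    sample = '<html><head><META NAME="SEOMOZ" CONTENT="NOINDEX"></head></html>'
    print(page_directives(sample))
```

Because the check is per page, it works even when the page data arrives from a third-party crawl, which is the reason Rand gives for preferring the meta tag over robots.txt.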

Avatar
from Skitzzo 2252 Days ago #
Votes: 3

Ok, trying to slice through all of Rand’s answers, am I correct in saying that:

1) SEOmoz does not OWN the spiders that are gathering this data.
2) The only way to block the bots that pull Linkscape’s data is to block ALL of the bots listed on that page (which includes Yahoo and Google)??

Also, if I use the Meta tag that keeps my info from displaying, will you still be using it for other reports (as in sites that I link to etc)?

It sounds to me like you reluctantly provided a way to stop your information from displaying because the community demanded it, but you’re unwilling to provide a clear and concise way of keeping Linkscape from obtaining the data and storing it for your own purposes later.

Is this a fair assessment?

Avatar
from NickWilsdon 2252 Days ago #
Votes: 6

@Jill

Yes, that would be the more usual way of doing things but Linkscape hasn’t actually got any spiders (AFAIK?). That means they don’t control the crawling - they buy the data these crawling companies produce. However you can use that approach on each of the known spiders collecting data and then selling it to Linkscape. IncrediBILL has a list of them. I’ll try and get something on this written up later.

That of course won’t be guaranteed to keep you out of Linkscape, as they could still include you from their API calls at Google/Yahoo/MSN and you’re hardly likely to block those spiders. That is why Rand has suggested this META tag, as it’s the only way they can guarantee your site will be excluded. The scraped data they buy will contain this META and they will filter for it at their end.

Depending on their arrangement with these companies to buy their data, this could take 30-60 days. They have to wait for these companies to scrape your site with the new META on there and the data to be entered into Linkscape. They can’t take you out before this point. I assume though that once this META "flag" has come in from one source or another it will trigger the site exclusion.

Correct me if I’m wrong, Rand?

Avatar
from Feydakin 2252 Days ago #
Votes: 0

Which is why I suggested an opt out option right on the linkscrape site.. It’s easy to verify site ownership just like Google does with WMT.. Add a simple file that SEOMoz can look at after you request exclusion.. Once site ownership is verified, bounce the data for that site that he already has and not include it in the future.. 1 small text file per website and the process can take minutes instead of months.. But, of course, if that happened I would love to see the stats for how many people actually opt out..
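The file-based verification described above could look roughly like the following sketch. Everything here is hypothetical: the file name pattern, the token format and the helper names are invented for illustration, and nothing in this thread says SEOmoz offers such a mechanism. The fetch step is passed in as a function so the logic can be exercised without network access.

```python
import secrets
from typing import Callable, Optional


def new_token() -> str:
    """Generate a random token the site owner must publish (hypothetical scheme)."""
    return secrets.token_hex(16)


def verification_url(domain: str, token: str) -> str:
    """Hypothetical location of the proof-of-ownership file on the owner's site."""
    return f"http://{domain}/linkscape-optout-{token}.txt"


def is_verified(domain: str, token: str,
                fetch: Callable[[str], Optional[str]]) -> bool:
    """Return True if the site serves a file containing exactly the issued token.

    fetch(url) should return the response body as text, or None if the
    file is missing or the request fails.
    """
    body = fetch(verification_url(domain, token))
    return body is not None and body.strip() == token


if __name__ == "__main__":
    token = new_token()
    # Stand-in for a real HTTP fetch: pretend the owner uploaded the file.
    fake_site = {verification_url("www.example.com", token): token + "\n"}
    print(is_verified("www.example.com", token, fake_site.get))
```

Once a domain passes the check, the index could drop all of its URLs at once, which is what would turn a 30-60 day wait into minutes.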

Avatar
from johnandrews 2252 Days ago #
Votes: 4

Am I the only one who feels really icky about the idea of branding my web pages with an SEOMOZ meta tag? I’ll never do that: http://sphinn.com/story/80172

@feydakin "claiming" websites with SEOMOZ gives them even more sensitive data... no way.

Avatar
from NickWilsdon 2252 Days ago #
Votes: 4

@johnandrews

I feel the same. It’s work to implement and unnecessary code bloat. I also don’t like the way it labels my site. If we’re talking about "flags" which Google, Yahoo, MSN and even other marketers use to identify SEOs, well this is a doozy.

Avatar
from randfish 2252 Days ago #
Votes: -4

Sorry for the delay - on a conference call. Again, I’ll try to answer all the items one by one:

@danny

We’re just like the search engines in this respect. If you say "noindex" we won’t show you in our results, but we’ll still calculate link metrics based on who you link to or how. If you do a search on a URL that’s been blocked, we won’t show any information like page title or content, but we will show links that point to it. If you search for a page that’s been linked to by a page that has noindex (either for all robots or for seomoz) we won’t list that page in our results.

As far as the implication and the specific sources that are controlled by SEOmoz, we’re not revealing that information now.

@Pierre

We don’t currently have handling for noarchive one way or another, but since we don’t show any page content, I’m not sure exactly how we might. If you’ve got suggestions, we’re definitely open to them.

@Feydakin

We think this will help far more in identifying manipulative links and reporting them to the engines than it will to find manipulative link sources. Certainly, our impression from the engineers present at SMX East was that they think this can be a good tool for ID’ing spam.

@Donna - no comment officially, but I think your thinking is very smart.

@Ann - disallowing SEOmoz means the same thing it does in Google. They treat noindex as "noindex, follow" - meaning they follow the links and use it in their link graph, but won’t show it in their results. We do the same thing. If you don’t want us following links, you can use nofollow, just as with the engines.

@Rob - Certainly sorry to lose your trust. I think it’s the price for not being as prepared as we could have been on messaging and on building a tool that goes against the interests of some webmasters and SEOs. Hopefully, over time, we can regain that trust by continuing to provide valuable content and tools to the community.

@Vingold - if you block every page, no one will ever see your pages in the list of link results we show, just like if you block Google, no one would ever see your pages in their list of search results.

@Skitzzo - no comment on spider ownership, but with blocking, we may pull from other sources if we can’t retrieve the data. And yes, we’re unwilling to provide a clear concise way to keep data out of Linkscape (other than the meta seomoz noindex tag), both for competitive reasons and to make the data set the best it can be.

@NickWilsdon - Exactly right on the blocking. If you block some of the bots, we may pull from other sources to make our index as comprehensive as possible. The meta tag is the best way to tell us you don’t want to be included. And yes, it could take 30-60 days based on how fast our crawl gets to you and gets processed.
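The "noindex, follow" behavior described in the comment above is easy to misread, so here is a small sketch of one possible reading. It assumes a page marked noindex still contributes its outgoing links to the metrics but is never shown as a result, while a page marked nofollow contributes no links at all. The `Page` type and both function names are invented for the example, and this is an interpretation of the comment, not Linkscape's actual logic.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A crawled page: its URL, outgoing links, and meta robots directives."""
    url: str
    links: list = field(default_factory=list)
    directives: frozenset = frozenset()


def counted_inlinks(target: str, pages: list) -> list:
    """Source pages whose links to target count toward link metrics.

    Assumption: only nofollow removes a page's links from the graph.
    """
    return [p.url for p in pages
            if target in p.links and "nofollow" not in p.directives]


def listed_inlinks(target: str, pages: list) -> list:
    """Source pages that would be shown in a report for target.

    Assumption: noindex pages still count, but are never listed.
    """
    return [p.url for p in pages
            if target in p.links
            and "nofollow" not in p.directives
            and "noindex" not in p.directives]


if __name__ == "__main__":
    target = "http://example.com/t"
    pages = [
        Page("http://example.com/a", [target]),
        Page("http://example.com/b", [target], frozenset({"noindex"})),
        Page("http://example.com/c", [target], frozenset({"nofollow"})),
    ]
    print("counted:", counted_inlinks(target, pages))
    print("listed:", listed_inlinks(target, pages))
```

Under this reading, the opt-out tag hides a page from reports but does not remove its influence on other pages' metrics, which is the distinction several commenters were asking about.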

Avatar
from robwatts 2252 Days ago #
Votes: 3

@feydakin Great idea - a simple opt in solution for those who are concerned. Maybe SEOmoz could pay people for their time in implementing too :D

In terms of people who’d opt out, the truth is probably very near to not very many at all, at least in the grander scheme of things.

This little debate is but a microcosm of a wider webmaster community, the majority of whom are in a state of ignorant bliss, neither aware of it nor giving a stuff!

If Google had tried to play it smart in their early days in a similar style, it’s debatable whether they’d be the force they are today. Google manufactured consent by consensus and meaningful engagement; this whole process has been neither.

@rand It’s a cool little tool, hats off and all that, gr8 work - just remember we are smart guys and gals too. It isn’t so difficult to be straight down the line; frankness and a hands up we dropped the ball would have more than sufficed. Anyways, life goes on and it’s short guys, remember that too ;)

Avatar
from peteyoung 2252 Days ago #
Votes: 3

Got to say John, no, you’re not the only one - and unfortunately I would suggest many commercial organisations may agree with you. Surely on that basis a sitewide command has to be the only viable option moving forward (as Netmeg mentioned), as this seems reactive rather than proactive activity with little or no commercial benefit (from a client perspective) - and on that basis no commercial entity is going to want to pay for activity to ’manage’.

Avatar
from Skitzzo 2252 Days ago #
Votes: 13

Rand, your response to my question pretty much sums everything up in my mind.

You’re not willing to say whether you own the bots that crawl the data or not. This despite several claims on your website and countless comments here on Sphinn. That leads me to believe that:

A) You don’t own the bots but want it to seem like your project is bigger than it really is. The only reason I can see to do that is to try and match the marketing of your project, which certainly at this point seems to have been filled with misdirection if not blatant lies.

or

B) You do own a bot (dotbot) and you don’t want everyone blocking it so that you can compile their data and sell it back to them. This would make it cheaper for you to run Linkscape since you wouldn’t need to buy as much data from the third party sources.

Either way, I don’t think it paints your company in a good light.

Also, you admit that you’re unwilling to provide a clear and concise way for webmasters to keep their data out of Linkscape. Can you explain to me what other reputable company does that? As mentioned, every other crawler, index, or archive that I’ve ever come across has made it quite clear and easy to "opt out" so to speak.

I’m sorry, but your two line comment to me speaks volumes and signifies to me that SEOmoz no longer cares about the community that they purport to serve, and now only cares about the bottom line. That’s fine, you’re a for profit company with investors, but you can go ahead and drop the BS about transparency and serving the community.

Avatar
from Feydakin 2252 Days ago #
Votes: 1

Something else I just realized.. If it can take 60 days to get "opted out" by adding a meta tag, doesn’t that make this data sort of stale anyway?? I’m not sure how much real value there is in stale data.. Maybe a lot, maybe not so much.. I just think that the whole launch and maybe even the idea was poorly thought out from a reputation POV.. The beginners and scripters will love it, but I suspect it will cause lingering problems with people more established that are tired of seeing people find ways to make money from their work..

Avatar
from WilliamC 2252 Days ago #
Votes: 16

After reading this entire page, I have to wonder if Rand realizes that he is basically alienating  every non-spamming webmaster that pays any attention at all.

Avatar
from DazzlinDonna 2252 Days ago #
Votes: 12

Ok, so Rand has basically admitted I am right about this comment that I made above:
My interpretation: The list of bots provided is really just a list to hide something within it. i.e. The bot that SEOmoz "controls" is one of those listed, but they won’t disclose which one (*cough* dotbot *cough*). The rest are there because they also use their data to create the index, but they don’t control the googlebot (obviously) or the yahoo bot (obviously), although the impression is given that they do. So, the end result is a data set created by combining the seomoz/dotbot crawl data with that obtained (free or paid) from all the other sources. By using the page-level meta tags that Rand has provided, we can opt out of having our data served up to competitors. In order to prevent the bot from actually spidering us in order to save bandwidth, etc., we’d either have to block all the ones in the list (yah, right, you gonna block them all? i think not), or you have to know which one is the one "controlled" by seomoz (*cough* dotbot *cough*). That’s my interpretation of the story, whether or not anyone wants to refute it. My mind is fairly made up that this is darn close to the truth, so I doubt any more fancy-pants talking would change my mind. It’s up to you all to make up your own minds. Now, as for that whole dotbot number of sites spidered subterfuge, that’s a nuther ball of wax clogging up my ears - http://sphinn.com/story/80051
Since he said, "@Donna - no comment officially, but I think your thinking is very smart."
So...that should pretty much answer most of the questions I would think. Well, except that ear-clogging one.

Avatar
from viassana 2252 Days ago #
Votes: 0

Rand, what is the value proposition to your Linkscape customers?
1. They don’t know exactly what they’re buying (because some of it is secret).
2. They’re asked to trust SEOMoz (which won’t go over that well).
3. There is a hidden investment on the part of your customers to place meta code and robots.txt, which still may or may not present other concerns for your customers.
4. The data is skewed the more often companies run interference with Linkscape, making it less valuable to your customers.
I’m concerned that this gem of a device wasn’t tested long enough from the usability side. There have to be ways to satisfy customer requirements for privacy and control, and you still have time to push them in.

Avatar
from randfish 2252 Days ago #
Votes: -3

@Rob - thanks, we think the tool is impressive, and as Nick likes to say, it’s the worst it will ever be right now in this beta stage, so expect it to get much better in features, functionality and coverage over time.
@Peteyoung - we’ll definitely talk about a sitewide option. In order for that to happen, though, we’d need to build a site verification service like Google & Live’s Webmaster Tools.
@Skitzzo - I think that of the two points you outlined, A vs. B, B has by far the better intuition. And I didn’t say we wouldn’t provide a clear concise way to opt out, I said we aren’t providing one EXCEPT for the meta robots/seomoz tag, which is pretty clear and concise.
We do care a ton about the community. If we didn’t I wouldn’t be here explaining and responding. I can’t help but resent the accusation after the thousands of hours I’ve poured into giving back to the community. The fact that this tool needs to keep some information private and that portions of it are pay-to-access is a pretty weak argument for suggesting that I don’t care about SEOs and webmasters.
@Feydakin - well, Linkscape crawls a significant portion of what we feel needs crawling or is "fresh" every 30 days. We might take longer to reach stale, older stuff that doesn’t update often. In the future, we’ll have greater freshness, but for now, the data is between a few weeks and a couple months old.
@WilliamC - I’m not sure where that alienation would come from. Surely you’re aware there are dozens, if not hundreds of bots that crawl the web, don’t announce themselves, cloak as Googlebot and use your data for competitive intelligence that’s shared with no one. Many of them even use it to harvest emails and spam or find blog comments and forums to spam, yet none of these organizations or individuals receive the same degree of criticism or investigation that we have. I don’t think that’s because we’re doing something more evil, but because we’re open about it and willing to share data with a wider community.
@Donna - Just be aware that if we can’t pull data through one source, we’ll try to get it in another way, so individual bot blocking doesn’t ensure that you’ll be excluded. The meta seomoz noindex tag will.
BTW - What’s the ear-clogging problem? I must have missed that.

Avatar
from randfish 2252 Days ago #
Votes: -3

@viassana - I’ll try to respond to all your points:
What is the value proposition to your Linkscape customers?
Current link data is inaccurate, imprecise and incongruous. Linkscape takes big leaps forward by providing a lot more information about the links we see, calculating metrics for them and organizing them intelligently. You can see a lot about the features and uses for the tool here - http://www.seomoz.org/blog/announcing-seomozs-index-of-the-web-and-the-launch-of-our-linkscape-tool and here - http://www.searchenginejournal.com/seomoz-linkscape-new-backlink-checking-tool-reviewed/7826/
1. They don’t know exactly what they’re buying (because some of it is secret.)
They know exactly what they’re buying - a tool that leverages an index of the WWW to create a link graph. The only secrecy is around how the crawl was obtained/acquired. It’s sort of like buying a chair and suggesting that because the assembler doesn’t reveal all their parts suppliers, you can’t rely on the chair. Just sit in it, use it, read reviews about it and you’ll see whether or not the chair is good. Same goes for Linkscape - we think the data and the metrics are really valuable, and many others who’ve used it do as well. Over time, the data will get better, the metrics will improve and the index will increase in size and value.
2. They’re asked to trust SEOMoz (which won’t go over that well.)
I’d dispute your first and second point. If you don’t trust SEOmoz, you don’t really need to. If you use the free data, it costs you nothing and if you use the paid data and don’t like it, you can get a full refund anytime. I’d start by being distrustful of the data, using it and judging it based on the value it brings you. For some folks, this information won’t prove useful, but for others, it will be immensely valuable.
3. There is a hidden investment on the part of your customers to place Meta code and robots.txt, which still may or may not present other concerns for your customers.
Our customers don’t need to place any code on their sites unless they want to exclude themselves from being available in the link search results. I’m not sure what other concerns you’re referring to.
4. The data is skewed the more often companies run interference with Linkscape, making it less valuable to your customers.
Right, which is why we may pull from other data sources if the ones we control and run can’t access link graph information we need. The crawl is designed to be as similar as possible (at least, in the long run) to the major search engines, so we’ll reach far and wide to build that index, which makes it more valuable for our customers.

Avatar
from NickWilsdon 2252 Days ago #
Votes: 1

@Rand
>In order for that to happen, though, we’d need to build a site verification service like Google & Live’s Webmaster Tools.
That’s actually not very hard; my team did that for SocialBlogroll.com. Users can claim their blog, create a custom .html file with the code in the filename and then verify in the system.
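The file-based verification described in this comment could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not how SocialBlogroll.com or SEOmoz actually did it: the helper names, the "verify-" filename prefix, the HMAC-derived code and the injected fetch_status callable are all hypothetical.

```python
import hashlib
import hmac


def verification_filename(domain: str, secret: bytes) -> str:
    """Derive a per-domain filename that only the service can predict.

    Assumption: the "code in the filename" is an HMAC of the domain,
    keyed with a secret held by the verifying service.
    """
    digest = hmac.new(secret, domain.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    return f"verify-{digest[:16]}.html"


def is_verified(domain: str, secret: bytes, fetch_status) -> bool:
    """Return True if the site serves the expected verification file.

    fetch_status is a hypothetical callable taking a URL and returning
    an HTTP status code; it is injected so the check can be tested
    without touching the network.
    """
    url = f"http://{domain}/{verification_filename(domain, secret)}"
    return fetch_status(url) == 200
```

The site owner uploads an empty file with the derived name, and the service then requests that URL. Only someone with write access to the site can make the request succeed, which is the whole of the ownership proof.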

Avatar
from DarkMatter 2252 Days ago #
Votes: -1

must be a pretty good tool for people to be in such a panic about it.

Avatar
from Vingold 2252 Days ago #
Votes: 4

"In order for that to happen, though, we’d need to build a site verification service like Google & Live’s Webmaster Tools."Honestly, after you’ve indexed the web, building something like a site verification program to allow people opt out - should be a relatively easy project.I don’t know how much of my websites’ linking relationships should be private as opposed to public, especially since it can be argued that is already available to whoever has the resources, energy and inclination to go looking for it.  But I am thinking that unless I am getting a benefit from it - I probably don’t want it to be readily available to my competitors.

Avatar
from randfish 2252 Days ago #
Votes: 1

Yeah - site verification shouldn’t be terribly hard, but we have a long dev timeline already, so that work is going to be at least a few months away. Incredibill had suggested that we give webmasters our full advanced report for their own sites, and I think that’s a great idea, both from a giving back perspective and from a marketing one, so along with the opt out, we’ll probably do something like Google Webmaster Tools, where getting backlink data on your own site will eventually be free. Again, it’s months away, but from initial reactions here on the engineering team, seems like something that’s both do-able and in everyone’s best interests.

Avatar
from Feydakin 2252 Days ago #
Votes: 4

@rand
>well, Linkscape crawls a significant portion of what we feel needs crawling or is "fresh" every 30 days.
Now I’m confused again.. So you "do" have a SEOMoz bot that you control and can use to crawl with.. Or, are you using the royal "we" as in we (SEOMoz) use someone else’s bot (DotBot) but since we pay for it we call it ours??
>I’m not sure where that alienation would come from. Surely you’re aware there are dozens, if not hundreds of bots that crawl the web, don’t announce themselves, cloak as Googlebot and use your data for competitive intelligence that’s shared with no one.
But how many of those are considered leaders in the SEO community and how many are considered leeches?? They all receive far worse criticism, and when found out have their bots blocked and filtered.. Bill is a lot of fun to watch when he does this and it’s one of the main reasons I follow his blog..
>Just be aware that if we can’t pull data through one source, we’ll try to get it in another way
Not sure that anything else needs to be said after that comment..

Avatar
from jimbeetle 2252 Days ago #
Votes: 4

@rand:
>we’ll definitely talk about a sitewide option. In order for that to happen, though, we’d need to build a site verification service like Google & Live’s Webmaster Tools
Not really. If major bots can use robots.txt for sitemap discovery, I don’t see any reason it can’t be used for building an index from different sources. Give us a unique UA, whatever it might be, make one up. Then retrieve, read and obey robots.txt. That will put the decision to opt in or not in webmasters’ hands without having to pin an extra tag on each and every page. It will be a bit more work on your end, but that seems to be where the effort belongs.

Avatar
from Halfdeck 2252 Days ago #
Votes: 6

Since a bot can’t see a META tag without retrieving the page, the META ROBOTS requirement basically means Linkscape will burn your bandwidth IF Linkscape in fact has bots that crawl billions of webpages.
Assuming Donna is on the mark, SEOMoz has a bot that does a partial web crawl, but even that being true, it sounds like Linkscape doesn’t have bots that are capable of crawling the entire web. If it did you wouldn’t need Gigablast, Amazon, Alexa, Google, Yahoo, MSN, etc.
So while what Rand’s been saying may not necessarily be technically untrue, I still see irreconcilable differences between the impression Rand initially gave about Linkscape and reality.
Secondly, I’m glad Rand’s honest about his reason for trying to dissuade people from opting out of Linkscape, but it’s a bit like hiding a membership cancel link deep in the footer to make people jump through hoops to cancel their membership.
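To make the bandwidth point concrete: a meta-tag opt-out can only be detected by parsing HTML that has already been downloaded. The sketch below uses Python's standard html.parser. The thread never spells out the exact tag syntax, so treating a meta element named "seomoz" or "robots" with a "noindex" directive as the opt-out signal is an assumption, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class OptOutMetaParser(HTMLParser):
    """Looks for a noindex directive in <meta> tags aimed at given agents."""

    def __init__(self, agent_names=("robots", "seomoz")):
        super().__init__()
        # Assumed tag names; the real Linkscape syntax is not given in the thread.
        self.agent_names = {name.lower() for name in agent_names}
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        directives = {d.strip().lower() for d in attr.get("content", "").split(",")}
        if name in self.agent_names and "noindex" in directives:
            self.noindex = True


def page_opts_out(html: str) -> bool:
    """True if the already-fetched page body carries a noindex meta tag."""
    parser = OptOutMetaParser()
    parser.feed(html)
    return parser.noindex
```

Note that page_opts_out takes the page body as its argument: by the time the answer is known, the request has been served and the bandwidth spent, which is the objection being raised here.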

Avatar
from Mert 2252 Days ago #
Votes: 2

Danny, thank you for letting us use this outlet to finish the conversation.
>Just be aware that if we can’t pull data through one source, we’ll try to get it in another way
Rand, no SEO will ever brand their site with the seomoz noindex tag. You told us a long time ago that people talk with their money to make a strong statement; so the above statement just pretty much made sure that any SEO agency I deal with in Chicago just spent their last dollar with SEOMoz. Wow, that was cold (even colder than Chicago). I am speechless.

Avatar
from randfish 2252 Days ago #
Votes: 0

@feydakin - I’ll continue to say "our crawlers" and "our crawl" to refer to the spiders/UAs listed on our sources page. Hopefully that will make this clearer. We don’t have any bots named "seomoz" or "linkscape."
Regarding the leech accusation - I guess it’s about perception and whether you think this data is valuable and important to be available. I think we have differing opinions on that, and so we’re not going to reach the same conclusions.
@jimbeetle - we list all the data sources we use, and if you’d like to block any or all of them, they all have UAs. Creating a UA that doesn’t exist would mean that only crawl sources we fully controlled could obey it, while the meta tag means that all data collected from any source could be blocked.

Avatar
from randfish 2252 Days ago #
Votes: -2

@halfdeck - I’m sorry if the impression given was off the mark. I think the only real area that could be said is in terms of how we get our crawl. At first, we were completely quiet about the sources, and now we’re revealing them, though not providing very specific details.@mert - I’m really sorry to hear that. I personally feel the opposite way - that this data and information should be accessible and that a tool like Linkscape needs to exist. If you’d like to opt out of inclusion in Linkscape, we’ve provided a way with a meta tag and may offer more robust ways with site verification and blocking in the future.My statement that we’ll grab data we need to be representational of the major search engines’ indices is accurate. We might use any of the sources listed on that page to build our index and link graph, and I’ve been very upfront about that since we revealed those sources. It’s in the interests of our customers and anyone using our data that it be as reliable and complete as possible. I’m not sure why this would make you want to stop using the data or buying it from us, but it’s certainly your decision (though it does sadden me). If there’s something we can do to earn back your business without violating the value or integrity of our dataset, please let me know.

Avatar
from Mert 2252 Days ago #
Votes: 4

@Rand, This is not personally against you. This is a message to the SEOMoz company, which includes the VCs that you are dealing with. A company’s message to their main client base (which in this case are SEOs) is important. You might assume there is a monopoly of data here. No there is not. This is not about leeching or any other issue you have. It is simply the matter of do I listen to my customer base. No I do not. If the client base is not listened to; then there is one simple action the client has to do. That is to terminate business relationship and move on to greener pastures. You are a friend but there is no other way to relay a message any other way given the intense communication that has eaten enough Sphinn bandwidth. Once SEOMoz fixes its stance, then there is a second chance for a trust. Thank you for your friendship.

Avatar
from jimbeetle 2252 Days ago #
Votes: 2

@rand
>Creating a UA that doesn’t exist would mean that only crawl sources we fully controlled could obey it, while the meta tag means that all data collected from any source could be blocked.
Guess I didn’t express myself very well. I didn’t mean to imply that the robots.txt entry would deny crawling to any bots; I basically meant for you to use it on your end the same as you would the meta tag. As you compile your index, if the UA is included as blocked in a site’s robots.txt, then treat it the same way you would have if it were the meta. Basically, use the robots.txt entry as a substitute for the seomoz noindex. This would make it much easier for folks who want to opt out.

Avatar Moderator
from Sebastian 2252 Days ago #
Votes: 2

From a technical POV:
NOINDEX (provided by an HTTP header or META tag) is not suitable to keep me out, because LinkScape needs to recrawl the page to see this directive, and that can be in 30 or 60 days or never.
As long as there’s no crawler directive obeyed by Google, Yahoo, and all other service providers that do the actual crawling, and no timely refetch of each and every page/domain requested by any user, there’s no working way to opt out.
Correct?

Avatar
from Feydakin 2252 Days ago #
Votes: 2

@rand, I never said "you" (SEOMoz et al) were leeches.. You are the one that lumped yourself in with those people by comparing what you are doing to what they do.. What I did say was that you have a certain reputation in the industry and they don’t.. So yes, you are expected to do better than a leech and a scraper..

Avatar
from Skitzzo 2252 Days ago #
Votes: 4

Rand,
>And I didn’t say we wouldn’t provide a clear concise way to opt out, I said we aren’t providing one EXCEPT for the meta robots/seomoz tag, which is pretty clear and concise.
The problems with the meta tag have been outlined fairly well in this thread and others. Also, it doesn’t keep you from archiving my sites’ information and using it in the future or even selling it to someone else to use.
I realize that there are other bots out there that do this and disguise themselves etc. as you mentioned, but if you’re only trying to be better than the scrapers, then that’s setting the bar pretty low, don’t you think?
>We do care a ton about the community. If we didn’t I wouldn’t be here explaining and responding.
Rand, the people here at Sphinn are your target audience. This is PR and damage control, this isn’t about giving back to the community.
>The fact that this tool needs to keep some information private and that portions of it are pay-to-access is a pretty weak argument for suggesting that I don’t care about SEOs and webmasters.
I actually never used the free vs. paid issue to suggest that you don’t care about the community. I used your own statement (and I quote) "we’re unwilling to provide a clear concise way to keep data out of Linkscape." And, just so you don’t try to use the meta tag defense again, let me emphasize the meta tag will NOT keep my data out of Linkscape, it will keep it from DISPLAYING in Linkscape.
Once again, please share with me what other reputable websites or businesses milk data from sites to sell and don’t offer a way to prevent that data GATHERING (not displaying). I’m all ears.

Avatar
from randfish 2252 Days ago #
Votes: 0

@Mert - Many thanks to you, too. I think when friends can disagree and have a discussion on the merits, it makes better products, better companies and better people.
I think SEOmoz is listening to our customer base - we’re trying to provide the most comprehensive, valuable product possible and to do that, we pull data in different ways from different sources. We provide a solid way to opt out - through the meta tag - and we give disclosure on the sources we may use now or in the future so you can block those UAs if you’d like. If you feel that the tool doesn’t provide value to you, or that in our attempts to build something valuable, we’ve crossed a moral or ethical boundary, that’s certainly your prerogative and decision. I think that since we obey robots.txt and only pull from sources that also do, AND provide a specific method to opt out of being in the results, we’re covering our bases and protecting the interests of our customers and the wider web. However, reasonable people can certainly disagree and that’s why I’m in this thread - to hopefully help answer any questions that arise and provide our perspective on them.

Avatar
from randfish 2252 Days ago #
Votes: 0

@Sebastian - I think that’s incorrect. The meta tag will opt you out, and like the major search engines’ behavior, it will require time until we see that tag and can remove the page from our next index update.
@Feydakin - I expect better from us as well. I think that we are living up to those expectations, and I know others disagree. I think that’s going to be an opinion issue, and thus one that I can’t further expound upon or explain.
@Skitzzo - There may be some problems with the meta tag, but it is a way to clearly, concisely opt out of Linkscape’s results. You are correct, however, in saying that it doesn’t mean we won’t use link information gathered on noindex pages in calculations like mozRank, mozTrust, etc. just as Google/Yahoo!/MSN/Ask do. If you wanted a link removed from the link graph and the calculations, you’d need to use nofollow.
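The distinction drawn here, that noindex hides a page from results while only nofollow removes a link from the graph, can be illustrated with a small Python sketch. It shows the general idea of skipping rel="nofollow" anchors when collecting graph edges. How Linkscape actually builds its link graph is not described in the thread, so the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class FollowedLinkParser(HTMLParser):
    """Collects href values of anchors that are not marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = {key: (value or "") for key, value in attrs}
        href = attr.get("href")
        # rel is a space-separated token list, e.g. rel="external nofollow"
        rel_tokens = attr.get("rel", "").lower().split()
        if href and "nofollow" not in rel_tokens:
            self.followed.append(href)


def followed_links(html: str) -> list:
    """Return the links on a page that would count as link-graph edges."""
    parser = FollowedLinkParser()
    parser.feed(html)
    return parser.followed
```

Under this model a page carrying a noindex tag would still contribute every link returned by followed_links to metric calculations, which is the behavior Skitzzo objects to.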

Avatar
from Skitzzo 2252 Days ago #
Votes: 5

Rand, do you honestly believe the meta tag allows you to opt out? Let me illustrate my point a bit more clearly.If I put the meta tag on my page does it keep you from crawling my site? Does it keep you from using my bandwidth? Does it keep you from storing my data? Does it keep you from later selling that data to some other company that doesn’t obey the SEOmoz meta tag?The answer to all of those questions is no. Thus, it’s not a way to opt out. I’m not sure how I can make it any clearer to you.

Avatar
from randfish 2252 Days ago #
Votes: -4

@skittzo - Right - and the same is true of the behaviors of all of the search engines. You can opt out of the listings if you’d like using noindex, but they will still crawl, still use for calculations and still potentially sell/profit from that data.If you want to completely opt out, you can block the bots - and all of these are listed on the sources page - http://www.seomoz.org/linkscape/help/sources. Just as with the engines, who list different UAs they use, we list all the ones we use or might draw from. Block these, and you’re completely out of our index. We’ll continue to list any UAs and sources we find on that page so it remains comprehensive.

Avatar
from Skitzzo 2252 Days ago #
Votes: 5

@randfish
>Right - and the same is true of the behaviors of all of the search engines. You can opt out of the listings if you’d like using noindex, but they will still crawl, still use for calculations and still potentially sell/profit from that data.
The difference is that unlike search engines, you won’t honor my request to be removed. As you said above:
>Just be aware that if we can’t pull data through one source, we’ll try to get it in another way
Once I tell one of the SEs to go pound sand, they don’t try to back door me. They realize I don’t want them on my site and they don’t come back (other than to hit the robots.txt file again).

Avatar
from WilliamC 2252 Days ago #
Votes: 7

Rand: Actually, your system itself could obey robots.txt easily enough. Simply create a spider that fetches robots.txt for each unique domain your sources find, and check it for a UA of seomoz and disallow. Simple sitewide opt-out solution.
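The check suggested in this comment can be sketched with Python's standard urllib.robotparser. This is a minimal illustration, not something SEOmoz has confirmed building: the UA token "seomoz" is the one proposed above, and is_opted_out is a hypothetical helper name. The robots.txt body is passed in as text so the check can run on whatever copy the index pipeline already has.

```python
from urllib.robotparser import RobotFileParser


def is_opted_out(robots_txt: str, user_agent: str = "seomoz") -> bool:
    """True if robots.txt disallows the site root for the given UA token.

    Assumption: a sitewide opt-out is expressed as a Disallow rule on "/"
    for the (hypothetical) "seomoz" user agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")
```

For example, a site owner would add two lines to robots.txt, "User-agent: seomoz" followed by "Disallow: /", and the index builder would drop every domain for which is_opted_out returns True, regardless of which crawler originally supplied the page data.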

Avatar
from eKstreme 2252 Days ago #
Votes: 3

Re noarchive:
SEOmoz needs to see the HTML of the page, at the very least to see the SEOmoz-specific meta tag, right? That’s one.
Two: SEOmoz keeps talking about their crawlers, which are really bots that SEOmoz does not control. You keep talking about things like dotbot and Y! Slurp and GoogleBot.
So if we block dotbot and the rest, we block SEOmoz’s access to our HTML. But I don’t want to block GBot just because of some pesky tool, but I can tell GBot and Y! Slurp not to cache the page with noarchive. I’ll happily block dotbot and other unimportant bots.
End result: you don’t have access to our HTML unless you crawl it yourself. And if you do, we’ll find it and block that too.
Of course, I can’t know if I’m right or wrong because you are not giving us straight answers. Don’t be surprised if we react in kind.
Pierre

Avatar Moderator
from Sebastian 2252 Days ago #
Votes: 7

Rand, I agree that technically the meta tag is a way to opt out within a few months. However, I do think that’s not acceptable for most site owners who want to prevent their link data from being disclosed by your tool. I’m aware that your current architecture doesn’t allow a timely opt out, but you could provide an opt-out option via a web form that verifies ownership and blacks out link data on request asap, doable in near real time. What do you think?

Avatar
from randfish 2252 Days ago #
Votes: -3

@Skittzo - You make a fair point. In order for us to build an index that accurately represents the major engines, we may pull from multiple sources, and that differs from the engines, who generally rely on a single UA.@WilliamC - That’s possible, too. I’ll talk to our guys to see if it’s something we can implement and if so, when.@Pierre - I’m not quite clear on how you’re suggesting we treat noarchive. As far as bot blocking, as I said, you can certainly block all the bots on the list and we’ll respect that. We’re tenacious about including data, but only to a point. A webmaster dedicated to keeping us out certainly could through the method you described.Also - on providing straight answers. Looking over the responses, I think I’ve been exceptionally clear and direct. While some of those answers are "no comment" or "I can’t talk about that" the rest are as "straight" as I know how to be. If there’s anything where you feel you haven’t gotten a straight answer, please ask again and I’ll respond as best I can.

Avatar
from randfish 2252 Days ago #
Votes: 0

@Sebastian - I think that in the future, that’s definitely achievable to enable quicker removal through verification of site ownership. I don’t know when exactly we could have it baked into the product, but probably not before the end of the year - the devs have a strict timeline with a lot of projects right now. Thanks for the suggestion - we won’t forget about it (putting it in our list of future upgrades today).

Avatar
from Vingold 2252 Days ago #
Votes: 6

Skitzzo said this on the other thread:
If something like this happened once or twice, I’d give them the benefit of the doubt but it’s become a pattern of behavior with SEOmoz.
1) Controversy
2) Benefit from attention
3) Apologize but due to time can’t fix it just yet.
4) Continue to gain attention.
5) Finally fix it and explain how aw shucks guys, I didn’t mean it like THAT.
6) Ask people to talk to you about it in person rather than writing about it in public.
Rand, when you say:
"site verification shouldn’t be terribly hard, but we have a long dev timeline already, so that work is going to be at least a few months away."
Does that mean we’re at #3?
99.99% of the time all of the SEO drama means nothing to me, but for some of my sites a lot of work went into the link building, testing and verifying which anchor text works best, etc. For someone to be able to just come along and copy it - I don’t know, I guess that’s your right to sell it to them, but it doesn’t sit well with me. It certainly brings up a lot of ethical questions.
I was really hoping that opting out would be a lot more cut and dry than putting a meta tag with your company name on every single html file on my websites.
The best metaphor I can come up with is this:
I invite myself to your family reunion. Everyone you are related to is there, it’s a big gathering so I go unnoticed. I then casually go around to all of your relatives and ask them about their medical history as I piece together a family tree. I then also go to your doctors and bribe them for information about your family’s medical history as well, and I dig up some publicly available info like death certificates to round out my files.
Now, once I’ve got all of that, I come to you and offer you a complete breakdown of all of your potential medical issues that are based on heredity, genetics, etc. Believe it or not, even though I got it without your permission, you might be ok with it and maybe even willing to pay for it. It does have some value after all.
But would you want me selling that to someone else? And you might be upset that I collected it without you knowing.
I then come to you and say that I’m not unlike the hospital - they have all of this information but it is hard to put it all together the way I have it. (Google is the hospital in this metaphor).
You could argue the hospital is giving you something in return (good health, proper care, etc.) and that you don’t want this information available to just anyone for sale.
How would you feel if I said your only option for me not to sell this information is if you put a sign that says "vingold" up in the window of your home?

Avatar
from SamFreedom 2252 Days ago #
Votes: -7

But... but... but can’t we Aaaaaalll just get alonnnng?

Avatar
from eKstreme 2252 Days ago #
Votes: 10

Rand, you hit it on the head:
>We’re tenacious about including data, but only to a point. A webmaster dedicated to keeping us out certainly could through the method you described.
I don’t want you on my site and I don’t want to have to fight you off. Linkscape is not being polite because it’s going after my data by hook and by crook and I have to spend precious time to fend it off. It’s no different than an MFA scraper: instead of making money from ads, you’re making money from subscriptions. They’re in the same bucket on that front.
Your statement clearly shows to me that you do not want to be polite. All the bots listed on your sources page are polite and they serve me well: Google and Yahoo send me traffic. What does Linkscape do for me? Nothing good. On the contrary, it wants to hurt me by giving my competitors a leg up.
This is the heart of the problem: I don’t trust your tool right now. I don’t like the fact you’re not honest about how to effectively and efficiently block it. We don’t have time to fight the bad bots and you. It’s all about politeness and Linkscape is encroaching on my territory, namely my sites. And it has no business being there and I’d like it to go away. I’m happy to add something to robots.txt and that’s it. Let me state this again clearly: Linkscape is not welcome on my sites and the onus is on SEOmoz to make it easy for me to keep it away.

Avatar
from randfish 2252 Days ago #
Votes: -9

@vingold - I think the metaphor loses significant relevance because it relies on pulling data that’s not available for public consumption. We’re not going through trash bins or knocking on doors or bribing anyone. We’re culling data that anyone can see on the web and making it accessible in the format many webmasters and SEOs would like to consume it. As for the pattern - I addressed that in an earlier thread, but I think it’s going to continue to be a pattern. We’re not always great at predicting what will arouse ire in the community, and since we have very busy lives and jobs, our responses aren’t always as timely as we’d like, nor can we make rectifications on the fly - time is always going to be in short supply at startups. Although I wish it were different, I’d be dishonest to suggest that we can break out of the cycle you mention. We’ll just have to live with it and hope that over time, things improve.
@SamFreedom - I thought we were getting along :-) I’m not in here to attack, belittle or diminish anyone or their opinions. My only goal is to make sure all the questions get answered honestly and thoroughly (even though that means saying, at times, "I can’t disclose that.")
@Pierre - Yes, I recognize the position you’re taking and I know that for some folks, it’s upsetting and unsettling. However, we’re not breaking any laws, nor are we doing anything more than what search engines do. Your point is that they give back with traffic. I think that we’re giving back with data and metrics, and we’ll try to give more back over time. And, as I said, if you are extremely upset about it, you can use the meta tag to keep your pages out of our results and block bots to keep us from accessing your data.
What do we think? Are almost all of the concerns and questions answered at this point? I have a lot of other responsibilities to attend to at work, and this takes up a significant portion of my time...

Avatar Moderator
from Sebastian 2252 Days ago #
Votes: 1

Rand, thanks for considering my suggestion. What I can’t understand is your timeline. Putting such a heavily-asked-for functionality at the end of the development queue seems pretty unreasonable with regard to the buzz created on the opt-out thingy here and elsewhere. Of course there might be better solutions and surely I’m not the guy to tell you how to do your business ... however, I’m kinda puzzled.

Avatar
from randfish 2252 Days ago #
Votes: -1

@Sebastian - well, we build out dev cycles internally about 3 months, and the upgrades and work that’s being done is critical to business operations and the promises we’ve already made to partners and customers. Trust me - I’d love to get things done faster; I have a million and a half requests, but we have to prioritize, and with a startup budget and a small team of devs, we can’t move quite as quickly as would be ideal.
BTW - Not just considering - we’re taking it really seriously. I think it’s a very good way to get more people comfortable with what we’re doing and possibly expanding our marketing by showing off all the data we know about your site (and what you could potentially get on the competition).

Avatar
from Feydakin 2252 Days ago #
Votes: 3

@rand I’m glad to hear that you are at least looking at our suggestion for an easy way out.. I question your timelines as well.. Seems that you don’t have any room for anything that goes wrong in there and we all know that that is a bad way to schedule.. Most production schedules I’ve been involved with had a set percentage of open time to cover just these issues.. But, you said you feel that you are giving back by providing that data.. The big caveat there is that you want us to pay you for it and Google sends us the traffic for free.. And the suggestion to block Google to block SEOMoz is just ridiculous.. To keep you out we have to keep out everyone.. If anyone else did this the uproar would be huge.. More huge than what you are seeing here in public at least..

Avatar
from KenJones 2252 Days ago #
Votes: 0

(Ooh look a fence... my favourite place to sit ;-) ) Seems to me that everyone accepts that Linkscape is offering an incredibly useful way of carrying out competitive analysis without having to spend ages trawling through lots of data sources and building your own tools to compile and present such data (A good thing). Where things get problematic is that this means that such information is also now more readily available to your competitors (Not such a good thing). Strikes me as a case of SEOs wanting to have their cake but not let the competition eat it. How many of the people here (and elsewhere) who are so vocal about not wanting sites under their control to appear in Linkscape’s dataset will still be happy to mine it for information about their competitors’ sites? One thing that I haven’t seen discussed in great detail (and I’ll freely admit that I’ve not looked that closely, so forgive me if it has been dealt with elsewhere and please drop a link to point me in the right direction) is how much value Linkscape’s link graph actually provides beyond the world of the Linkscape index itself. As Rand’s old friend and intellectual sparring partner, Michael Martinez, is so fond of reminding people, there’s no point using Yahoo to try to gauge the value of links in Google’s index, so why should Linkscape be any different? Sure it provides a good approximation of the link structure (of a portion) of the web, but mozRank and the other metrics don’t and can’t equate to actual PageRank. I have no doubt that as Linkscape grows (and assuming there isn’t a mass revolt from webmasters the world over blocking it on every page of their sites) it will become a closer approximation, but that’s all it will ever be. An approximation. A best guess based on a series of best guesses about value factors for the engines.
Unless Matt Cutts has secretly slipped Rand a copy of the algo and that’s really what’s powering Linkscape, all the tool is really doing is providing people with information that is freely available elsewhere but reducing the amount of legwork required to gather it.

Avatar
from randfish 2252 Days ago #
Votes: 0

@Feydakin - That’s a fair point, and I think that as we get bigger and have more leeway and devs on the team, we can and should build with those in mind. That’s not to say we don’t already. One quick fix we made just this week is to not show links from noindex pages in the tool (which we were initially). We simply hadn’t mimicked the engines’ behaviors accurately, were called out, and fixed it up. We had built in some time for fixes, just not enough to take on projects of this scale (and it does require more significant work than you might think - needs to be tested, has to be applied to subdomains for those hosted on third-party platforms, etc.)
@Ken - We have done some testing on mozRank vs. PageRank. On statistically significant samples, we’re about 0.7 off toolbar PR. Looking through the data, those pages where the difference was most significant, e.g. PR2 mR6.75, we almost always found paid links or penalties. Obviously, there’s always going to be big differences between what we show and what the engines do, but I think there’s still tremendous value in comparing, contrasting and working towards a closer approximation over time. We’ve got plenty of time and people devoted to that as well.
And no, Matt Cutts didn’t slip me anything :)

Avatar
from KenJones 2252 Days ago #
Votes: 0

@Rand - "And no, Matt Cutts didn’t slip me anything :)" Yeah, like you’d admit it even if he had ;-)

Avatar
from Vingold 2252 Days ago #
Votes: 1

@KenJones Honestly, I think at a site/domain level the information available is ... ok. For the sites I looked at (my own, clients and competitors) I don’t think it is anything earth shaking. I think it would be neat to play around with, and to spy on a few competitors who continually outrank me, but I don’t think I’m willing to pay for that just now.
I think the biggest strategic value of Linkscape is the aggregate information it makes available to SEOmoz. It can provide a pretty good approximation of some pretty nifty statistics. Some of which Rand has already revealed (% of internal to external links, % of no-follows, etc). I think the deeper you slice and dice that data the better understanding you have of what a typical link profile might look like for an average website and where the extremes of that profile might be. That knowledge can be handy for anyone building links either ethically or not, because it will let you know approximately where the search engines’ radar might be.
In my opinion, that information alone gives SEOmoz a competitive advantage at the top of this industry. It will be interesting to watch if other SEO companies follow suit and whether we’ll have a whole host of bots to block using Meta tags.
@Rand I’ve said my piece. Thanks for listening.

Avatar
from KenJones 2252 Days ago #
Votes: 0

@Vingold - Good point, although I think the "Is it worth the asking price?" debate is one for another time. Personally, I’m not currently a paid up ProMozzer and Linkscape isn’t enough to sway me to part with the cash to upgrade (although if Rand feels like chucking me a freebie I wouldn’t say no :-) )
You also raise an interesting question about what additional information they may be gathering which isn’t being shared with either free or paid users, but again it won’t be anything that a decent SEO can’t find with a bit of hard graft and a lot of time analysing data.
As for other SEO companies gathering similar data, I’d be very surprised if there aren’t already a number of them doing it. SEOmoz just happens to be the only one that’s making it publicly available for those without the required resources and budgets to build their own such tools.

Avatar
from Skitzzo 2252 Days ago #
Votes: 7

@randfish - you are now resorting to the fact that you’re not doing anything illegal. You have to see how far that is from being a good "netizen" don’t you?
You keep comparing yourself to the search engines but there are two very important differences.
1) Search engines provide something back in return. Linkscape does not.
2) As I said before, if I tell SEs to beat it, I don’t have to keep fighting them off.
As it stands now, you are using my bandwidth and my data for your monetary gain while not only providing me no value, but also probably helping my competition gain a better understanding of my tactics.
And, if I decide I don’t want to be included in your data set, I have to block all sorts of different bots, and trust that you’ll maintain your list of other bots you might use. I have to actively monitor your site and adapt to any changes you make. And, if you do happen to oh, I don’t know, use a bot before alerting me, I’ll be stuck in your index for 30-60 days.
I’d challenge you to look at the situation from the other side of things (difficult though it may be) and realize that despite what you tried to make people believe, you’re NOT a search engine and you AREN’T behaving like one. Until you make it quick and easy to exclude my data from being GATHERED I’m going to regard SEOmoz as the same type of nuisance I do every other spammer and scraper trying to make a buck off my hard work.
Congrats Rand, you’ve turned SEOmoz into the equivalent of a MFA site.

Avatar
from johnandrews 2252 Days ago #
Votes: 5

@sebastian As Rand noted himself above, a "feature" that allowed webmasters to sign in and claim ownership of a website is not better because it gives SEOMOZ more private information which Rand plans to monetize: "I think it’s a very good way to get more people comfortable with what we’re doing and possibly expanding our marketing by showing off all the data we know about your site (and what you could potentially get on the competition)."
Above all else SEOMOZ would like everyone to continue to compare them to Google and Yahoo. Those major engines have trodden very lightly in the area of "registering websites" with good reason. The sites that include you and then require you to "claim them" in order to opt out are traditionally leverage/bullying scams. Even Google wants to act prejudicially against them (e.g. directories) when it can. Think of what Merchant Circle does, what Rip Off Report does, etc. and think of Google’s comments about "free" directories needing to provide FREE means of correcting information, opting out, etc. if they want to be "trusted" by Google.
I am amazed no one has seriously dug into the probable terms of service violations associated with this tool. CraigsList promises to charge a fee for every automated access to its site. Last I looked Google and Yahoo APIs restricted commercial use of their data. I am sure Google is amused by the flak and doesn’t see any need to comment right now.

Avatar
from randfish 2252 Days ago #
Votes: -4

@Skitzzo - I’m going to reject a few of those allegations - I simply don’t think they’re accurate.
When I said we weren’t doing anything illegal, it was in the context of defending against the analogy made by Vin that we were bribing doctors for medical information (which is illegal). I think that some people would say we’re aggressive but decent web citizens, while others could argue, as you have, that we’re not and that we should be less aggressive in our pursuit of data.
When I look at us from the other side, I see an organization that, like many others, pulls down data about the web. They do so aggressively, and it’s a pain, if I’m a big privacy advocate, to block their activities. Yet, at least it’s easy for me to remove myself from their listings. So, unless I have several hundreds of millions of pages for them to crawl, the bandwidth costs, since they’re only grabbing data once per month, are very low - probably in the pennies or dimes per month category (and in the vast majority of cases, much lower than that). This organization gives back freely an alternate metric to PageRank from Google and tells me how many links they know about to my page and site and how many unique domains are represented in that number. That might be valuable to me, but it might not. All in all, given the tremendous number of worse things they could be doing, I’m probably not particularly upset, but it might piss me off a bit. At least I know them, as opposed to all those other companies and scrapers who do this that I don’t, but that probably doesn’t entirely excuse their behavior.
That’s my "outside trying to look in" perspective.
As far as the MFA accusation - that’s entirely inaccurate. We serve no one’s content - just URLs, titles and links and we do no advertising on top of that data.
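For readers who want to sanity-check the "pennies or dimes per month" bandwidth claim against their own site, the arithmetic can be sketched as below. This is only a rough estimate: the page count, average page size and price per gigabyte are made-up illustrative inputs, not figures from SEOmoz or from anyone in this thread, and `monthly_crawl_cost` is a hypothetical helper name.

```python
def monthly_crawl_cost(pages, avg_page_kb, crawls_per_month, usd_per_gb):
    """Estimate the bandwidth cost (USD) of one crawler fetching a site.

    All four inputs are assumptions supplied by the site owner; none of
    them come from the thread above.
    """
    kilobytes = pages * avg_page_kb * crawls_per_month
    gigabytes = kilobytes / (1024 * 1024)  # KB -> GB
    return gigabytes * usd_per_gb


# Illustrative numbers only: 10,000 pages of 50 KB each, fetched once a
# month, at an assumed $0.10 per GB of transfer.
cost = monthly_crawl_cost(pages=10_000, avg_page_kb=50,
                          crawls_per_month=1, usd_per_gb=0.10)
print(f"${cost:.4f} per month")  # a little under five cents
```

Plugging in your own page count and hosting rate shows quickly whether the cost argument holds for your site; it says nothing about the separate objection that the crawl brings no traffic in return.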

Avatar Administrator
from dannysullivan 2252 Days ago #
Votes: 8

Finally caught back up on this thread. In terms of all the concern about not being listed, that’s pretty much going to happen if you block. As Rand explained above, folks are still going to get a lot of data about your pages -- at least the people pointing at them. Let’s get really, really clear about this -- especially given the amount of discussion on people wanting to be removed. Block your pages, and Linkscape won’t be able to report on what you’re linking out to. But the people linking to you (which is more helpful in my view)? That’s totally still going to be shown. Unfair! Well, Google and Yahoo allow backlink lookups to any URL, even if those URLs are blocked from actual indexing. No one is screaming at Yahoo for giving away "competitive information." Most people point at Yahoo Site Explorer as a super cool tool and bitch at Google for not providing better reports. So don’t want to be listed? You’re going to get listed in large degree regardless, just like you’ll have backlink data listed with Google and Yahoo even if you deny them the ability to spider you.  That seems fair enough. What’s not that cool is the other issue, that there’s a spidering component going on eating up some bandwidth. Not a lot, I know -- but for over 10 years, well behaved spiders have allowed site owners to exclude themselves. When SEOmoz said it had its own spider, fair enough that some figured we should have seen the blocking instructions when the search engine launched, if not before then, through a user agent left in server logs. Now apparently SEOmoz has no spider. If not, then that issue is solved as well. But then Donna suggests, we’ve got this long list potentially to "hide" dotnetdotcom.org as the SEOmoz spider. I’d say if you believe that and are super worried, block it and anything else that looks weird on the list of services SEOmoz gives out.For me, it’s not worth the effort. 
And I suspect a lot of people concerned might have felt the same way if there wasn’t all this "we can’t say" or unclarity of what’s listed or not.

Avatar
from Skitzzo 2252 Days ago #
Votes: 10

One more post and I’m done. It seems that what we’ve established is that:
SEOmoz will neither confirm nor deny their ownership of the dotbot crawler, despite their earlier promotional claims of having their own spiders that crawl the web.
SEOmoz will aggressively seek out your data no matter your wishes and in fact if you block them one way, they’ll try to get it another way.
SEOmoz will ONLY remove your site from DISPLAYING your data through Linkscape if you add a customized SEOmoz meta tag to each and every page on your site, and even then, only after a 30-60 day time period.
SEOmoz is "unwilling to provide a clear concise way to keep data out of Linkscape."
SEOmoz did not think ahead enough to predict that SEOs would want an identified user agent or method of removing our sites from their index, despite being privacy advocates in the past and spending many years in the SEO community.
Those appear to be the facts that we’ve established here, if I’ve missed any please mention them. Suffice to say that’s not the behavior of a company I want to do business with but that is a decision everyone has to make on their own.

Avatar
from DazzlinDonna 2252 Days ago #
Votes: 9

One fact you’ve missed, Skitzzo (the one I earlier called earwax):
6. Whoever owns that darned dotbot is lying about the number of sites it has indexed, as has been established by the uncovering of the 7 billion pages that got instantly added to the javascript. Oh wait, of course I should probably say "probably lying", because there’s a minuscule chance in hades that those 7 billion pages really did materialize overnight. Forgive me for using the word "lie". Sleight of hand might be better. ;)

Avatar
from randfish 2252 Days ago #
Votes: -7

@skitzzo - I’ll try to address all of these points below:
1) It seems that what we’ve established is that SEOmoz will neither confirm nor deny their ownership of the dotbot crawler, despite their earlier promotional claims of having their own spiders that crawl the web.
This is correct. We don’t talk about the sources for our crawl data beyond providing the comprehensive list at http://www.seomoz.org/linkscape/help/sources. Our claims of having spiders remain accurate and fully truthful - no boasting there. There’s literally no other way to get the data.
2) SEOmoz will aggressively seek out your data no matter your wishes and in fact if you block them one way, they’ll try to get it another way.
We do have a variety of sources we can pull data from to build our web index, and should we be missing important pieces of the link graph puzzle, we’ll use all the tools available to construct that data accurately.
3) SEOmoz will ONLY remove your site from DISPLAYING your data through Linkscape if you add a customized SEOmoz meta tag to each and every page on your site, and even then, only after a 30-60 day time period.
Yes, although we are looking at ways to block an entire site from being shown in the future through a registration system. And yes, we can’t block anything until we’ve re-crawled and re-indexed that page, which can take 30-60 days depending on the speed with which we crawl/re-crawl a given URL.
4) SEOmoz is "unwilling to provide a clear concise way to keep data out of Linkscape."
That’s what you said, and I merely copied it to point out that it had an exception. I know it’s a fun soundbite, but without the important caveat in the sentence it was in, it’s really unfair to keep using this phrase. That caveat is that we are willing to provide one clear, concise way to keep data out of Linkscape - the seomoz noindex meta tag.
5) SEOmoz did not think ahead enough to predict that SEOs would want an identified user agent or method of removing our sites from their index, despite being privacy advocates in the past and spending many years in the SEO community.
We didn’t think ahead as carefully as we could or should have, but we’ve now had a way to block us from showing your data since 9 days after launch. Granted, I wish that could have been even faster or even up when we announced, but I’m not sure it warrants the level of criticism you’re assigning. I suppose that’s up to folks who are judging to decide.
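To make the per-page meta tag mechanism concrete, here is a minimal Python sketch of how an indexer could decide whether a downloaded page has opted out. It is an illustration, not SEOmoz's code: the agent name "seomoz" is an assumption taken from how the tag is described in this thread (the exact syntax lives on the sources page), the generic "robots" fallback is likewise assumed, and `RobotsMetaParser` and `page_opts_out` are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect a noindex directive aimed at one crawler (or at all robots)."""

    def __init__(self, agent):
        super().__init__()
        self.agent = agent.lower()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        directives = {d.strip().lower() for d in attr.get("content", "").split(",")}
        # Honour both the crawler-specific tag and the generic robots tag.
        if name in ("robots", self.agent) and "noindex" in directives:
            self.noindex = True


def page_opts_out(html, agent="seomoz"):
    """Return True if the page carries a noindex meta tag for `agent`."""
    parser = RobotsMetaParser(agent)
    parser.feed(html)
    return parser.noindex


page = '<html><head><meta name="seomoz" content="noindex"></head></html>'
print(page_opts_out(page))
```

Note that the check can only run after the page has been fetched and parsed, which is why a meta tag can keep a page out of the displayed results but cannot stop the crawl itself, and why removal waits for the next re-crawl.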

Avatar
from Feydakin 2252 Days ago #
Votes: 1

@DannySullivan : In the comparison between Linkscape and Site Explorer, there is one important difference, or rather, $79 of them every month.. I still think that the people that will find this service the most useful are going to be the spammers looking for juicy link targets.. The beginners won’t really understand it and the established people don’t really need it except as a curiosity more than anything.. I bet on the day of launch you weren’t expecting this reaction, were you rand?? :)

Avatar
from jameszol 2252 Days ago #
Votes: 10

@SEOmoz - I think you should share the damned bot and allow robots.txt to block it. I hate to say this - and I’ll probably get a bunch of negative ’sphinns’ - but I would be willing to bet that less than 5% of the community is going to actively block SEOmoz bots, scrapers, etc. from their sites. Less than 1% will probably even blog about it and the reach of sharing that technical data will actually be more limited than you think. The rest of the web won’t even know what happened. There are millions of websites and millions of ’webmasters’ - your linkscrape tool will still be relatively accurate given all other tools currently available. You will maintain your marketable edge. I would argue that nothing would really be lost by telling the community about this data. Just give it up already! Even cuil was cool enough to give us their bot info.

Avatar
from mvandemar 2252 Days ago #
Votes: 3

"Our claims of having spiders remain accurate and fully truthful - no boasting there. There’s literally no other way to get the data."
Of course there is, that is a silly statement. You list 2 commercially available indexes in your sources. It does not require the capacity to spider 30 billion pages to buy an index that someone else has already spidered and offers for sale. What you are claiming is that you did not get any (as in none, zero, zip, zilch) of your data from Yahoo, Google, or any other pre-existing database out there. That’s the boast you made.

Avatar
from SEMSEO 2252 Days ago #
Votes: 3

RandFish says "identifying manipulative links and reporting them to the engines." I thought this job of "identifying manipulative links and reporting them to the engines" was supposed to belong to the search engines, as it is the search engines’ job to determine in their search algorithms which links are manipulative and which are not. By using this SEOMOZ Linkscrape tool, I am afraid those SEOs with this info will report their competitors to the search engines. That is why I don’t like this tool. As SEOs, we are supposed to build links to optimize our websites, and yet somewhere out there will be other busybody sour-grapes SEOs who will use this tool to report us just because we have more links than them and rank higher than them.

Avatar
from scottbowler 2252 Days ago #
Votes: 0

Why not have a solution similar to Google’s confirmation for Webmaster Tools? If I don’t want my website indexed, I go to Linkscape, put in my domain name and Linkscape generates a unique file name. I create that file on my domain name, click confirm and now Linkscape stops indexing my site.
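The file-based confirmation described above could be sketched roughly as follows. This is a hypothetical illustration of the general technique, not how Google Webmaster Tools or Linkscape actually implement it: the secret key, the `optout-` filename pattern and both function names are invented for the example. The service derives a filename from the domain with a keyed hash, and later checks that a file with exactly that name is present on the site.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice this would be kept private.
SECRET_KEY = b"replace-with-a-private-server-side-key"


def verification_filename(domain, secret=SECRET_KEY):
    """Derive the unique file name a site owner must publish for `domain`."""
    normalized = domain.strip().lower().rstrip(".")
    digest = hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"optout-{digest[:16]}.html"


def is_valid_claim(domain, found_filename, secret=SECRET_KEY):
    """Check that the file found on `domain` is the one issued for it."""
    expected = verification_filename(domain, secret)
    return hmac.compare_digest(expected, found_filename)


name = verification_filename("Example.com")
print(name, is_valid_claim("example.com", name))
```

Because the filename depends on a secret only the service holds, a third party cannot guess the name for someone else's domain, so only someone who can upload files to the site can complete the opt-out.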

Avatar
from SEMSEO 2252 Days ago #
Votes: 2

Skitzzo says, "Those appear to be the facts that we’ve established here, if I’ve missed any please mention them. Suffice to say that’s not the behavior of a company I want to do business with but that is a decision everyone has to make on their own." Does that mean Skitzzo will no longer subscribe to SEOMOZ?

Avatar
from Dudibob 2252 Days ago #
Votes: 1

As mentioned somewhere earlier in the thread, could we have access to the data on our own sites for free?  Something like that would be an incentive NOT to block SEOMoz.  Otherwise I’m certainly not paying $79 to see what it’s all about and don’t want my competitors to get a clear view of everything about the sites I look after.

Avatar
from Halfdeck 2252 Days ago #
Votes: 4

Even if we tag all our pages with SEOMOZ META, Linkscape will still show our backlinks to competitors unless websites linking in also block their sites from Linkscape. So claiming that SEOMOZ META is a viable solution is misleading. Linkscape is not a search engine. It’s a backlink analysis tool. No one cares that our web pages don’t show in Linkscape results. People want to block their backlinks from showing and that isn’t really possible by just using the SEOMOZ META. Blocking Google, Yahoo, and MSN in robots.txt isn’t an option either. So Rand is claiming a non-solution as not only a solution but the only solution they’re going to offer for the time being. And his defense to that decision is he can’t make money otherwise.
Bottom line is SEOMoz must offer a way for people to prevent Linkscape from crawling their sites and displaying backlinks to competitors.
That said, I don’t buy the "competitive advantage" objection. SEOs have complained often about Site Explorer and Google link: command showing only partial data. Linkscape is just a souped up Site Explorer.
What is a problem is that, unlike Site Explorer, SEOMoz makes a lop-sided value proposition. SEOMoz is leeching off our websites and not only gives back nothing in return but is also prepared to charge $1200/year for us to get something back in return. Is that a fair deal? I don’t care that Linkscape doesn’t show ads. Google doesn’t make me pay money to run queries. With Google/Site Explorer it is a fair trade-off. Linkscape is a one-way street.
Rand may protest that Linkscape offers valuable link data. For people who are interested in backlink mining 24/7 I suppose that’s true. But for the majority of webmasters on the web, you can’t make that argument. It’s like taxing 95% of Americans and redistributing that money to a select few.
Rand’s claim that Linkscape only will burn a penny’s worth of bandwidth is like saying "I’ll hold you up at gunpoint but I’ll only ask for a buck and I won’t shoot you I’ll just hit you over the head with my gun so I can make a clean getaway." It doesn’t send the right message about the SEOMoz brand.
Either change the value proposition or give people a real way to opt out of Linkscape.

Avatar
from g1smd 2252 Days ago #
Votes: 4

This info seems to be posted widely, except in this thread...
OrgName: seomoz.org ===== =====
OrgID: SEOMO
Address: dotnetdotcom.org ===== =====
Address: 93 S. Jackson Street 10070
City: Seattle
StateProv: WA
PostalCode: 98104-2818
Country: US
RegDate: 2008-07-07
Updated: 2008-07-07
AdminHandle: NGE11-ARIN
AdminName: Gerner, Nick ===== =====
AdminPhone: +1-206-299-9628
AdminEmail: admin @ dot net dot com .org
TechHandle: NGE11-ARIN
TechName: Gerner, Nick ===== =====
TechPhone: +1-206-299-9628
TechEmail: admin @ dot net dot com .org
From: http://ws.arin.net/whois/?queryinput=seomoz.org [2008-10-21]
93 S Jackson Street is a PO Box service called Earth Class Mail.

Avatar
from antezeta 2252 Days ago #
Votes: 1

While I think the overall Linkscape project is interesting, robots.txt support should be a big priority. Meta tags add page bloat AND are only considered after a page has been downloaded. Robots.txt solves the problem at the source. I would also suggest adding x-robots-tag support as an alternative to a meta tag, but this would be a low priority.
As for the data sources, we can cross one off the list. External access to the ASK API was disabled 6 March 2007. At the time a contact in Ask’s Pisa research center told me they still had access; this may not be true today. Considering API limits and the overhead inherent in mass scraping, I think Donna has hit the nail on the head.
Rand raises a very interesting point on using tools where the underlying data source is unknown. You run the risk of making dubious decisions based on dodgy data. I see many people enamored by the glitz of commercial keyword research tools (such as WordTracker) which use source data from Infospace meta engines such as dogpile. Yet ask yourself, when was the last time your target audience used dogpile? If you read Italian or want to test Google’s translate tool see this for more. The same is true for general web competitive analysis. Many cite Hitwise or comScore without realizing the data may be extremely misleading. Hitwise for the type of ISPs contributing (business vs consumer?), comScore for the sample selection method (spyware?). Regarding links, there is only one dataset that counts in most markets – that discovered and processed by Google. At least when we use public sources such as Yahoo’s Site Explorer or Exalead.com’s link: syntax, we know a lot about their crawling habits (i.e. frequency, depth) allowing us to make informed decisions. I think Rand’s service would be more valuable if it were transparent. Yes others could copy him, but I think that is the case today. The competitive advantage could be in excellent execution and community goodwill rather than some sort of mysticism.
I suspect most people would pay for the convenience of not having to do their own scraping etc.
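The point that robots.txt "solves the problem at the source" can be illustrated with Python's standard `urllib.robotparser`: a well-behaved crawler consults the rules before it requests any page, so a blocked page is never downloaded at all. The sketch below is only an illustration. The user-agent token "dotbot" is the crawler name discussed in this thread, and nothing here confirms that blocking it keeps a site out of Linkscape; `allowed` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Example rules a site owner might publish at /robots.txt.
ROBOTS_TXT = """\
User-agent: dotbot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler runs this check BEFORE downloading the page.
print(allowed(ROBOTS_TXT, "dotbot", "http://example.com/page.html"))
print(allowed(ROBOTS_TXT, "Googlebot", "http://example.com/page.html"))
```

This only works if the crawler identifies itself with a stable user-agent token and chooses to honour the file, which is exactly the disclosure being asked for in this thread.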

Avatar
from seanmag 2252 Days ago #
Votes: 10

Rand’s comment in response to Donna’s summation: "@Donna - no comment officially, but I think your thinking is very smart."
It’s this kind of coy and nonsensical response that leaves people feeling like you’re blowing a bunch of smoke. This isn’t the CIA where people should need to decode your messages. There’s nothing here that you’re doing that is rocket science or some kind of proprietary approach. It’s called data gathering and data mining.
If it’s your investors that keep you from communicating clearly and effectively with your customer base, I would suggest you need different investors. They’re killing your reputation.
Rand, I read that response from you to Donna and feel like screaming - thanks for wasting my time by requiring me to read an endless stream of your BS, when Donna Fontenot, who is merely guessing - was able to synthesize what you’re doing in a few sentences. Can you understand why people are so pissed off at you?

Avatar
from randfish 2252 Days ago #
Votes: -8

@scottbowler & @Dudibob - Both excellent suggestions that others made in this thread and elsewhere earlier and we’re including them in our dev priorities. I’m hoping that by Q1 2009, we’ll have a way to register your domain and see the link data we have for it free.
@halfdeck - even with any form of blocking, we’ll still be showing all the links that point to a given site or page. Blocking really just means that you won’t show up as a link source when someone you link to is queried.
@antezeta - the robots used by our sources all have UAs, and they all respect robots.txt, so you can block those bots if you so choose. And yes - we will continue to provide transparency about the size of our index, actually much more so than the major engines, showing domain diversity, lots of metrics and stats and URL numbers for the dataset.

Avatar
from randfish 2252 Days ago #
Votes: -6

@seanmag - In a business environment, there are trade secrets and competitive intelligence that need to remain private. I recognize that ours frustrates some people, but I really feel that with tens of thousands of bots scraping and crawling the web, most for commercial and largely secret purposes, we’re actually far more transparent about what we do and give back more. Current bandwidth costs are so low that our crawling adds not even pennies to the annual cost of running most sites.
I also take issue with the "endless stream of BS." I’ve been responsive, direct, and honest in this thread and elsewhere. Yes, there are parts of this project we’re not revealing, but I don’t think that lack of disclosure, particularly when I’m so up front about it, is the same as lying or BS’ing. I hope I can make it up to you and regain your trust and friendship.

Avatar
from corey 2252 Days ago #
Votes: 4

randfish, please deny that seomoz is cloaking the bots of the sources you list. i don’t give a shit if you own dotbot or another crawler on the list. you have yet to say that you are not cloaking well known bots. surprising, isn’t it, that such sleaze still fits as a possible truth in your statements over the past week, and i have to actually ask.
also, i puked a little bit in my mouth when you said earlier that you will do what you must to protect the integrity of your dataset. will you be able to criticize a search engine for their FUD pr double talk after this crap?

Avatar
from randfish 2252 Days ago #
Votes: -7

@corey - that’s correct, we do not cloak bots or use any sources that cloak bots (although, I guess technically, the major search engines have hinted that they may have cloaked bots to help discover site cloaking and manipulation and we may pull from those sources).
Regarding criticizing search engines - I think that having built a search engine (albeit a link search engine, not a content search engine), we’ve actually learned an incredible amount about web search, web crawling and the link graph which can be used to help answer (and make smart guesses about) a lot of questions we’ve had about search engine operations.

from mvandemar 2252 Days ago #
Votes: 4

"I’ve been responsive, direct, and honest in this thread and elsewhere."

Why? Because you’re telling us you’re honest? You never back up the claims you’re taken to task for, Rand. How is it you honestly think people don’t notice that?

It’s always "I wish I could tell you more" kind of responses. Whatever.

from seanmag 2252 Days ago #
Votes: 12

@Rand - Don’t insult my intelligence. I’ve been around long enough - in both the sales and marketing game, to know a BS’er when I see one. Your responses have become quite predictable and I can actually see the point in this thread where you went from having some concern about doing damage control to actually relishing the attention you’re getting.

I suspect you feel that you’re handling this damage control like a champ, but I have news for you. You’ve completely screwed up the marketing of this product and you’ve lost a great deal of trust for your company and in your personal reputation in the process.

Personally - despite my good feelings for your employees at large and the SEOmoz community as a whole - you’ve left me with a disgusting taste in my mouth with your approach that leaves me wanting to have nothing to do with SEOmoz.

The only thing I’m finding "transparent" about you at this point is the transparency of your arrogant and condescending tone. It’s really pathetic.

from ladyemma 2252 Days ago #
Votes: 6

You keep playing this off like you’re a search engine... No other search engine charges me $79 a month to use it. To me, it looks like you’re holding my info hostage for my competitors to use against me. If everything was on the up and up, why wouldn’t you offer the product as an "opt in" or "sign up" instead of taking my data without my permission? I know you wouldn’t get as complete a crawl as you’d like, but at least it’d be ethical.

On a side note, I was a daily visitor to seomoz, and haven’t visited since this whole debacle occurred.

from antezeta 2252 Days ago #
Votes: 3

@Rand People here seem to be asking for specific instructions to block inclusion in the linkscape program. Pointing to a page listing the most active crawlers on the web and saying block all of them (with some minor implications beyond linkscape)... well, this is none other than a polite way of saying you cannot exclude a site from linkscape. I’d strongly encourage you to rethink this approach.

Well behaved web companies have bots dedicated to specific purposes. Last time I counted, Google had 8 different bots, Yahoo! more than 10. I can give google and/or microsoft and/or yahoo textual web content while restricting image crawling. It isn’t a question of opting in or out of Google, but of a specific Google (or Microsoft or Yahoo!) service.

Linkscape needs to decide what it wants to be: a well behaved, well understood service or a friendly rogue of the type nice to go out for a pint with, but better that the polite side of the family doesn’t know. I’m not sure there is a middle ground here. Do note this is not a criticism of the linkscape service in and of itself - I’d just like to see a more transparent implementation.

from randfish 2252 Days ago #
Votes: -12

I’m going to take some time off from this, but will try to revisit in a few days if there are new questions or issues that need a response. I’m certainly sorry to have lost the trust and respect of people that I really do like and respect (like Sean), but I think on the fundamental issues, there aren’t going to be any significant changes, at least in the next few months, as far as the product goes.

Thanks for the opportunity to present our case and for the thoughtful feedback. We will be discussing everything that arose on this thread and others internally and with our board and if we have anything new to announce, will share.

@antezeta - I think I missed your comment while posting. I do recognize that we are more of this roguish, aggressive sort, and I think it’s one of the reasons there’s a lot of hostility. As I said, we’re not changing direction on that now, but we’ll definitely think long and hard about it. Thanks!

from BrettBorders 2251 Days ago #
Votes: 0

Personally, I’m really glad to have this link data. And I am hoping that the index quality will improve and get more detailed... more transparent... not get more omitted and occluded. That being said, I understand how some people on the more secretive side of link building might want to cover their tracks and keep things off the radar.

But if you work on big company sites with lots of inbound links... and your links are whitehat (and you’re not paranoid)... it is an AWESOME tool.

It’s been useful in my work and I am glad to have it.

from DrPete 2251 Days ago #
Votes: 3

I’ve been trying to avoid this debate as, honestly, I’m getting a bit worn out on SEO drama lately, but I’m starting to see 2 questions emerging out of this debate, and am beginning to see why Rand is finding himself in a bad position:

(1) How do I get out of LinkScape?

Perfectly fair question, IMO, and I completely understand how it raises alarm bells for some people. It seems like the answers haven’t been completely forthcoming, but a lot of that relates to...

(2) How does LinkScape work?

I think part of Rand’s evasiveness (not to put words in his mouth) comes down to this question. Frankly, SEOmoz has spent a lot of time and money to build a product, and is understandably resistant to telling us (including competitors, some of whom are feigning outrage, IMO) all of the secrets of how that product works and sacrificing competitive advantage.

Unfortunately (1) and (2) can’t be completely separated, and so here we are. The reality is that we’re going to all have to accept that this is a for-profit work-in-progress, decide how we feel about that product, and then get on with our lives. Meanwhile, Rand and company will have to work to improve that product and regain lost trust. I think many of the questions are fair ones, but the overall reaction is a bit over-the-top - frankly, I’m glad my business decisions aren’t under the kind of lens that Rand’s constantly seem to be.

from whoisgregg 2251 Days ago #
Votes: 0

"Regarding criticizing search engines - I think that having built a search engine ..."

It appears that "having built a search engine" has put SEOmoz in the same double-speaking, lack-of-transparency, randomly paranoid phase that all the major search engines seem to go through, and that all of them still have ingrained into their culture to some degree.

In other words, SEOmoz is exhibiting the same kind of behavior that keeps the "content search engines" from releasing more link data. Which is why SEOmoz had to go buy their own index.

Funny when you think about it, sickening when you watch it happen to someone you thought it wouldn’t happen to. :(

from mike 2251 Days ago #
Votes: 0

h8 to join this trainwreck, but as XKCD said "There is someone WRONG on the internet". If SEOMoz get data from a source within that source’s rules, I don’t see why they need to allow you to opt-out of their conglomerated "index" - they have a right to said data, they can have it, too bad so sad. I agree this isn’t an SE: it’s an intelligence tool. Hitwise don’t let you opt out, either as a user of an ISP or on a site level, and that stuff is way scarier. Ditto trends and about 4 trillion other tools. That is the way "market intelligence" stuff works, in that the contract exists separate to the website owners.

Different if a crawler exists and they use the crawl data in some way - not because that takes bandwidth tho, but because they are obtaining data in a, if not unethical, certainly not 100% polite way if there is no way to block the bot. For those who care (and I don’t), you can ban all bots trivially, by making the last command in your robots.txt:

User-agent: *
Disallow: /

This goes after all webmaster-useful bots (e.g. put in a line that allows Google etc. above it). Call me silly, but if you DON’T block useless bots, that’s your problem, because you know that we are living in a robots opt-out world (and I am a robots opt-out girl). Forget Dotbot or whatever, this is the way it should be done and ALL robots should respect robots.txt.

Personally, I’m more "three thousand words, three blog posts" mad @ Google for the whole "we can opt out of trends but you can’t" stuff than this tool, which far fewer people will use. Focus the rage where it best fits, IMHO.
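
As an illustration of the robots.txt ordering described in the comment above, here is a minimal sketch using Python’s standard urllib.robotparser module. The agent names (Googlebot, dotbot, SomeOtherBot), the example.com URL, and the helper name allowed are assumptions chosen for illustration. It only shows how a compliant crawler would read such rules, and says nothing about whether any particular bot actually obeys them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow one named crawler, then block everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(agent: str, url: str = "http://example.com/page.html") -> bool:
    """Return True if a compliant crawler with this agent name may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Agent names below are illustrative only.
    for agent in ("Googlebot", "dotbot", "SomeOtherBot"):
        print(agent, "allowed" if allowed(agent) else "blocked")
```

Run as a script, this reports Googlebot as allowed and the other two agents as blocked, since any agent without its own entry falls through to the catch-all rule.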

Moderator
from Jill 2251 Days ago #
Votes: 7

Rand said:

"I do recognize that we are more of this roguish, aggressive sort, and I think it’s one of the reasons there’s a lot of hostility. As I said, we’re not changing direction on that now, but we’ll definitely think long and hard about it."

So why didn’t you just say that at the beginning instead of pretending you had something different than an aggressive bot that we basically can’t stop from scraping our information whether we like it or not?

At least it would have been an honest answer and this would probably be a non-issue now. Plus, you could have saved us all a lot of reading, and yourself a lot of wasted time.

from Mert 2251 Days ago #
Votes: 0

@Sean - You rule. As a long time SEOMoz attender, I also feel sick to my stomach.

@Rand - Rand said: "I do recognize that we are more of this roguish, aggressive sort, and I think it’s one of the reasons there’s a lot of hostility."

Your job as an SEOMoz employee is to defuse the situation, not go to the fire with gasoline. Negative generalizations are doing nothing but ruining this for you. I was sort of supportive of you when you pulled this same tactic at John Andrews’ blog and then with Michael Vandemar’s blog, as Rand could not be a bad guy, right? Well, I guess heroes can be villains too (yes, that was an analogy from the NBC show Heroes). I do not know how much more civilized this roguish aggressive sort of SEO can get with you at this time. Shame on you.

from Mongoose 2250 Days ago #
Votes: 0

A lot of people’s toes seem to be stubbed here thanks to LinkScape and all of that jazz surrounding it: the bot mystery, not being able to opt out, then the whole issue with the meta tag branding, and now everyone attacking Rand for being secretive. From my perspective, Rand isn’t disclosing everything concerning this project; doing so would sink SEOMOZ. Who in their right mind would just throw everything they’re doing on the table? Would you ask Google to disclose all of their information and tactics? I don’t feel like this was meant to be such a fiasco, but maybe I’m wrong too. I support SEOMOZ, and have used many of their tools in the past. I respect Rand for explaining and trying to manage all the negative heat coming off of this. As I’ve said many times before, I don’t support that the scrapers don’t care about Robots.txt and that the exclusion method is a meta tag, but I do support the sentiment behind LinkScape.

from streko 2248 Days ago #
Votes: 1

holy shit, i should have made the bet with madhat about this breaking 100 comments. SHIT!

from donovanroddy 2215 Days ago #
Votes: -1

What’s the deal with everyone flaming Rand? The service is what it is, deal with it. Oh, I’m so distraught at what seomoz has done, I’ve been such a loyal follower for soooo... many years. Are you F@#$#@ serious???
