Published: Oct 20, 2008 - 11:22 am
Story Found By: dannysullivan 1670 Days ago
Category: Link Building
110 Comments
Comments
SEOmoz has a new page up on blocking its spider: http://www.seomoz.org/linkscape/help/sources
That page has a specific instruction for the meta robots tag. There is no specific instruction for a robots.txt file equivalent, unless "seomoz" is the agent name that's supposed to be used there. This should be said explicitly, if so.
Complicating matters is that a number of other "data sources" are listed, including pulling information out of APIs from Google. This gives the impression that to be out of Linkscape, you'd have to block Google and other search engines.
Add to this a debate over whether SEOmoz is even running a crawler: http://sphinn.com/story/79700
And a debate on whether that debate should have been closed to new comments: http://sphinn.com/story/79980
I don't want to reignite the debate on whether there is no crawler and SEOmoz has marketing issues, etc. about that. I want to specifically get answers to inclusion in the index.
Some site owners don't want to be in the index. They don't want competitors to look up data about their link structure. That raises these issues with me:
1) If SEOmoz is crawling pages on its own to gather some data, can you block those?
2) If SEOmoz is listing pages based on its own and third party data, can you choose to opt-out from being listed?
3) Should you be able to opt-out of being listed if Linkscape only uses data from third party sources?
OK, for me:
1) You should always be able to opt-out of being spidered. As I commented earlier: http://sphinn.com/story/77000#c55253
"Bottom line, if you're going to crawl the web and to be a good web citizen, then you obey robots.txt blocking. If you don't, then you're not a good citizen in my books."
So as SEOmoz does seem to be spidering to some degree, they should be providing both commands for meta robots and robots.txt plus explaining what happens if you choose to be excluded, which at minimum I think means they don't spider you.
2) If you agree with 1, then that keeps some data they might gather from showing up. But there are third party sources that are used. I can't see that you can expect to be not listed (listed, separately from being indexed), if that's the case. There are several tools, for example, that draw on Yahoo Site Explorer. Do we say those tools shouldn't opt you out, if Yahoo Site Explorer still lists you? But to the degree Linkscape goes beyond those tools with its own spidering, then the exclusion I suppose may reassure some people.
3) If the tool entirely uses third party data, I can't see how people can ask to be excluded. Heck, I have no ability to request that Hitwise exclude selling competitive data about what people are doing on my site to other people, since they're gathering that through ISP information. I kind of wish I could, but I guess I've come to accept lots of this data is out there in various ways.
So for me, bottom line -- if you're getting spidered, yes, you get an opt-out. No ifs, ands or buts. If SEOmoz wanted to go the extra mile, they could say opting-out would prevent being listed at all, even when using third party data.
(sorry for all the bold. something's gone wonky with our comment system putting that in there!)
Also of note, it appears from Rand's statements that even if you use the META tag, it could take 30-60 days for your site to be removed. That seems like an extremely long time to have your data out there if you've tried to opt out of being included. Shouldn't SEOmoz offer a quicker method of removing your sites from their index? Removing it a month or two down the road won't do much good if everyone has already had access to your data for an extended period of time.
30-60 days wouldn't be uncommon for some of the major search engines. I think only Google currently offers a faster turnaround if you specifically request it, and that took them nearly 8 years to put into place. From a PR standpoint, sure, removing it faster would be better.
I have a simple set of questions for SEOmoz to cut through the chatter and get to the bottom of the technical details of the Linkscape bots and index. 1. (Yes/No) Does SEOmoz control computers that contain a web robot that retrieves data for Linkscape? Computers are defined as physical hardware that SEOmoz owns or virtual services that SEOmoz has access to such as Amazon Web Services or hosting accounts? If yes: 1a. Does the web robot retrieve and obey the robots.txt? 1b. If it obeys robots.txt, which User-Agent string does it respond to? 1c. What is the HTTP USER AGENT of the robot? By HTTP USER AGENT, I mean the HTTP header that is sent with each request according to the HTTP 1.0 or 1.1 protocol specifications. 2. (Yes/No) Does SEOmoz get hold of data that robots outside its control retrieve? By outside of its control, I mean robots that are built, run, and maintained by other companies. Getting hold of data can involve using an API, buying the data on disks or retrieving it online. If yes: 2a. Which robots data does Linkscape currently use to build its index? I dont care about the potential for future use, I care about now. 2b. Which robots data has Linkscape used in the past since its inception? So straightforward questions that need clear cut simple answers. Thanks,Pierre
Danny, if Linkscape is only offering the same type of information that Google or other search engines offer then I guess that's fine, but they purport to be offering much greater insight into link structure etc. If it's a more powerful tool, doesn't that carry with it the responsibility to allow people to remove themselves quicker rather than allowing their competition a 1-2 month long window to examine it? Besides, this is an issue SEOmoz should have (and from Rand's comments did in fact) thought through prior to release. If there's no quicker way to remove yourself, and there was no pre-launch opt out option, it seems like removing yourself now is probably not going to do much good.
Mike had asked if I went through what Rand said in a post on his site:http://smackdown.blogsblogsblogs.com/2008/10/17/how-to-block-the-bots-seomoz-isnt-telling-you-about/I had, but Ill highlight points here:+ SEOmoz will treat noindex also as nofollow -- it wont follow links on pages that youve blocked from being indexed."And Andy - we treat meta nofollow the same way the engines do - they don’t appear in our link graph or any of the calculations. I’ll ask the guys to add that to the information page."+ Blocking Linkscape appears to make it not list your page at all, even using data from third party sources."If you read our sources pages, you can see exactly how to block Linkscape from listing your sites/pages without blocking any of the engines. We obey the robots meta noindex tag, but also an seomoz noindex if you just want to target our index."Looking back at Linkscapes sources page to understand more about this, it says:"The best way to restrict data from all of Linkscape’s data sources is with the Robots tag. Linkscape obeys either “ROBOTS” or “SEOMOZ” in the meta tag’s “name” attribute."I read that as saying if you block a page from being spidered, it wont appear at all -- not even using third party data. But I think it could be more clearly said, if thats the case. And if thats the case, I wouldnt get into listing all these other data sources at all. You dont need to know about blocking the, not via Linkscape, if by blocking Linkscape, youre not going to show there at all. It just confuses matters.
hmmm linkscape or linkscrape that is the question
"The best way to restrict data from all of Linkscape’s data sources is with the Robots tag. Linkscape obeys either “ROBOTS” or “SEOMOZ” in the meta tag’s “name” attribute."Which means every page of a site would need to add special robots tag, I take it. (Kinda like when Microsoft made us add one for their stupid smart tags) I think most would rather be able to add a simple line in a websites robots.txt file like this:User-agent: SEOMOZ Disallow: /
OK - Long list of questions to answer and Ill try to address them all.@Danny1) If SEOmoz is crawling pages on its own to gather some data, can you block those?You can block the crawlers that power Linkscapes index. They are all listed on the sources page - http://www.seomoz.org/linkscape/help/sources. In order to be protective of our competitive intelligence and to dissuade folks from blocking our bots, we will pull from multiple sources. I know this is frustrating to some, but in order to build the best product possible, we need to have an index as close in approximation to the major engines as possible.2) If SEOmoz is listing pages based on its own and third party data, can you choose to opt-out from being listed?Yes you can - no matter where we get data from, you can use the meta seomoz noindex tag to say "dont show my URL in your results" and we will respect that. It does take 30-60 days to update, as we need to re-crawl and re-integrate with our index (just as the major search engines do, though theyre typically faster).3) Should you be able to opt-out of being listed if Linkscape only uses data from third party sources?Yes, even if we did only pull your data or your URL from third-parties, you can still prevent being listed by using the meta seomoz noindex tag mentioned.@Pierre1) Yes, SEOmoz controls machines that host our data, crawl and process to calculate the link metrics.1a - Yes - any robot we use or any third party source we pull from respects and obeys robots.txt1b - The UAs are listed on the individual sources websites and all sources we currently use are on our sources page - http://www.seomoz.org/linkscape/help/sources - you can block these individually or en masse.1c - Again, these are listed on the individual pages for the sources, so you can see them publicly.2) Yes - we may, now or in the future, pull from third party data sources for data that either becomes inaccessible to us in other ways or is more economical to gather from third parties. 
Again, all the third parties we might use are listed on the sources page.2a - Were not revealing this.2b - Also not revealing this. Theyre both competitive intel. Sorry.Any other questions, just let me know and Ill be happy to answer.@Jill - currently, were supporting just the meta robots tag for a variety of reasons, including that if we get any data from third parties that have crawled, we need to know on an individual basis whether those sources should/shouldnt be listed. Its also hard for many site owners with subdomains or sub-hosted blog/CMS accounts to add a robots.txt, so the page level makes good sense there, too. We might revisit this decision in the future, though.
Thanks, Rand. The answer to 2 pretty much solves it. Block you, and youre out Linkscape, period. If thats the case, like I said, saying to block all these other spiders is confusing.But....How can people block you specifically through robots.txt? Its a pain for some people to be tagging each individual page. Is "seomoz" the user agent they can use as Jill suggests?And...Are you sending out any spiders of your own. It has sounded like you are. And aside from listing issues, some people dont want to be spidered at all just for bandwidth reasons. Typically, this means they can request being blocked through robots.txt (a single request rather than hitting each page).
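For readers following the robots.txt question, here is a minimal sketch of how a standards-following crawler would read the two-line rule Jill proposed. It uses Python's standard `urllib.robotparser`. The agent name "SEOMOZ" is taken from Jill's suggestion only; nothing in this thread confirms that any crawler actually identifies itself that way, and the `is_allowed` helper and example.com URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rule proposed in the thread. "SEOMOZ" as an agent name is an assumption;
# no crawler in this discussion is confirmed to use it.
ROBOTS_TXT = """\
User-agent: SEOMOZ
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/some/page.html"
    for agent in ("SEOMOZ", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

The point of the sketch is that a rule like this only affects a crawler that announces the matching agent name and chooses to honor the file; other agents are untouched.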
No no, Rand. Question 1 was about the robots that you control, not third party sources. And for a robot, it's a piece of code that uses your bandwidth to download data from websites onto your computers. Not third party robots, but SEOmoz robots. If you have a piece of code that goes through pages that you already have (an index, a set of files, cached copies, whatever you want to call them), whether you downloaded them or got them from 3rd party sources, then that is NOT a crawler. It's a parser. That's not a semantic difference but an important technical fact that needs to be stated clearly. A crawler is only one component of the system that retrieves (crawls) pages from the internet, stores them, analyzes them, and calculates link metrics. As for 1b, the link you provide does not show any UA for a robot that SEOmoz owns and so you do not, according to the page, own a crawler. This is in contradiction to the Yes answer you gave to question 1. And I find it very ironic (if also a touch rude) that a tool sold as a competitive intel tool is being secretive for competitive intel reasons. Pierre
@DannyYes, block us with the meta robots or seomoz noindex and youre out of Linkscape. We will still show links to that URL, but we wont list that URL in our results.People can block the UAs that collect data for Linkscape, which are listed on the sources page. Were not revealing which ones are currently active, but you should expect that any of those could be used to acquire crawl data (and not just by us).Spiders of our own - there are no spiders crawling under the name "SEOmoz" or "Linkscape" and all of the spiders that do crawl for us are listed in those sources. They can all be blocked through robots.txt.@PierreRobots we control - we do control robots, so the answer remains yes.We have both crawlers that fetch data for us (again, those are listed on the sources page) and parsers and processers to aggregate the data, build the index and calculate the link metrics.1b - I think I have to say no comment on this. As I said, theres certain information were not revealing. However, I dont believe theres any contradiction.Irony - Well, pretty much every pay-to-use competitive tool on the web or off does not disclose sources. Have you ever asked Hitwise where they buy their ISP data? Or where Spyfu or KeyCompete get their data? Or where Wordtracker pulls its ISP data? None of them will answer you either. Im sorry that its rude, but its how the world works and were certainly not substantively different from any of these others.
So, in other words, youve built a way to make money off of basically other peoples content and made it exceptionally difficult to not be a part of it.. At least you dont have "dont be evil" as a motto..
@Feydakin - I think technically, yes, thats accurate. We, obviously, dont think its evil. We think its valuable data to have and is technically all publicly accessible. We dont scrape and re-purpose content or sell ads alongside the work youve created. We built a tool that we felt was important and useful and we give away some valuable data for free and hold the rest back for paying customers. Were a for-profit business with investors and employees and payrolls. I think its very challenging to make the case that what were doing is more "evil" than others selling paid services off competitive intelligence or web crawls.
"We will still show links to that URL, but we wont list that URL in our results."So Ive I enter a URL into the report box -- and that URL has been blocked -- will you report nothing for it?But if other pages are linking to that page, and you run a report on some of those, youll see theyre linking to a page even if it blocked. Is that correct? It would make sense -- links on the other pages can be seen.As for "the spiders that do crawl for us are listed in those sources." Googles not crawling for you. You can get some data from the Google API, but its probably overkill for people to think they need to block Google to stay out of Linkscape.Theres an implication that youve got one of these sources in particular pulling in specialty data for you, working under license but without saying its name. If so, it would be nice to know which one, I suppose. But since blocking you keeps you out of Linkscape period, less a worry, Id say.I think its still an issue that youre effectively telling people they have to insert a meta tag on each and every page they dont want listed in Linkscape. Since youve talked so much about having crawled and built an index of the web -- regardless how who youve leveraged to do this -- Id hope youd offer an easier way to opt-out an entire site.Im guessing that if youre not really crawling, then you dont have access to the contents of robots.txt files, so you cant see the blocking this way. And lacking that, you then need a way to verify that a particular domain really wants to be blocked -- which is yet a new system to setup. It would certainly be easier to spider the robots.txt files themselves.
Here is another yes/no question:Would using a meta tag to state noarchive, as specified at http://www.google.com/support/webmasters/bin/answer.py?answer=35306 , also remove our pages from Linkscape?Pierre
@rand, I guess that my whole impression of this is that you have just created an entirely new generation of script kiddies and using the excuse of we need to turn a profit to do it.. Weve seen this in several iterations over the last couple of decades where people with skill and knowledge work hard, for good or evil, at what they do.. Then a couple of people come along and offer up a script based on that hard work and knowledge that anyone can use for just a few bucks.. Site Explorer does this to some extent, but I think that you have created the ultimate spam generator for people who link.. I suspect that we will start seeing more spammers using this tool than marketers as they dig for those juicy links to try to grab.. I dont see a way to keep the service from being abused by spammers, but expecting people to add a meta tag to every single page that they control seems a lot excessive to me.. I can see a simple verify session like Google WMT uses as far more effective and easier to do for the people who dont want their sites in Linkscape.. But, if important sites opt out it would certainly effect the value of the service..
My interpretation: The list of bots provided is really just a list to hide something within it. i.e. The bot that SEOmoz "controls" is one of those listed, but they wont disclose which one (*cough* dotbot *cough*). The rest are there because they also use their data to create the index, but they dont control the googlebot (obviously) or the yahoo bot (obviously), although the impression is given that they do. So, the end result is a data set created by combining the seomoz/dotbot crawl data with that obtained (free or paid) from all the other sources. By using the page-level meta tags that Rand has provided, we can opt out of having our data served up to competitors. In order to prevent the bot from actually spidering us in order to save bandwidth, etc., wed either have to block all the ones in the list (yah, right, you gonna block them all? i think not), or you have to know which one is the one "controlled" by seomoz (*cough* dotbot *cough*). Thats my interpretation of the story, whether or not anyone wants to refute it. My mind is fairly made up that this is darn close to the truth, so I doubt any more fancy-pants talking would change my mind. Its up to you all to make up your own minds. Now, as for that whole dotbot number of sites spidered subterfuge, thats a nuther ball of wax clogging up my ears - http://sphinn.com/story/80051
"We will still show links to that URL, but we wont list that URL in our results."So Ive I enter a URL into the report box -- and that URL has been blocked -- will you report nothing for it?I am sorry, I have to get back to basics to clarify this as Ive feeling I am going mad :)Blocking linkscape - what exactly does this mean? Does it make linkscape (no matter what the crawler is) disregard links going from this URL (which I feel would be logical)? Or does it prevent the tool from listing backlinkf to this URL?
Rand, whatever way you cut it, you've lost out on a whole bunch of trust here. I won't rehash what anyone has said already as it's all been said. Edited faqs, cached versions that support a contrary opinion and more. People in this game aren't mugs, yet it would appear from what has been pushed back and forth in the whole he said she said ding dong that a few people have formed a view that you or SEOmoz took them for such and tried to paint a picture that a thing was something that it wasn't. Hindsight and all that, but if you ever recrunch anything else out there and offer insights based on this or that, then maybe you should just say that next time. I can't help but feel that no one would have been up in arms or cared even (excepting dotbot and a few other data providers perhaps). Right now you guys look bad and will probably be tarnished by this for some time to come (at least in this community).
It looks like the bottom line here is that in order to keep the 400 or so sites (some with tens of thousands of urls) belonging to me and my company and my clients out of Linkscape, its going to cost me one hell of a lot of time, and therefore, money (as time equals money in our business). Adding that many meta robots tags is just not practical. All I want is to be left alone to do my business and my clients business, and suddenly this gets dumped in our laps. I object to this on so many levels I dont even know where to start (and most of them have been covered various places) Thanks a lot.
I was trying to stay out of this, but as Annie pointed out there seems to be a bit of ambiguity going on. And I just want to be clear. If I put the following at the top of every html file on www.vinnygoldsmith.com: <META NAME="SEOMOZ" CONTENT="NOINDEX"> What, if anything, will people be able to see about my site? Links in? Links out? MozRank? Nothing at all? I really thought it would remove it altogether and people wouldn't be able to see anything about it, but some of the words you're using ("we will still show links to that URL") are giving me pause.
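As an illustration of the page-level opt-out being discussed, the sketch below shows one way a consumer of crawled HTML could detect the tag. It relies only on the sources-page quote earlier in the thread (that either "ROBOTS" or "SEOMOZ" in the meta tag's name attribute is obeyed). How Linkscape actually parses pages is not described anywhere here, so the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots|seomoz" content="..."> tags."""

    def __init__(self, names=("robots", "seomoz")):
        super().__init__()
        self.names = {n.lower() for n in names}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() not in self.names:
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def page_directives(html: str) -> set:
    """Return the set of robots-style directives found in the page."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


def is_opted_out(html: str) -> bool:
    """True if the page asks not to be listed (noindex)."""
    return "noindex" in page_directives(html)


if __name__ == "__main__":
    sample = '<html><head><META NAME="SEOMOZ" CONTENT="NOINDEX"></head></html>'
    print(is_opted_out(sample))
```

Because the check is per page, every page that should be excluded has to carry the tag, which is the practical burden several commenters object to.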
Ok, trying to slice through all of Rands answers am I correct in saying that1) SEOmoz does not OWN the spiders that are gathering this data.2) the only way to block the bots that pull Linkscapes data is to block ALL of the bots listed on that page (which includes Yahoo and Google)??Also, if I use the Meta tag that keeps my info from displaying, will you still be using it for other reports (as in sites that I link to etc)?It sounds to me like you reluctantly provided a way to stop your information from displaying because the community demanded it, but youre unwilling to provide a clear and concise way of keeping Linkscape from obtaining the data and storing it for your own purposes later.Is this a fair assessment?
@JillYes that would be the more usual way of doing things but Linkscape hasnt actually got any spiders (AFAIK?) that means they dont control the crawling - they buy the data these crawling companies produce. However you can use that approach on each of the known spiders collecting data and then selling it to Linkscape. IncrediBILL has a list of them. Ill try and get something on this written up later. That of course wont be guaranteed to keep you out of Linkscape, as they could still include you from their API calls at Google/Yahoo/MSN and youre hardly likely to block those spiders. That is why Rand has suggested this META tag, as its the only way they can guarantee your site will be excluded. The scraped data they buy will contain this META and they will filter for it at their end. Depending on their arrangement with these companies to buy their data, this could take 30-60 days. They have to wait for these companies to scrape your site with the new META on there and the data to be entered into Linkscape. They cant take you out before this point. I assume though once this META "flag" has come in from one source or another it will trigger the site exclusion.Correct me if Im wrong Rand?
Which is why I suggested an opt out option right on the linkscrape site.. It's easy to verify site ownership just like Google does with WMT.. Add a simple file that SEOMoz can look at after you request exclusion.. Once site ownership is verified, bounce the data for that site that he already has and not include it in the future.. 1 small text file per website and the process can take minutes instead of months.. But, of course, if that happened I would love to see the stats for how many people actually opt out..
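The file-based verification idea can be sketched roughly as follows. This is not something SEOmoz or Google is known to implement in this form; the file name `linkscape-optout.txt`, the HMAC-based token, and the injectable `fetch` function are all assumptions chosen to keep the example self-contained and testable without network access.

```python
import hashlib
import hmac


def verification_token(secret: bytes, domain: str) -> str:
    """Derive a per-domain token the site owner is asked to publish."""
    message = domain.lower().encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verification_url(domain: str) -> str:
    """Hypothetical location of the verification file on the owner's site."""
    return f"http://{domain.lower()}/linkscape-optout.txt"


def is_verified(secret: bytes, domain: str, fetch) -> bool:
    """Check that the site serves the expected token.

    fetch(url) must return the response body as a string, or None when the
    file is missing. It is passed in so real HTTP code can be swapped for a stub.
    """
    body = fetch(verification_url(domain))
    if body is None:
        return False
    expected = verification_token(secret, domain).encode("ascii")
    return hmac.compare_digest(body.strip().encode("utf-8"), expected)


if __name__ == "__main__":
    secret = b"service-side secret"
    token = verification_token(secret, "example.com")
    fake_web = {verification_url("example.com"): token + "\n"}
    print(is_verified(secret, "example.com", fake_web.get))
    print(is_verified(secret, "unclaimed.example", fake_web.get))
```

One file per site, checked once, is what would let an exclusion take effect in minutes rather than waiting for a re-crawl of every tagged page.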
Am I the only one who feels really icky about the idea of branding my web pages with an SEOMOZ meta tag? Ill never do that http://sphinn.com/story/80172@feydakin "claiming" websites with SEOMOZ gives them even more sensitive data... no way.
@johnandrewsI feel the same. It’s work to implement and unnecessary code bloat. I also dont like the way it labels my site. If were talking about "flags" which Google, Yahoo, MSN and even other marketers use to identify SEOs, well this is a doozy.
Sorry for the delay - on a conference call. Again, Ill try to answer all the items one by one:@dannyWere just like the search engines in this respect. If you say "noindex" we wont show you in our results, but well still calculate link metrics based on who you link to or how. If you do a search on a URL thats been blocked, we wont show any information like page title or content, but we will show links that point to it. If you search for a page thats been linked to by a page that has noindex (either for all robots or for seomoz) we wont list that page in our results.As far as the implication and the specific sources that are controlled by SEOmoz, were not revealing that information now.@PierreWe dont currently have handling for noarchive one way or another, but since we dont show any page content, Im not sure exactly how we might. If youve got suggestions, were definitely open to them.@FeydakinWe think this will help far more in identifying manipulative links and reporting them to the engines than it will to find mainpulative link sources. Certainly, our impression from the engineers present at SMX East was that they think this can be a good tool for IDing spam.@Donna - no comment officially, but I think your thinking is very smart.@Ann - disallowing SEOmoz means the same thing it does in Google. They treat noindex as "noindex, follow" - meaning they follow the links and use it in their link graph, but wont show it in their results. We do the same thing. If you dont want us following links, you can use nofollow, just as with the engines.@Rob - Certainly sorry to lose your trust. I think its the price for not being as prepared as we could have been on messaging and on building a tool that goes against the interests of some webmasters and SEOs. 
Hopefully, over time, we can regain that trust by continuing to provide valuable content and tools to the community.@Vingold - if you block every page, no one will ever see your pages in the list of link results we show, just like if you block Google, no one would ever see your pages in their list of search results.@Skitzzo - no comment on spider ownership, but with blocking, we may pull from other sources if we cant retrieve the data. And yes, were unwilling to provide a clear concise way to keep data out of Linkscape (other than the meta seomoz noindex tag), both for competitive reasons and to make the data set the best it can be.@NickWilsdon - Exactly right on the blocking. If you block some of the bots, we may pull from other sources to make our index as comprehensive as possible. The meta tag is the best way to tell us you dont want to be included. And yes, it could take 30-60 days based on how fast our crawl gets to you and gets processed.
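To make the noindex versus nofollow distinction in Rand's reply concrete, here is a small sketch of the behavior as he describes it: a noindex page still contributes its links to the link graph but is not listed in results, a nofollow page contributes no links, and a blocked URL can still have its inbound links shown. The data structure and function names are invented for illustration and say nothing about how Linkscape is actually built.

```python
def link_graph(pages):
    """Build (source, target) edges, skipping pages marked nofollow.

    pages maps url -> {"directives": set of str, "links": list of urls}.
    """
    edges = []
    for url, page in pages.items():
        if "nofollow" in page["directives"]:
            continue  # links on nofollow pages are not followed or counted
        edges.extend((url, target) for target in page["links"])
    return edges


def listed_inbound_links(pages, target):
    """Inbound links shown for target, hiding sources that are noindex."""
    return sorted(
        source
        for source, destination in link_graph(pages)
        if destination == target and "noindex" not in pages[source]["directives"]
    )


if __name__ == "__main__":
    pages = {
        "a.example/": {"directives": set(), "links": ["c.example/"]},
        "b.example/": {"directives": {"noindex"}, "links": ["c.example/"]},
        "d.example/": {"directives": {"nofollow"}, "links": ["c.example/"]},
        "c.example/": {"directives": {"noindex"}, "links": []},
    }
    print(link_graph(pages))
    print(listed_inbound_links(pages, "c.example/"))
```

Under this reading, tagging your own pages hides them as listed sources, but it does not stop other sites' links to you from appearing, which is the ambiguity Vinny and Ann were asking about.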
@feydakin Great idea - a simple opt in solution for those who are concerned. Maybe SEOmoz could pay people for their time in implementing too :D In terms of people who'd opt out, the truth is probably very near to not very many at all, at least in the grander scheme of things. This little debate is but a microcosm of a wider webmaster community, the majority of whom are in a state of ignorant bliss, neither aware of, nor giving a stuff! If Google had tried to play it smart in their early days in a similar style, it's debatable whether they'd be the force they are today. Google manufactured consent by consensus and meaningful engagement; this whole process has been neither. @rand It's a cool little tool, hats off and all that, gr8 work - just remember we are smart guys and gals too. It isn't so difficult to be straight down the line; frankness and a hands up we dropped the ball would have more than sufficed. Anyways, life goes on and it's short guys, remember that too ;)
Got to say John, no you're not the only one - and unfortunately I would suggest many commercial organisations may agree with you. Surely on that basis a sitewide command has to be the only viable option moving forward (as Netmeg mentioned), as this seems reactive rather than proactive activity with little or no commercial benefit (from a client perspective) - and on that basis no commercial entity is going to want to pay for activity to manage.
Rand, your response to my question pretty much sums everything up in my mind. You're not willing to say whether you own the bots that crawl the data or not. This despite several claims on your website and countless comments here on Sphinn. That leads me to believe that A) you don't own the bots but want it to seem like your project is bigger than it really is. The only reason I can see to do that is to try and match the marketing of your project, which certainly at this point seems to have been filled with misdirection if not blatant lies. Or B) you do own a bot (dotbot) and you don't want everyone blocking it so that you can compile their data and sell it back to them. This would make it cheaper for you to run Linkscape since you wouldn't need to buy as much data from the third party sources. Either way, I don't think it paints your company in a good light. Also, you admit that you're unwilling to provide a clear and concise way for webmasters to keep their data out of Linkscape. Can you explain to me what other reputable company does that? As mentioned, every other crawler, index, or archive that I've ever come across has made it quite clear and easy to "opt out" so to speak. I'm sorry, but your two line comment to me speaks volumes and signifies to me that SEOmoz no longer cares about the community that they purport to serve, and now only cares about the bottom line. That's fine, you're a for profit company with investors, but you can go ahead and drop the BS about transparency and serving the community.
Something else I just realized.. If it can take 60 days to get "opted out" by adding a meta tag, doesn't that make this data sort of stale anyway?? I'm not sure how much real value there is in stale data.. Maybe a lot, maybe not so much.. I just think that the whole launch and maybe even the idea was poorly thought out from a reputation POV.. The beginners and scripters will love it, but I suspect it will cause lingering problems with people more established that are tired of seeing people find ways to make money from their work..
After reading this entire page, I have to wonder if Rand realizes that he is basically alienating every non-spamming webmaster that pays any attention at all.
Ok, so Rand has basically admitted I am right about this comment that I made above:My interpretation: The list of bots provided is really just a list to hide something within it. i.e. The bot that SEOmoz "controls" is one of those listed, but they wont disclose which one (*cough* dotbot *cough*). The rest are there because they also use their data to create the index, but they dont control the googlebot (obviously) or the yahoo bot (obviously), although the impression is given that they do. So, the end result is a data set created by combining the seomoz/dotbot crawl data with that obtained (free or paid) from all the other sources. By using the page-level meta tags that Rand has provided, we can opt out of having our data served up to competitors. In order to prevent the bot from actually spidering us in order to save bandwidth, etc., wed either have to block all the ones in the list (yah, right, you gonna block them all? i think not), or you have to know which one is the one "controlled" by seomoz (*cough* dotbot *cough*). Thats my interpretation of the story, whether or not anyone wants to refute it. My mind is fairly made up that this is darn close to the truth, so I doubt any more fancy-pants talking would change my mind. Its up to you all to make up your own minds. Now, as for that whole dotbot number of sites spidered subterfuge, thats a nuther ball of wax clogging up my ears - http://sphinn.com/story/80051Since he said, "@Donna - no comment officially, but I think your thinking is very smart."So...that should pretty much answer most of the questions I would think. Well, except that ear-clogging one.
Rand, What is the value proposition to your Linkscape customers?
1. They don't know exactly what they're buying (because some of it is secret.)
2. They're asked to trust SEOMoz (which won't go over that well.)
3. There is a hidden investment on the part of your customers to place Meta code and robots.txt, which still may or may not present other concerns for your customers.
4. The data is skewed the more often companies run interference with Linkscape, making it less valuable to your customers.
I'm concerned that this gem of a device wasn't tested long enough from the usability side. There have to be ways to satisfy customer requirements for privacy and control, and you still have time to push them in.
@Rob - thanks, we think the tool is impressive, and as Nick likes to say, it's the worst it will ever be right now in this beta stage, so expect it to get much better in features, functionality and coverage over time.
@Peteyoung - we'll definitely talk about a sitewide option. In order for that to happen, though, we'd need to build a site verification service like Google & Live's Webmaster Tools.
@Skittzo - I think that of the two points you outlined, A vs. B, B has by far the better intuition. And I didn't say we wouldn't provide a clear concise way to opt out, I said we aren't providing one EXCEPT for the meta robots/seomoz tag, which is pretty clear and concise. We do care a ton about the community. If we didn't I wouldn't be here explaining and responding. I can't help but resent the accusation after the thousands of hours I've poured into giving back to the community. The fact that this tool needs to keep some information private and that portions of it are pay-to-access is a pretty weak argument for suggesting that I don't care about SEOs and webmasters.
@Feydakin - well, Linkscape crawls a significant portion of what we feel needs crawling or is "fresh" every 30 days. We might take longer to reach stale, older stuff that doesn't update often. In the future, we'll have greater freshness, but for now, the data is between a few weeks and a couple months old.
@WilliamC - I'm not sure where that alienation would come from. Surely you're aware there are dozens, if not hundreds of bots that crawl the web, don't announce themselves, cloak as Googlebot and use your data for competitive intelligence that's shared with no one. Many of them even use it to harvest emails and spam or find blog comments and forums to spam, yet none of these organizations or individuals receive the same degree of criticism or investigation that we have. I don't think that's because we're doing something more evil, but because we're open about it and willing to share data with a wider community.
@Donna - Just be aware that if we can't pull data through one source, we'll try to get it in another way, so individual bot blocking doesn't ensure that you'll be excluded. The meta seomoz noindex tag will.
BTW - What's the ear-clogging problem? I must have missed that.
@viassana - I'll try to respond to all your points:
"What is the value proposition to your Linkscape customers?"
Current link data is inaccurate, imprecise and incongruous. Linkscape takes big leaps forward by providing a lot more information about the links we see, calculating metrics for them and organizing them intelligently. You can see a lot about the features and uses for the tool here - http://www.seomoz.org/blog/announcing-seomozs-index-of-the-web-and-the-launch-of-our-linkscape-tool and here - http://www.searchenginejournal.com/seomoz-linkscape-new-backlink-checking-tool-reviewed/7826/
"1. They don't know exactly what they're buying (because some of it is secret.)"
They know exactly what they're buying - a tool that leverages an index of the WWW to create a link graph. The only secrecy is around how the crawl was obtained/acquired. It's sort of like buying a chair and suggesting that because the assembler doesn't reveal all their parts suppliers, you can't rely on the chair. Just sit in it, use it, read reviews about it and you'll see whether or not the chair is good. Same goes for Linkscape - we think the data and the metrics are really valuable, and many others who've used it do as well. Over time, the data will get better, the metrics will improve and the index will increase in size and value.
"2. They're asked to trust SEOMoz (which won't go over that well.)"
I'd dispute your first and second point. If you don't trust SEOmoz, you don't really need to. If you use the free data, it costs you nothing and if you use the paid data and don't like it, you can get a full refund anytime. I'd start by being distrustful of the data, using it and judging it based on the value it brings you. For some folks, this information won't prove useful, but for others, it will be immensely valuable.
"3. There is a hidden investment on the part of your customers to place Meta code and robots.txt, which still may or may not present other concerns for your customers."
Our customers don't need to place any code on their sites unless they want to exclude themselves from being available in the link search results. I'm not sure what other concerns you're referring to.
"4. The data is skewed the more often companies run interference with Linkscape, making it less valuable to your customers."
Right, which is why we may pull from other data sources if the ones we control and run can't access link graph information we need. The crawl is designed to be as similar as possible (at least, in the long run) to the major search engines, so we'll reach far and wide to build that index, which makes it more valuable for our customers.
@Rand
> In order for that to happen, though, we'd need to build a site verification service like Google & Live's Webmaster Tools.
That's actually not very hard, my team did that for SocialBlogroll.com. Users can claim their blog, create a custom .html file with the code in the filename and then verify in the system.
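The file-based verification described above can be sketched roughly as follows. This is only an illustration of the general idea, not how SocialBlogroll.com or any webmaster tools product actually does it: the `verify-….html` naming scheme, the `SECRET_KEY`, and the helper names are all hypothetical.

```python
import hashlib
import hmac
import urllib.error
import urllib.request

# Hypothetical server-side secret; anyone without it cannot predict a filename.
SECRET_KEY = b"replace-with-a-private-key"


def verification_filename(domain: str) -> str:
    """Derive the per-domain filename the owner is asked to upload."""
    digest = hmac.new(SECRET_KEY, domain.lower().encode("utf-8"),
                      hashlib.sha256).hexdigest()
    return f"verify-{digest[:32]}.html"


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for url, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:  # URLError is a subclass of OSError
        return 0


def is_verified(domain: str, fetch_status=http_status) -> bool:
    """A domain counts as claimed once its verification file answers 200."""
    url = f"http://{domain}/{verification_filename(domain)}"
    return fetch_status(url) == 200
```

The fetcher is passed in as a parameter so the check can be exercised without touching the network; a real service would also need to deal with hosts that return 200 for every URL.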
must be a pretty good tool for people to be in such a panic about it.
"In order for that to happen, though, we'd need to build a site verification service like Google & Live's Webmaster Tools."
Honestly, after you've indexed the web, building something like a site verification program to allow people to opt out should be a relatively easy project.
I don't know how much of my websites' linking relationships should be private as opposed to public, especially since it can be argued that it is already available to whoever has the resources, energy and inclination to go looking for it. But I am thinking that unless I am getting a benefit from it - I probably don't want it to be readily available to my competitors.
Yeah - site verification shouldn't be terribly hard, but we have a long dev timeline already, so that work is going to be at least a few months away. Incredibill had suggested that we give webmasters our full advanced report for their own sites, and I think that's a great idea, both from a giving back perspective and from a marketing one, so along with the opt out, we'll probably do something like Google Webmaster Tools, where getting backlink data on your own site will eventually be free. Again, it's months away, but from initial reactions here on the engineering team, seems like something that's both do-able and in everyone's best interests.
@rand
"well, Linkscape crawls a significant portion of what we feel needs crawling or is "fresh" every 30 days."
Now I'm confused again.. So you "do" have a SEOMoz bot that you control and can use to crawl with.. Or, are you using the royal "we" as in we (SEOMoz) use someone else's bot (DotBot) but since we pay for it we call it ours??
"I'm not sure where that alienation would come from. Surely you're aware there are dozens, if not hundreds of bots that crawl the web, don't announce themselves, cloak as Googlebot and use your data for competitive intelligence that's shared with no one."
But how many of those are considered leaders in the SEO community and how many are considered leeches?? They all receive far worse criticism, and when found out have their bot blocked and filtered.. Bill is a lot of fun to watch when he does this and it's one of the main reasons I follow his blog..
"Just be aware that if we can't pull data through one source, we'll try to get it in another way"
Not sure that anything else needs to be said after that comment..
@rand: "we'll definitely talk about a sitewide option. In order for that to happen, though, we'd need to build a site verification service like Google & Live's Webmaster Tools"
Not really, if major bots can use robots.txt for sitemap discovery I don't see a reason it can't be used for building an index from different sources. Give us a unique UA, whatever it might be, make one up. Then retrieve, read and obey robots.txt. That will put the decision to opt in or not in webmasters' hands without having to pin an extra tag on each and every page. It will be a bit more work on your end, but that seems to be where the effort belongs.
Since a bot can't see a META tag without retrieving the page, the META ROBOTS requirement basically means Linkscape will burn your bandwidth IF Linkscape in fact had bots that crawled billions of webpages.
Assuming Donna is on the mark, SEOMoz has a bot that does a partial web crawl, but even that being true, it sounds like Linkscape doesn't have bots that are capable of crawling the entire web. If it did you wouldn't need Gigablast, Amazon, Alexa, Google, Yahoo, MSN, etc.
So while what Rand's been saying may not necessarily be technically untrue, I still see irreconcilable differences between the impression Rand initially gave about Linkscape and reality.
Secondly, I'm glad Rand's honest about his reason for trying to dissuade people from opting out of Linkscape, but it's a bit like hiding a membership cancel link deep in the footer to make people jump through hoops to cancel their membership.
Danny, Thank you for letting us use this outlet to finish the conversation.
"Just be aware that if we can't pull data through one source, we'll try to get it in another way"
Rand, No SEO will ever brand their site with the seomoz noindex tag. You told us a long time ago that people talk with their money to make a strong statement; so the above statement just pretty much made sure that any SEO agency I deal with in Chicago just spent their last dollar with SEOMoz. Wow, that was cold (even colder than Chicago). I am speechless.
@feydakin - I'll continue to say "our crawlers" and "our crawl" to refer to the spiders/UAs listed on our sources page. Hopefully that will make this clearer. We don't have any bots named "seomoz" or "linkscape." Regarding the leech accusation - I guess it's about perception and whether you think this data is valuable and important to be available. I think we have differing opinions on that, and so we're not going to reach the same conclusions.
@jimbeetle - we list all the data sources we use, and if you'd like to block any or all of them, they all have UAs. Creating a UA that doesn't exist would mean that only crawl sources we fully controlled could obey it, while the meta tag means that all data collected from any source could be blocked.
@halfdeck - I'm sorry if the impression given was off the mark. I think the only real area where that could be said is in terms of how we get our crawl. At first, we were completely quiet about the sources, and now we're revealing them, though not providing very specific details.
@mert - I'm really sorry to hear that. I personally feel the opposite way - that this data and information should be accessible and that a tool like Linkscape needs to exist. If you'd like to opt out of inclusion in Linkscape, we've provided a way with a meta tag and may offer more robust ways with site verification and blocking in the future. My statement that we'll grab data we need to be representational of the major search engines' indices is accurate. We might use any of the sources listed on that page to build our index and link graph, and I've been very upfront about that since we revealed those sources. It's in the interests of our customers and anyone using our data that it be as reliable and complete as possible. I'm not sure why this would make you want to stop using the data or buying it from us, but it's certainly your decision (though it does sadden me). If there's something we can do to earn back your business without violating the value or integrity of our dataset, please let me know.
@Rand, This is not personally against you. This is a message to the SEOMoz company, which includes the VCs that you are dealing with. A company's message to their main client base (which in this case are SEOs) is important. You might assume there is a monopoly of data here. No there is not. This is not about leeching or any other issue you have. It is simply the matter of do I listen to my customer base. No I do not. If the client base is not listened to, then there is one simple action the client has to take. That is to terminate the business relationship and move on to greener pastures. You are a friend but there is no other way to relay the message given the intense communication that has eaten enough Sphinn bandwidth. Once SEOMoz fixes its stance, then there is a second chance for trust. Thank you for your friendship.
@rand - "Creating a UA that doesn't exist would mean that only crawl sources we fully controlled could obey it, while the meta tag means that all data collected from any source could be blocked."
Guess I didn't express myself very well. I didn't mean to imply that the robots.txt entry would deny crawling to any bots; I basically meant for you to use it on your end the same as you would the meta tag. As you compile your index, if the UA is included as blocked in a site's robots.txt, then treat it the same way you would have if it were the meta. Basically, use the robots.txt entry as a substitute for the seomoz noindex. This would make it much easier for folks who want to opt out.
From a technical POV:
NOINDEX (provided by an HTTP header or META tag) is not suitable to keep me out, because LinkScape needs to recrawl the page to see this directive, and that can be in 30 or 60 days or never.
As long as there's no crawler directive obeyed by Google, Yahoo, and all other service providers that do the actual crawling, and no timely refetch of each and every page/domain requested by any user, there's no working way to opt out.
Correct?
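To make the point about refetching concrete: a page-level opt-out can only be detected from the page's own HTML, so the page has to be downloaded first. The sketch below assumes the tag takes the form `<meta name="seomoz" content="noindex">` (the thread only calls it the "meta seomoz noindex tag", so the exact syntax is an assumption), and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class OptOutMetaFinder(HTMLParser):
    """Scan HTML that has already been fetched for a noindex meta directive."""

    def __init__(self, agent: str = "seomoz"):
        super().__init__()
        self.agent = agent.lower()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        directives = {d.strip().lower()
                      for d in attr.get("content", "").split(",")}
        # Honor both the generic robots tag and the agent-specific one.
        if name in (self.agent, "robots") and "noindex" in directives:
            self.noindex = True


def page_opted_out(html: str, agent: str = "seomoz") -> bool:
    finder = OptOutMetaFinder(agent)
    finder.feed(html)
    return finder.noindex
```

Note that `page_opted_out` takes the HTML as input: nothing in this check can run until some crawler has retrieved the page, which is why a newly added tag is only seen on the next crawl.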
@rand, I never said "you" (SEOMoz et al) were leeches.. You are the one that lumped yourself in with those people by comparing what you are doing to what they do.. What I did say was that you have a certain reputation in the industry and they don't.. So yes, you are expected to do better than a leech and a scraper..
Rand,
"And I didn't say we wouldn't provide a clear concise way to opt out, I said we aren't providing one EXCEPT for the meta robots/seomoz tag, which is pretty clear and concise."
The problems with the meta tag have been outlined fairly well in this thread and others. Also, it doesn't keep you from archiving my site's information and using it in the future or even selling it to someone else to use. I realize that there are other bots out there that do this and disguise themselves etc. as you mentioned, but if you're only trying to be better than the scrapers, then that's setting the bar pretty low, don't you think?
"We do care a ton about the community. If we didn't I wouldn't be here explaining and responding."
Rand, the people here at Sphinn are your target audience. This is PR and damage control, this isn't about giving back to the community.
"The fact that this tool needs to keep some information private and that portions of it are pay-to-access is a pretty weak argument for suggesting that I don't care about SEOs and webmasters."
I actually never used the free vs. paid issue to suggest that you don't care about the community. I used your own statement (and I quote) "we're unwilling to provide a clear concise way to keep data out of Linkscape." And, just so you don't try to use the meta tag defense again, let me emphasize the meta tag will NOT keep my data out of Linkscape, it will keep it from DISPLAYING in Linkscape.
Once again, please share with me what other reputable websites or businesses milk data from sites to sell and don't offer a way to prevent that data GATHERING (not displaying). I'm all ears.
@Mert - Many thanks to you, too. I think when friends can disagree and have a discussion on the merits, it makes better products, better companies and better people. I think SEOmoz is listening to our customer base - we're trying to provide the most comprehensive, valuable product possible and to do that, we pull data in different ways from different sources. We provide a solid way to opt out - through the meta tag - and we give disclosure on the sources we may use now or in the future so you can block those UAs if you'd like. If you feel that the tool doesn't provide value to you, or that in our attempts to build something valuable, we've crossed a moral or ethical boundary, that's certainly your prerogative and decision. I think that since we obey robots.txt and only pull from sources that also do, AND provide a specific method to opt out of being in the results, we're covering our bases and protecting the interests of our customers and the wider web. However, reasonable people can certainly disagree and that's why I'm in this thread - to hopefully help answer any questions that arise and provide our perspective on them.
@Sebastian - I think that's incorrect. The meta tag will opt you out, and like the major search engines' behavior, it will require time until we see that tag and can remove it from our next index update.
@Feydakin - I expect better from us as well. I think that we are living up to them, and I know others disagree. I think that's going to be an opinion issue, and thus one that I can't further expound upon or explain.
@Skitzzo - There may be some problems with the meta tag, but it is a way to clearly, concisely opt out of Linkscape's results. You are correct, however, in saying that it doesn't mean we won't use link information gathered on noindex pages in calculations like mozRank, mozTrust, etc. just as Google/Yahoo!/MSN/Ask do. If you wanted a link removed from the link graph and the calculations, you'd need to use nofollow.
Rand, do you honestly believe the meta tag allows you to opt out? Let me illustrate my point a bit more clearly.
If I put the meta tag on my page does it keep you from crawling my site? Does it keep you from using my bandwidth? Does it keep you from storing my data? Does it keep you from later selling that data to some other company that doesn't obey the SEOmoz meta tag?
The answer to all of those questions is no. Thus, it's not a way to opt out. I'm not sure how I can make it any clearer to you.
@skittzo - Right - and the same is true of the behaviors of all of the search engines. You can opt out of the listings if you'd like using noindex, but they will still crawl, still use for calculations and still potentially sell/profit from that data.
If you want to completely opt out, you can block the bots - and all of these are listed on the sources page - http://www.seomoz.org/linkscape/help/sources. Just as with the engines, who list different UAs they use, we list all the ones we use or might draw from. Block these, and you're completely out of our index. We'll continue to list any UAs and sources we find on that page so it remains comprehensive.
@randfish
"Right - and the same is true of the behaviors of all of the search engines. You can opt out of the listings if you'd like using noindex, but they will still crawl, still use for calculations and still potentially sell/profit from that data."
The difference is that unlike search engines, you won't honor my request to be removed. As you said above:
"Just be aware that if we can't pull data through one source, we'll try to get it in another way"
Once I tell one of the SEs to go pound sand they don't try to back door me. They realize I don't want them on my site and they don't come back (other than to hit the robots.txt file again).
Rand: Actually, your system itself could obey robots.txt easily enough. Simply create a spider that fetches robots.txt for each unique domain your sources find, and check it for a UA of seomoz and disallow. Simple sitewide opt-out solution.
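A minimal sketch of that robots.txt check, using Python's standard `urllib.robotparser`. The `seomoz` user-agent token is the one suggested in this thread, not a confirmed crawler name, and `site_opted_out` is a hypothetical helper that expects the robots.txt text to have been fetched already.

```python
from urllib.robotparser import RobotFileParser

# Token proposed in the thread; SEOmoz states it has no bot by this name.
OPT_OUT_AGENT = "seomoz"


def site_opted_out(robots_txt: str, agent: str = OPT_OUT_AGENT) -> bool:
    """Treat a site as opted out when robots.txt disallows `agent` from "/"."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


example = """\
User-agent: seomoz
Disallow: /
"""
print(site_opted_out(example))
```

One design caveat: a blanket `User-agent: *` / `Disallow: /` rule would also count as an opt-out under this check, which may or may not be the desired reading.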
Re noarchive:
SEOmoz needs to see the HTML of the page, at the very least to see the SEOmoz-specific meta tag, right? That's one.
Two: SEOmoz keeps talking about their crawlers, which are really bots that SEOmoz does not control. You keep talking about things like dotbot and Y! Slurp and GoogleBot.
So if we block dotbot and the rest, we block SEOmoz's access to our HTML. But I don't want to block GBot just because of some pesky tool, but I can tell GBot and Y! Slurp not to cache the page with noarchive. I'll happily block dotbot and other unimportant bots.
End result: you don't have access to our HTML unless you crawl it yourself. And if you do, we'll find it and block that too.
Of course, I can't know if I'm right or wrong because you are not giving us straight answers. Don't be surprised if we react in kind.
Pierre
Rand, I agree that technically the meta tag is a way to opt out within a few months. However, I do think that's not acceptable for most site owners who want to prevent their link data from being disclosed by your tool. I'm aware that your current architecture doesn't allow a timely opt out, but you could provide an opt-out option via a web form that verifies ownership and blacks out link data on request asap, doable in near real time. What do you think?
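One way such a near-real-time blackout could work is to leave the index untouched and filter results at query time against a list of opted-out hosts. The sketch below is purely illustrative: the function names are hypothetical, and hiding a link when either its source or its target host has opted out is an assumption about what "blacks out link data" should mean.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Lowercased hostname of a URL, or an empty string if it has none."""
    return (urlsplit(url).hostname or "").lower()


def visible_links(links, opted_out_hosts):
    """Drop (source_url, target_url) pairs that touch an opted-out host."""
    blocked = {host.lower() for host in opted_out_hosts}
    return [
        (source, target)
        for source, target in links
        if host_of(source) not in blocked and host_of(target) not in blocked
    ]
```

Because the filter runs when a report is requested, a verified opt-out would take effect immediately rather than waiting for the next crawl or index update.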
@Skittzo - You make a fair point. In order for us to build an index that accurately represents the major engines, we may pull from multiple sources, and that differs from the engines, who generally rely on a single UA.
@WilliamC - That's possible, too. I'll talk to our guys to see if it's something we can implement and if so, when.
@Pierre - I'm not quite clear on how you're suggesting we treat noarchive. As far as bot blocking, as I said, you can certainly block all the bots on the list and we'll respect that. We're tenacious about including data, but only to a point. A webmaster dedicated to keeping us out certainly could through the method you described.
Also - on providing straight answers. Looking over the responses, I think I've been exceptionally clear and direct. While some of those answers are "no comment" or "I can't talk about that" the rest are as "straight" as I know how to be. If there's anything where you feel you haven't gotten a straight answer, please ask again and I'll respond as best I can.
@Sebastian - I think that in the future, that's definitely achievable to enable quicker removal through verification of site ownership. I don't know when exactly we could have it baked into the product, but probably not before the end of the year - the devs have a strict timeline with a lot of projects right now. Thanks for the suggestion - we won't forget about it (putting it in our list of future upgrades today).
Skitzzo said this on the other thread:
If something like this happened once or twice, I'd give them the benefit of the doubt but it's become a pattern of behavior with SEOmoz.
1) Controversy
2) Benefit from attention
3) Apologize but due to time can't fix it just yet.
4) Continue to gain attention.
5) Finally fix it and explain how aw shucks guys, I didn't mean it like THAT.
6) Ask people to talk to you about it in person rather than writing about it in public.
Rand, when you say: "site verification shouldn't be terribly hard, but we have a long dev timeline already, so that work is going to be at least a few months away." Does that mean we're at #3?
99.99% of the time all of the SEO drama means nothing to me, but for some of my sites a lot of work went into the link building, testing and verifying which anchor text works best, etc. For someone to be able to just come along and copy it - I don't know, I guess that's your right to sell it to them, but it doesn't sit well with me. It certainly brings up a lot of ethical questions.
I was really hoping that opting out would be a lot more cut and dry than putting a meta tag with your company name on every single html file on my websites.
The best metaphor I can come up with is this:
I invite myself to your family reunion. Everyone you are related to is there, it's a big gathering so I go unnoticed. I then casually go around to all of your relatives and ask them about their medical history as I piece together a family tree. I then also go to your doctors and bribe them for information about your family's medical history as well, and I dig up some publicly available info like death certificates to round out my files.
Now, once I got all of that I come to you and offer you a complete breakdown of all of your potential medical issues that are based on heredity, genetics, etc. Believe it or not, even though I got it without your permission, you might be ok with it and maybe even willing to pay for it. It does have some value after all.
But would you want me selling that to someone else? And you might be upset that I collected it without you knowing.
I then come to you and say that I'm not unlike the hospital - they have all of this information but it is hard to put it all together the way they have it. (Google is the hospital in this metaphor).
You could argue the hospital is giving you something in return (good health, proper care, etc.) and that you don't want this information available to just anyone for sale.
How would you feel if I said your only option for me not to sell this information is if you put a sign that says "vingold" up in the window of your home?
But... but... but can't we Aaaaaalll just get alonnnng?
Rand, you hit it on the head:
"We're tenacious about including data, but only to a point. A webmaster dedicated to keeping us out certainly could through the method you described."
I don't want you on my site and I don't want to have to fight you off. Linkscape is not being polite because it's going after my data by hook and by crook and I have to spend precious time to fend it off. It's no different than an MFA scraper: instead of making money from ads, you're making money from subscriptions. They're in the same bucket on that front.
Your statement clearly shows to me that you do not want to be polite. All the bots listed on your sources page are polite but they serve me well: Google and Yahoo send me traffic. What does Linkscape do for me? Nothing good. On the contrary, it wants to hurt me by giving my competitors a leg up.
This is the heart of the problem: I don't trust your tool right now. I don't like the fact you're not honest about how to effectively and efficiently block it. We don't have time to fight the bad bots and you. It's all about politeness and Linkscape is encroaching on my territory, namely my sites. And it has no business being there and I'd like it to go away. I'm happy to add something to robots.txt and that's it. Let me state this again clearly: Linkscape is not welcome on my sites and the onus is on SEOmoz to make it easy for me to keep it away.
@vingold - I think the metaphor loses significant relevance because it relies on pulling data that's not available for public consumption. We're not going through trash bins or knocking on doors or bribing anyone. We're culling data that anyone can see on the web and making it accessible in the format many webmasters and SEOs would like to consume it. As for the pattern - I addressed that in an earlier thread, but I think it's going to continue to be a pattern. We're not always great at predicting what will arouse ire in the community, and since we have very busy lives and jobs, our responses aren't always as timely as we'd like, nor can we make rectifications on the fly - time is always going to be in short supply at startups. Although I wish it were different, I'd be dishonest to suggest that we can break out of the cycle you mention. We'll just have to live with it and hope that over time, things improve.
@SamFreedom - I thought we were getting along :-) I'm not in here to attack, belittle or diminish anyone or their opinions. My only goal is to make sure all the questions get answered honestly and thoroughly (even though that means saying, at times, "I can't disclose that.")
@Pierre - Yes, I recognize the position you're taking and I know that for some folks, it's upsetting and unsettling. However, we're not breaking any laws, nor are we doing anything more than what search engines do. Your point is that they give back with traffic. I think that we're giving back with data and metrics, and we'll try to give more back over time. And, as I said, if you are extremely upset about it, you can use the meta tag to keep your pages out of our results and block bots to keep us from accessing your data.
What do we think? Are almost all of the concerns and questions answered at this point? I have a lot of other responsibilities to attend to at work, and this takes up a significant portion of my time...
Rand, thanks for considering my suggestion. What I can't understand is your timeline. Putting such a heavily-asked-for functionality at the end of the development queue seems pretty unreasonable with regard to the buzz created on the opt-out thingy here and elsewhere. Of course there might be better solutions and surely I'm not the guy to tell you how to do your business ... however, I'm kinda puzzled.
@Sebastian - well, we build out dev cycles internally about 3 months, and the upgrades and work that's being done is critical to business operations and the promises we've already made to partners and customers. Trust me - I'd love to get things done faster; I have a million and a half requests, but we have to prioritize, and with a startup budget and a small team of devs, we can't move quite as quickly as would be ideal.
BTW - Not just considering - we're taking it really seriously. I think it's a very good way to get more people comfortable with what we're doing and possibly expanding our marketing by showing off all the data we know about your site (and what you could potentially get on the competition).
@rand I'm glad to hear that you are at least looking at our suggestion for an easy way out.. I question your timelines as well.. Seems that you don't have any room for anything that goes wrong in there and we all know that that is a bad way to schedule.. Most production schedules I've been involved with had a set percentage of open time to cover just these issues..
But, you said you feel that you are giving back by providing that data.. The big caveat there is that you want us to pay you for it and Google sends us the traffic for free.. And the suggestion to block Google to block SEOMoz is just ridiculous.. To keep you out we have to keep out everyone.. If anyone else did this the uproar would be huge.. More huge than what you are seeing here in public at least..
(Ooh look a fence... my favourite place to sit ;-) ) Seems to me that everyone accepts that Linkscape is offering an incredibly useful way of carrying out competitive analysis without having to spend ages trawling through lots of data sources and building your own tools to compile and present such data (A good thing). Where things get problematic is that this means that such information is also now more readily available to your competitors (Not such a good thing). Strikes me as a case of SEOs wanting to have their cake but not let the competition eat it. How many of the people here (and elsewhere) who are so vocal about not wanting sites under their control to appear in Linkscape's dataset will still be happy to mine it for information about their competitors' sites? One thing that I haven't seen discussed in great detail (and I'll freely admit that I've not looked that closely, so forgive me if it has been dealt with elsewhere and please drop a link to point me in the right direction): how much value does Linkscape's link graph actually provide beyond the world of the Linkscape index itself? As Rand's old friend and intellectual sparring partner, Michael Martinez, is so fond of reminding people, there's no point using Yahoo to try to gauge the value of links in Google's index, so why should Linkscape be any different? Sure it provides a good approximation of the link structure (of a portion) of the web, but mozRank and the other metrics don't and can't equate to actual PageRank. I have no doubt that as Linkscape grows (and assuming there isn't a mass revolt from webmasters the world over blocking it on every page of their sites) it will become a closer approximation, but that's all it will ever be. An approximation. A best guess based on a series of best guesses about value factors for the engines. Unless Matt Cutts has secretly slipped Rand a copy of the algo and that's really what's powering Linkscape, all the tool is really doing is providing people with information that is freely available elsewhere but reducing the amount of legwork required to gather it.
@Feydakin - That's a fair point, and I think that as we get bigger and have more leeway and devs on the team, we can and should build with those in mind. That's not to say we don't already. One quick fix we made just this week is to not show links from noindex pages in the tool (which we were initially). We simply hadn't mimicked the engines' behaviors accurately, were called out, and fixed it up. We had built in some time for fixes, just not enough to take on projects of this scale (and it does require more significant work than you might think - needs to be tested, has to be applied to subdomains for those hosted on third-party platforms, etc.) @Ken - We have done some testing on mozRank vs. PageRank. On statistically significant samples, we're about 0.7 off toolbar PR. Looking through the data, those pages where the difference was most significant, e.g. PR2 mR6.75, we almost always found paid links or penalties. Obviously, there's always going to be big differences between what we show and what the engines do, but I think there's still tremendous value in comparing, contrasting and working towards a closer approximation over time. We've got plenty of time and people devoted to that as well. And no, Matt Cutts didn't slip me anything :)
@Rand - "And no, Matt Cutts didn't slip me anything :)" Yeah, like you'd admit it even if he had ;-)
@KenJones Honestly, I think at a site/domain level the information available is ... ok. For the sites I looked at (my own, clients and competitors) I don't think it is anything earth shaking. I think it would be neat to play around with, and to spy on a few competitors who continually outrank me, but I don't think I'm willing to pay for that just now. I think the biggest strategic value of Linkscape is the aggregate information it makes available to SEOmoz. It can provide a pretty good approximation of some pretty nifty statistics. Some of which Rand has already revealed (% of internal to external links, % of no-follows, etc). I think the deeper you slice and dice that data the better understanding you have of what a typical link profile might look like for an average website and where the extremes of that profile might be. That knowledge can be handy for anyone building links either ethically or not, because it will let you know approximately where the search engines' radar might be. In my opinion, that information alone gives SEOmoz a competitive advantage at the top of this industry. It will be interesting to watch if other SEO companies follow suit and whether we'll have a whole host of bots to block using Meta tags. @Rand I've said my piece. Thanks for listening.
@Vingold - Good point, although I think the "Is it worth the asking price?" debate is one for another time. Personally, I'm not currently a paid up ProMozzer and Linkscape isn't enough to sway me to part with the cash to upgrade (although if Rand feels like chucking me a freebie I wouldn't say no :-) ) You also raise an interesting question about what additional information they may be gathering which isn't being shared with either free or paid users, but again it won't be anything that a decent SEO can't find with a bit of hard graft and a lot of time analysing data. As for other SEO companies gathering similar data, I'd be very surprised if there aren't already a number of them doing it. SEOmoz just happens to be the only one that's making it publicly available for those without the required resources and budgets to build their own such tools.
@randfish - you are now resorting to the fact that you're not doing anything illegal. You have to see how far that is from being a good "netizen", don't you? You keep comparing yourself to the search engines but there are two very important differences. 1) Search engines provide something back in return. Linkscape does not. 2) As I said before, if I tell SEs to beat it, I don't have to keep fighting them off. As it stands now, you are using my bandwidth and my data for your monetary gain while not only providing me no value, but also probably helping my competition gain a better understanding of my tactics. And, if I decide I don't want to be included in your data set, I have to block all sorts of different bots, and trust that you'll maintain your list of other bots you might use. I have to actively monitor your site and adapt to any changes you make. And, if you do happen to oh, I don't know, use a bot before alerting me, I'll be stuck in your index for 30-60 days. I'd challenge you to look at the situation from the other side of things (difficult though it may be) and realize that despite what you tried to make people believe, you're NOT a search engine and you AREN'T behaving like one. Until you make it quick and easy to exclude my data from being GATHERED I'm going to regard SEOmoz as the same type of nuisance I do every other spammer and scraper trying to make a buck off my hard work. Congrats Rand, you've turned SEOmoz into the equivalent of a MFA site.
@sebastian As Rand noted himself above, a "feature" that allowed webmasters to sign in and claim ownership of a website is not better because it gives SEOMOZ more private information which Rand plans to monetize: "I think it's a very good way to get more people comfortable with what we're doing and possibly expanding our marketing by showing off all the data we know about your site (and what you could potentially get on the competition)." Above all else SEOMOZ would like everyone to continue to compare them to Google and Yahoo. Those major engines have treaded very lightly in the area of "registering websites" with good reason. The sites that include you and then require you to "claim them" in order to opt out are traditionally leverage/bullying scams. Even Google wants to act prejudicially against them (e.g. directories) when it can. Think of what Merchant Circle does, what Ripoff Report does, etc. and think of Google's comments about "free" directories needing to provide FREE means of correcting information, opting out, etc. if they want to be "trusted" by Google. I am amazed no one has seriously dug into the probable terms of service violations associated with this tool. Craigslist promises to charge a fee for every automated access to its site. Last I looked, Google and Yahoo APIs restricted commercial use of their data. I am sure Google is amused by the flak and doesn't see any need to comment right now.
@Skitzzo - I'm going to reject a few of those allegations - I simply don't think they're accurate. When I said we weren't doing anything illegal, it was in the context of defending against the analogy made by Vin that we were bribing doctors for medical information (which is illegal). I think that some people would say we're aggressive but decent web citizens, while others could argue, as you have, that we're not and that we should be less aggressive in our pursuit of data. When I look at us from the other side, I see an organization that, like many others, pulls down data about the web. They do so aggressively, and it's a pain, if I'm a big privacy advocate, to block their activities. Yet, at least it's easy for me to remove myself from their listings. So, unless I have several hundreds of millions of pages for them to crawl, the bandwidth costs, since they're only grabbing data once per month, are very low - probably in the pennies or dimes per month category (and in the vast majority of cases, much lower than that). This organization gives back freely an alternate metric to PageRank from Google and tells me how many links they know about to my page and site and how many unique domains are represented in that number. That might be valuable to me, but it might not. All in all, given the tremendous number of worse things they could be doing, I'm probably not particularly upset, but it might piss me off a bit. At least I know them, as opposed to all those other companies and scrapers who do this that I don't, but that probably doesn't entirely excuse their behavior. That's my "outside trying to look in" perspective. As far as the MFA accusation - that's entirely inaccurate. We serve no one's content - just URLs, titles and links and we do no advertising on top of that data.
Finally caught back up on this thread. In terms of all the concern about not being listed, that's pretty much going to happen if you block. As Rand explained above, folks are still going to get a lot of data about your pages -- at least the people pointing at them. Let's get really, really clear about this -- especially given the amount of discussion on people wanting to be removed. Block your pages, and Linkscape won't be able to report on what you're linking out to. But the people linking to you (which is more helpful in my view)? That's totally still going to be shown. Unfair! Well, Google and Yahoo allow backlink lookups to any URL, even if those URLs are blocked from actual indexing. No one is screaming at Yahoo for giving away "competitive information." Most people point at Yahoo Site Explorer as a super cool tool and bitch at Google for not providing better reports. So don't want to be listed? You're going to get listed in large degree regardless, just like you'll have backlink data listed with Google and Yahoo even if you deny them the ability to spider you. That seems fair enough. What's not that cool is the other issue, that there's a spidering component going on eating up some bandwidth. Not a lot, I know -- but for over 10 years, well behaved spiders have allowed site owners to exclude themselves. When SEOmoz said it had its own spider, fair enough that some figured we should have seen the blocking instructions when the search engine launched, if not before then, through a user agent left in server logs. Now apparently SEOmoz has no spider. If not, then that issue is solved as well. But then Donna suggests, we've got this long list potentially to "hide" dotnetdotcom.org as the SEOmoz spider. I'd say if you believe that and are super worried, block it and anything else that looks weird on the list of services SEOmoz gives out. For me, it's not worth the effort. 
And I suspect a lot of people concerned might have felt the same way if there wasn't all this "we can't say" or lack of clarity about what's listed or not.
One more post and I'm done. 1) It seems that what we've established is that SEOmoz will neither confirm nor deny their ownership of the dotbot crawler, despite their earlier promotional claims of having their own spiders that crawl the web. 2) SEOmoz will aggressively seek out your data no matter your wishes and in fact if you block them one way, they'll try to get it another way. 3) SEOmoz will ONLY remove your site from DISPLAYING your data through Linkscape if you add a customized SEOmoz meta tag to each and every page on your site, and even then, only after a 30-60 day time period. 4) SEOmoz is "unwilling to provide a clear concise way to keep data out of Linkscape." 5) SEOmoz did not think ahead enough to predict that SEOs would want an identified user agent or method of removing our sites from their index, despite being privacy advocates in the past and spending many years in the SEO community. Those appear to be the facts that we've established here; if I've missed any please mention them. Suffice to say that's not the behavior of a company I want to do business with but that is a decision everyone has to make on their own.
One fact you've missed, Skitzzo (the one I earlier called earwax): 6. Whoever owns that darned dotbot is lying about the number of sites it has indexed, as has been established by the uncovering of the 7 billion pages that got instantly added to the javascript. Oh wait, of course I should probably say "probably lying", because there's a minuscule chance in hades that those 7 billion pages really did materialize overnight. Forgive me for using the word "lie". Sleight of hand might be better. ;)
@skitzzo - I'll try to address all of these points below: 1) "It seems that what we've established is that SEOmoz will neither confirm nor deny their ownership of the dotbot crawler, despite their earlier promotional claims of having their own spiders that crawl the web." This is correct. We don't talk about the sources for our crawl data beyond providing the comprehensive list at http://www.seomoz.org/linkscape/help/sources. Our claims of having spiders remain accurate and fully truthful - no boasting there. There's literally no other way to get the data. 2) "SEOmoz will aggressively seek out your data no matter your wishes and in fact if you block them one way, they'll try to get it another way." We do have a variety of sources we can pull data from to build our web index, and should we be missing important pieces of the link graph puzzle, we'll use all the tools available to construct that data accurately. 3) "SEOmoz will ONLY remove your site from DISPLAYING your data through Linkscape if you add a customized SEOmoz meta tag to each and every page on your site, and even then, only after a 30-60 day time period." Yes, although we are looking at ways to block an entire site from being shown in the future through a registration system. And yes, we can't block anything until we've re-crawled and re-indexed that page, which can take 30-60 days depending on the speed with which we crawl/re-crawl a given URL. 4) "SEOmoz is 'unwilling to provide a clear concise way to keep data out of Linkscape.'" That's what you said, and I merely copied it to point out that it had an exception. I know it's a fun sound bite, but without the important caveat in the sentence it was in, it's really unfair to keep using this phrase. 
That caveat is that we are willing to provide one clear, concise way to keep data out of Linkscape - the seomoz noindex meta tag. 5) "SEOmoz did not think ahead enough to predict that SEOs would want an identified user agent or method of removing our sites from their index, despite being privacy advocates in the past and spending many years in the SEO community." We didn't think ahead as carefully as we could or should have, but we've now had a way to block us from showing your data since 9 days after launch. Granted, I wish that could have been even faster or even up when we announced, but I'm not sure it warrants the level of criticism you're assigning. I suppose that's up to folks who are judging to decide.
@DannySullivan : In the comparison between Linkscape and Site Explorer, there is one important difference, or rather, $79 of them every month.. I still think that the people that will find this service the most useful are going to be the spammers looking for juicy link targets.. The beginners won't really understand it and the established people don't really need it except as a curiosity more than anything.. I bet on the day of launch you weren't expecting this reaction, were you rand?? :)
@SEOmoz - I think you should share the damned bot and allow robots.txt to block it. I hate to say this - and I'll probably get a bunch of negative sphinns - but I would be willing to bet that less than 5% of the community is going to actively block SEOmoz bots, scrapers, etc. from their sites. Less than 1% will probably even blog about it and the reach of sharing that technical data will actually be more limited than you think. The rest of the web won't even know what happened. There are millions of websites and millions of webmasters - your linkscrape tool will still be relatively accurate given all other tools currently available. You will maintain your marketable edge. I would argue that nothing would really be lost by telling the community about this data. Just give it up already! Even Cuil was cool enough to give us their bot info.
"Our claims of having spiders remain accurate and fully truthful - no boasting there. There's literally no other way to get the data." Of course there is; that is a silly statement. You list 2 commercially available indexes in your sources. It does not require the capacity to spider 30 billion pages to buy an index that someone else has already spidered and offers for sale. What you are claiming is that you did not get any (as in none, zero, zip, zilch) of your data from Yahoo, Google, or any other pre-existing database out there. That's the boast you made.
RandFish says "identifying manipulative links and reporting them to the engines." I thought this job of "identifying manipulative links and reporting them to the engines" was supposed to belong to the search engines, as it is the search engines' job to determine in their algorithms which links are manipulative and which are not. I am afraid that SEOs armed with the info from this SEOMOZ Linkscrape tool will report their competitors to the search engines. That is why I don't like this tool. As SEOs, we are supposed to build links to optimize our websites, and yet somewhere out there will be other busybody sour-grapes SEOs who will use this tool to report us just because we have more links than them and rank higher than them.
Why not have a solution similar to Google's confirmation for Webmaster Tools? If I don't want my website indexed, I go to Linkscape, put in my domain name and Linkscape generates a unique file name. I create that file on my domain name, click confirm and now Linkscape stops indexing my site.
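The file-based confirmation suggested above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the `SECRET_KEY` value, the `linkscape-` file name prefix, and the `verification_filename` and `confirm_ownership` helpers are hypothetical and are not anything SEOmoz or Google has published. The HTTP request is passed in as a function so the sketch runs without network access.

```python
import hashlib
import hmac
from typing import Callable

# Hypothetical server-side secret; a real service would keep its own key private.
SECRET_KEY = b"example-secret-not-from-the-thread"


def verification_filename(domain: str, secret: bytes = SECRET_KEY) -> str:
    """Derive a hard-to-guess file name that is unique to the given domain."""
    normalized = domain.strip().lower()
    digest = hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"linkscape-{digest[:32]}.html"


def confirm_ownership(
    domain: str,
    fetch_status: Callable[[str], int],
    secret: bytes = SECRET_KEY,
) -> bool:
    """Return True if the expected file is reachable on the domain.

    fetch_status is any function that takes a URL and returns an HTTP
    status code; it is injected so the check can be tested offline.
    """
    normalized = domain.strip().lower()
    url = f"http://{normalized}/{verification_filename(normalized, secret)}"
    return fetch_status(url) == 200


# Offline demonstration: pretend example.com has uploaded its file.
hosted = {f"http://example.com/{verification_filename('example.com')}"}


def fake_fetch_status(url: str) -> int:
    """Stand-in for a real HTTP request: 200 for hosted files, 404 otherwise."""
    return 200 if url in hosted else 404


print(confirm_ownership("example.com", fake_fetch_status))
print(confirm_ownership("example.org", fake_fetch_status))
```

Deriving the name from a keyed hash means the service does not need to store a token per domain, and nobody else can guess the file name for a site they do not control.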
Skitzzo says, "Those appear to be the facts that we've established here; if I've missed any please mention them. Suffice to say that's not the behavior of a company I want to do business with but that is a decision everyone has to make on their own." Does that mean Skitzzo will no longer subscribe to SEOMOZ?
As mentioned somewhere earlier in the thread, could we have access to the data on our own sites for free? Something like that would be an incentive NOT to block SEOMoz. Otherwise I'm certainly not paying $79 to see what it's all about and don't want my competitors to get a clear view of everything about the sites I look after.
Even if we tag all our pages with SEOMOZ META, Linkscape will still show our backlinks to competitors unless websites linking in also block their sites from Linkscape. So claiming that SEOMOZ META is a viable solution is misleading. Linkscape is not a search engine. It's a backlink analysis tool. No one cares that our web pages don't show in Linkscape results. People want to block their backlinks from showing and that isn't really possible by just using the SEOMOZ META. Blocking Google, Yahoo, and MSN in robots.txt isn't an option either. So Rand is claiming a non-solution as not only a solution but the only solution they're going to offer for the time being. And his defense of that decision is he can't make money otherwise. Bottom line is SEOMoz must offer a way for people to prevent Linkscape from crawling their sites and displaying backlinks to competitors. That said, I don't buy the "competitive advantage" objection. SEOs have complained often about Site Explorer and Google's link: command showing only partial data. Linkscape is just a souped up Site Explorer. What is a problem is that, unlike Site Explorer, SEOMoz makes a lop-sided value proposition. SEOMoz is leeching off our websites and not only gives back nothing in return but is also prepared to charge $1200/year for us to get something back in return. Is that a fair deal? I don't care that Linkscape doesn't show ads. Google doesn't make me pay money to run queries. With Google/Site Explorer it is a fair trade-off. Linkscape is a one-way street. Rand may protest that Linkscape offers valuable link data. For people who are interested in backlink mining 24/7 I suppose that's true. But for the majority of webmasters on the web, you can't make that argument. It's like taxing 95% of Americans and redistributing that money to a select few. Rand's claim that Linkscape only will burn a penny's worth of bandwidth is like saying "I'll hold you up at gunpoint but I'll only ask for a buck and I won't shoot you, I'll just hit you over the head with my gun so I can make a clean getaway." It doesn't send the right message about the SEOMoz brand. Either change the value proposition or give people a real way to opt out of Linkscape.
This info seems to be posted widely, except in this thread...
OrgName: seomoz.org ===== =====
OrgID: SEOMO
Address: dotnetdotcom.org ===== =====
Address: 93 S. Jackson Street 10070
City: Seattle
StateProv: WA
PostalCode: 98104-2818
Country: US
RegDate: 2008-07-07
Updated: 2008-07-07
AdminHandle: NGE11-ARIN
AdminName: Gerner, Nick ===== =====
AdminPhone: +1-206-299-9628
AdminEmail: admin @ dot net dot com .org
TechHandle: NGE11-ARIN
TechName: Gerner, Nick ===== =====
TechPhone: +1-206-299-9628
TechEmail: admin @ dot net dot com .org
From: http://ws.arin.net/whois/?queryinput=seomoz.org [2008-10-21]
93 S Jackson Street is a PO Box service called Earth Class Mail.
While I think the overall linkscape project interesting, robots.txt support should be a big priority. Meta tags add page bloat AND are only considered after a page has been downloaded. Robots.txt solves the problem at the source. I would also suggest adding x-robots-tag support as an alternative to a meta tag, but this would be a low priority. As for the data sources, we can cross one off the list. External access to the ASK API was disabled 6 March 2007. At the time a contact in Ask's Pisa research center told me they still had access; this may not be true today. Considering API limits and the overhead inherent in mass scraping, I think Donna has hit the nail on the head. Rand raises a very interesting point on using tools where the underlying data source is unknown. You run the risk of making dubious decisions based on dodgy data. I see many people enamored by the glitz of commercial keyword research tools (such as WordTracker) which use source data from Infospace meta engines such as dogpile. Yet ask yourself, when was the last time your target audience used dogpile? If you read Italian or want to test Google's translate tool see this for more. The same is true for general web competitive analysis. Many cite Hitwise or comScore without realizing the data may be extremely misleading. Hitwise for the type of ISPs contributing (business vs consumer?), comScore for the sample selection method (spyware?). Regarding links, there is only one dataset that counts in most markets – that discovered and processed by Google. At least when we use public sources such as Yahoo's Site Explorer or Exalead.com's link: syntax, we know a lot about their crawling habits (i.e. frequency, depth) allowing us to make informed decisions. I think Rand's service would be more valuable if it were transparent. Yes, others could copy him, but I think that is the case today. The competitive advantage could be in excellent execution and community goodwill rather than some sort of mysticism. 
I suspect most people would pay for the convenience of not having to do their own scraping etc.
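To make the difference between the mechanisms concrete, here is a minimal Python sketch of how a crawler might honor a page-level opt-out that arrives either as an X-Robots-Tag response header or as a meta tag. It is an illustration under stated assumptions, not how Linkscape works: the `is_excluded` helper and its default bot name `seomoz` are hypothetical (the name is taken from the "seomoz noindex meta tag" mentioned in this thread), and the header parsing is deliberately simplified.

```python
from html.parser import HTMLParser
from typing import Mapping, Set


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> or <meta name="<bot>"> tags."""

    def __init__(self, bot_name: str) -> None:
        super().__init__()
        self.names = {"robots", bot_name.lower()}
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() in self.names:
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def header_directives(headers: Mapping[str, str], bot_name: str) -> Set[str]:
    """Read X-Robots-Tag, honoring an optional "botname:" prefix.

    Simplification: directives that themselves contain a colon
    (such as unavailable_after) are not handled.
    """
    value = ""
    for key, val in headers.items():
        if key.lower() == "x-robots-tag":
            value = val
    if ":" in value:
        agent, _, rest = value.partition(":")
        if agent.strip().lower() != bot_name.lower():
            return set()
        value = rest
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def is_excluded(headers: Mapping[str, str], html: str, bot_name: str = "seomoz") -> bool:
    """True if either the header or a meta tag asks this bot not to index the page."""
    if "noindex" in header_directives(headers, bot_name):
        return True  # known from the headers alone, before parsing the body
    parser = RobotsMetaParser(bot_name)
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="seomoz" content="noindex"></head><body></body></html>'
print(is_excluded({}, page))
print(is_excluded({"X-Robots-Tag": "noindex"}, ""))
```

Both checks here need a response from the server, which is the point made above: only robots.txt lets a site owner stop the request for the page from being made at all.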
Rand's comment in response to Donna's summation: "@Donna - no comment officially, but I think your thinking is very smart." It's this kind of coy and nonsensical response that leaves people feeling like you're blowing a bunch of smoke. This isn't the CIA where people should need to decode your messages. There's nothing here that you're doing that is rocket science or some kind of proprietary approach. It's called data gathering and data mining. If it's your investors that keep you from communicating clearly and effectively with your customer base, I would suggest you need different investors. They're killing your reputation. Rand, I read that response from you to Donna and feel like screaming - thanks for wasting my time by requiring me to read an endless stream of your BS, when Donna Fontenot, who is merely guessing, was able to synthesize what you're doing in a few sentences. Can you understand why people are so pissed off at you?
@scottbowler & @dubibob - Both excellent suggestions that others made in this thread and elsewhere earlier and we're including them in our dev priorities. I'm hoping that by Q1 2009, we'll have a way to register your domain and see the link data we have for it free. @halfdeck - even with any form of blocking, we'll still be showing all the links that point to a given site or page. Blocking really just means that you won't show up as a link source when someone you link to is queried. @antezeta - the robots used by our sources all have UAs, and they all respect robots.txt, so you can block those bots if you so choose. And yes - we will continue to provide transparency about the size of our index, actually much more so than the major engines, showing domain diversity, lots of metrics and stats and URL numbers for the dataset.
@seanmag - In a business environment, there are trade secrets and competitive intelligence that need to remain private. I recognize that ours frustrates some people, but I really feel that with tens of thousands of bots scraping and crawling the web, most for commercial and largely secret purposes, we're actually far more transparent about what we do and give back more. Current bandwidth costs are so low that our crawling adds not even pennies to the annual cost of running most sites. I also take issue with the "endless stream of BS." I've been responsive, direct, and honest in this thread and elsewhere. Yes, there are parts of this project we're not revealing, but I don't think that lack of disclosure, particularly when I'm so up front about it, is the same as lying or BSing. I hope I can make it up to you and regain your trust and friendship.
randfish, please deny that seomoz is cloaking the bots of the sources you list. i don't give a shit if you own dotbot or another crawler on the list. you have yet to say that you are not cloaking well known bots. surprising, isn't it, that such sleaze still fits as a possible truth in your statements over the past week, and i have to actually ask. also, i puked a little bit in my mouth when you said earlier that you will do what you must to protect the integrity of your dataset. will you be able to criticize a search engine for their FUD pr double talk after this crap?
@corey - that's correct, we do not cloak bots or use any sources that cloak bots (although, I guess technically, the major search engines have hinted that they may have cloaked bots to help discover site cloaking and manipulation and we may pull from those sources). Regarding criticizing search engines - I think that having built a search engine (albeit a link search engine, not a content search engine), we've actually learned an incredible amount about web search, web crawling and the link graph which can be used to help answer (and make smart guesses about) a lot of questions we've had about search engine operations.
"I've been responsive, direct, and honest in this thread and elsewhere." Why? Because you're telling us you're honest? You never back up the claims you're taken to task for, Rand. How is it you honestly think people don't notice that? It's always "I wish I could tell you more" kind of responses. Whatever.
@Rand - Don't insult my intelligence. I've been around long enough - in both the sales and marketing game - to know a BSer when I see one. Your responses have become quite predictable and I can actually see the point in this thread where you went from having some concern about doing damage control to actually relishing the attention you're getting. I suspect you feel that you're handling this damage control like a champ, but I have news for you. You've completely screwed up the marketing of this product and you've lost a great deal of trust for your company and in your personal reputation in the process. Personally - despite my good feelings for your employees at large and the SEOmoz community as a whole - you've left me with a disgusting taste in my mouth with your approach that leaves me wanting to have nothing to do with SEOmoz. The only thing I'm finding "transparent" about you at this point is the transparency of your arrogant and condescending tone. It's really pathetic.
You keep playing this off like you're a search engine... No other search engine charges me $79 a month to use it. To me, it looks like you're holding my info hostage for my competitors to use against me. If everything was on the up and up, why wouldn't you offer the product as an "opt in" or "sign up" instead of taking my data without my permission? I know you wouldn't get as complete a crawl as you'd like, but at least it'd be ethical. On a side note, I was a daily visitor to seomoz, and haven't visited since this whole debacle occurred.
@Rand People here seem to be asking for specific instructions to block inclusion in the linkscape program. Pointing to a page listing the most active crawlers on the web and saying block all of them (with some minor implications beyond linkscape)... well, this is none other than a polite way of saying you cannot exclude a site from linkscape. I'd strongly encourage you to rethink this approach. Well behaved web companies have bots dedicated to specific purposes. Last time I counted, Google had 8 different bots, Yahoo! more than 10. I can give Google and/or Microsoft and/or Yahoo textual web content while restricting image crawling. It isn't a question of opting in or out of Google, but of a specific Google (or Microsoft or Yahoo!) service. Linkscape needs to decide what it wants to be: a well behaved, well understood service or a friendly rogue of the type nice to go out for a pint with, but better that the polite side of the family doesn't know. I'm not sure there is a middle ground here. Do note this is not a criticism of the linkscape service in and of itself - I'd just like to see a more transparent implementation.
I'm going to take some time off from this, but will try to revisit in a few days if there are new questions or issues that need a response. I'm certainly sorry to have lost the trust and respect of people that I really do like and respect (like Sean), but I think on the fundamental issues, there aren't going to be any significant changes, at least in the next few months, as far as the product goes. Thanks for the opportunity to present our case and for the thoughtful feedback. We will be discussing everything that arose on this thread and others internally and with our board and if we have anything new to announce, will share. @antezeta - I think I missed your comment while posting. I do recognize that we are more of this roguish, aggressive sort, and I think it's one of the reasons there's a lot of hostility. As I said, we're not changing direction on that now, but we'll definitely think long and hard about it. Thanks!
Personally, I'm really glad to have this link data. And I am hoping that the index quality will improve and get more detailed.. more transparent... not get more omitted and occluded. That being said, I understand how some people on the more secretive side of link building might want to cover their tracks and keep things off the radar. But if you work on big company sites with lots of inbound links... and your links are whitehat (and you're not paranoid)... it is an AWESOME tool. It's been useful in my work and I am glad to have it.
I've been trying to avoid this debate as, honestly, I'm getting a bit worn out on SEO drama lately, but I'm starting to see 2 questions emerging out of this debate, and am beginning to see why Rand is finding himself in a bad position: (1) How do I get out of LinkScape? Perfectly fair question, IMO, and I completely understand how it raises alarm bells for some people. It seems like the answers haven't been completely forthcoming, but a lot of that relates to... (2) How does LinkScape work? I think part of Rand's evasiveness (not to put words in his mouth) comes down to this question. Frankly, SEOmoz has spent a lot of time and money to build a product, and is understandably resistant to telling us (including competitors, some of whom are feigning outrage, IMO) all of the secrets of how that product works and sacrificing competitive advantage. Unfortunately (1) and (2) can't be completely separated, and so here we are. The reality is that we're going to all have to accept that this is a for-profit work-in-progress, decide how we feel about that product, and then get on with our lives. Meanwhile, Rand and company will have to work to improve that product and regain lost trust. I think many of the questions are fair ones, but the overall reaction is a bit over-the-top - frankly, I'm glad my business decisions aren't under the kind of lens that Rand's constantly seem to be.
"Regarding criticizing search engines - I think that having built a search engine ..." It appears that "having built a search engine" has put SEOmoz in the same double-speaking, lack-of-transparency, randomly paranoid phase that all the major search engines seem to go through, and that all of them still have ingrained into their culture to some degree. In other words, SEOmoz is exhibiting the same kind of behavior that keeps the "content search engines" from releasing more link data. Which is why SEOmoz had to go buy their own index. Funny when you think about it, sickening when you watch it happen to someone you thought it wouldn't happen to. :(
h8 to join this trainwreck, but as XKCD said, "There is someone WRONG on the internet". If SEOmoz get data from a source within that source's rules, I don't see why they need to allow you to opt out of their conglomerated "index" - they have a right to said data, they can have it, too bad so sad. I agree this isn't an SE: it's an intelligence tool. Hitwise don't let you opt out, either as a user of an ISP or on a site level, and that stuff is way scarier. Ditto Trends and about 4 trillion other tools. That is the way "market intelligence" stuff works, in that the contract exists separate from the website owners. Different if a crawler exists and they use the crawl data in some way - not because that takes bandwidth tho, but because they are obtaining data in a, if not unethical, certainly not 100% polite way if there is no way to block the bot. For those who care (and I don't), you can ban all bots trivially by making the last record in robots.txt "User-agent: *" followed by "Disallow: /", placed after the records for all the webmaster-useful bots (e.g. put in a record that allows Google etc. above it). Call me silly, but if you DON'T block useless bots, that's your problem, because you know that we are living in a robots opt-out world (and I am a robots opt-out girl). Forget Dotbot or whatever, this is the way it should be done and ALL robots should respect robots.txt. Personally, I'm more "three thousand words, three blog posts" mad @ Google for the whole "we can opt out of Trends but you can't" stuff than this tool, which far fewer people will use. Focus the rage where it best fits, IMHO.
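To make the robots.txt suggestion in the comment above concrete, here is a minimal sketch that checks such a file with Python's standard urllib.robotparser module. The robots.txt content, the example.com URL, and the "ExampleUnwantedBot" agent name are all assumptions for illustration; nothing here reflects how Linkscape or any particular crawler actually behaves, and the rules only affect bots that choose to honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow Googlebot everywhere, then block every
# other bot with a catch-all record at the end of the file.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "http://example.com/page.html"  # placeholder URL

# An empty "Disallow:" means nothing is disallowed for that agent.
print(parser.can_fetch("Googlebot", url))           # True
# Any agent without its own record falls through to the "*" record.
print(parser.can_fetch("ExampleUnwantedBot", url))  # False
```

The catch-all record works as a default: a well-behaved bot uses the record that names it if one exists, and the "*" record otherwise, so each bot you want to keep needs its own record.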
Rand said: "I do recognize that we are more of this roguish, aggressive sort, and I think it's one of the reasons there's a lot of hostility. As I said, we're not changing direction on that now, but we'll definitely think long and hard about it." So why didn't you just say that at the beginning instead of pretending you had something different than an aggressive bot that we basically can't stop from scraping our information whether we like it or not? At least it would have been an honest answer and this would probably be a non-issue now. Plus, you could have saved us all a lot of reading, and yourself a lot of wasted time.
@Sean - You rule. As a long-time SEOmoz attender, I also feel sick to my stomach. @Rand - Rand said: "I do recognize that we are more of this roguish, aggressive sort, and I think it's one of the reasons there's a lot of hostility." Your job as an SEOmoz employee is to defuse the situation, not go to the fire with gasoline. Negative generalizations are doing nothing but ruining this for you. I was sort of supportive of you when you pulled this same tactic at John Andrews' blog and then at Michael Vandemar's blog, as Rand could not be a bad guy, right? Well, I guess heroes can be villains too (yes, that was an analogy from the NBC show Heroes). I do not know how much more civilized this roguish, aggressive sort of SEO can get with you at this time. Shame on you.
A lot of people's toes seem to be stubbed here thanks to LinkScape and all of that jazz surrounding it: the bot mystery, not being able to opt out, then the whole issue with the meta tag branding, and now everyone attacking Rand for being secretive. From my perspective, Rand isn't disclosing everything concerning this project because that would sink SEOmoz. Who in their right mind would just throw everything they're doing on the table? Would you ask Google to disclose all of their information and tactics? I don't feel like this was meant to be such a fiasco, but maybe I'm wrong too. I support SEOmoz, and have used many of their tools in the past. I respect Rand for explaining and trying to manage all the negative heat coming off of this. As I've said many times before, I don't support the fact that the scrapers don't care about robots.txt and that the exclusion method is a meta tag, but I do support the sentiment behind LinkScape.
holy shit, i should have made the bet with madhat about this breaking 100 comments. SHIT!
What's the deal with everyone flaming Rand? The service is what it is; <b>deal with it</b>. "Oh, I'm so distraught at what SEOmoz has done, I've been such a loyal follower for soooo... many years." Are you F@#$#@ serious???