neyne
yeah it is nice to get reminded that not everything is allowed when link building. one should have some lines that shouldn't be crossed.
Story: SEO Is Mostly Quack Science
Rand, not everything that you find in Google Scholar is a scientific paper. For example, one of the results in the SERP that you cited is authored by one Aaron Mathew Wall and its title is "Search Engine Optimization Book". Sounds familiar?
But let’s take a look at some of the research papers that are found in that GScholar SERP:
For example, this one, titled "Web spam detection via commercial intent analysis". If you get to the PDF (either through an academic institution's access or by paying for a subscription), you will find clear references to the dataset that was used (Webspam-UK2006, reference #5) and a citation of the paper that describes the methodology used in the analysis (Castillo et al., reference #6). You provided neither of those (at least not initially).
As for the peer review process, the paper was presented at the 3rd International Workshop on Adversarial Information Retrieval on the Web, held in Banff, Canada in 2007. These workshops are organized by AirWeb as a part of their WWW events. AirWeb has a board of directors and an advisory board that reviews all the papers submitted for their conferences. For example, if you go to the Call for Papers page of the WWW2009 conference that was held in Madrid, you will find the following quote:
All papers will be peer-reviewed by at least three reviewers from an International Program Committee. Accepted papers will appear in the conference online proceedings published by the ACM Digital Library and the conference's web site. Authors of accepted papers will retain proprietary rights to their work, but will be required to sign a copyright release form (pdf file) to IW3C2. The Program Committee will select a small number of excellent papers for fast-track journal publication in the ACM Transactions on the Web (ACM TWEB).
Another paper that I found in that same SERP, "Towards agents participating in realistic multi-unit sealed-bid auctions", was presented at a conference for which the board of directors of IFAAMAS performs the review. So you are wrong when claiming that scientific work in our (and related) fields is not peer reviewed or that it doesn't provide references to raw data and methods.
Now to the point of the way you presented your data and methods. The only proof you have provided that your data is statistically significant is the mention that your stderr is very small and your claim that you are "rather confident in the robustness of our data" (paraphrasing). It took almost two months from the time I first objected to your correlation studies till you (under mounting pressure) provided a glimpse into the methods of your analysis, by revealing that your stderr is a standard error of a mean and not a standard error of a correlation coefficient.
The moment you provided that fraction of a description of your analysis method, your study was taken apart by an actual PhD in IR, Dr. E. Garcia, someone you have personally referred to as "one of the world's foremost authorities on the subjects of the IR (Information Retrieval) field & search engine technology" (your words from an interview at SEOMoz). He disputed your method of analysis, your calculations and your conclusions. Hell, he disputed it to such an extent that he demanded that you publicly retract your publications using these methods and data. Retraction of an article is not something that is easily demanded in scientific circles.
Rand, it was you that insisted on calling your research "scientific". I am sure that in the marketing world, giving your product an appealing title works perfectly. However, in the scientific research world, you must provide a bit more "meat" than just sweetened rhetoric. I suggested that Ben submit this research to a scientific journal, not because I think your results are worthy of publishing, but because that would mean that it would undergo a real peer review process. After reading Dr. Garcia's review, I am not sure it would be approved for publication. And there is the real difference between the world of scientific publishing and web marketing: the peer review process prevents bad science from seeing the light of day, while in web marketing (or any other writing published online) anything can be published and only then (when authors deem it worthy of their time and effort) is the data presented for peer review.
I don't have a personal problem with you or with your company. Hell, I pay you good money to be able to use your tools. You are obviously a very good marketer and a successful businessman. The fact that you have a large community of people behind you, ready to support you even when you are wrong, is admirable. But you are not a scientist and Ben is not a statistician. He seems like a very nice and smart guy and I am sure he is a great programmer. Hell, I am certain he is a much, much better statistician and mathematician than I am. However, this is not about him or you or me or Dr. Garcia. Extraordinary claims require extraordinary evidence. Your claims were extraordinary not only by their nature but also by the way they were presented. Your evidence turned out to be somewhat less spectacular.
Story: SEO Is Mostly Quack Science
Rand, as Michael said, we would need the raw data. Not the table of data represented in the graph.
However, from the attached table, it seems like you calculated a mean of the correlation coefficients across SERPs and then calculated a stderr of that mean relative to the population of correlation coefficients you have measured. I will further investigate whether this is the correct way of assessing the significance of a correlation.
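To make the distinction concrete, here is a minimal Python sketch of the two quantities being confused. It is not the code or data behind the study: the per-SERP coefficients and the sample size of 10 results per SERP are made-up values for illustration, and `stderr_of_r` uses the common textbook approximation sqrt((1 - r^2) / (n - 2)) for the standard error of a single correlation coefficient.

```python
import math
import statistics


def stderr_of_mean(values):
    """Standard error of the mean of a list of values (sample stdev / sqrt(k))."""
    return statistics.stdev(values) / math.sqrt(len(values))


def stderr_of_r(r, n):
    """Approximate standard error of one correlation coefficient r
    computed from n paired observations: sqrt((1 - r^2) / (n - 2))."""
    return math.sqrt((1.0 - r * r) / (n - 2))


# Hypothetical per-SERP correlation coefficients (assumption, not real data).
per_serp_r = [0.30, -0.10, 0.25, 0.05, 0.40, -0.20, 0.15, 0.10]
results_per_serp = 10  # assumed number of ranked results in each SERP

mean_r = statistics.mean(per_serp_r)

# Shrinks as more SERPs are added, no matter how noisy each coefficient is.
print("mean r:", mean_r)
print("stderr of the mean of r:", stderr_of_mean(per_serp_r))

# Depends on the number of observations behind one coefficient.
print("stderr of a single r:", stderr_of_r(mean_r, results_per_serp))
```

The first number can be made as small as you like simply by sampling more SERPs, which is why a small stderr of the mean says little by itself about whether any individual correlation is significant.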
Story: SEO Is Mostly Quack Science
I have been all over this discussion on every possible blog/stream/conference. I just want to add a few points, questions:
As to the more general comments that I have read recently all over the place regarding SEO not being scientific, I have one word for ya - duh! Of course SEO is not science. No one ever claimed that SEO is science. It is a marketing branch. Just like architecture is not (only) science. It does, however, include some science. IR is science. Statistics is science. Psychology is science. SEOs can make use of some of the scientific tools to improve their results and their data acquisition methods. Hairdressers are advised to know the chemical activity of the peroxide solutions they use for hair-bleaching, or they could burn the customer's scalp. Does that acquaintance with chemistry make hairdressing a scientific field? No, but they should by all means make use of valid scientific branches and research to improve their craft. In the same way, science does not make new buildings beautiful, but architects still have to know the scientific principles of engineering to be able to plan beautiful, but safe, buildings.
Another claim that I hear a lot is how we cannot apply scientific methods to SEO, since Google is a "black box with hundreds of unknown parameters" so we don't really know what is happening.
You think Google is a black box? Try E. coli. This bacterium has 4400 genes, the function of many of which is still not known to us. These genes replicate, jump within and outside the genome, mutate, and produce proteins which themselves combine and recombine, activate and inactivate under different environmental conditions, and produce synergistic and reductive effects with millions of combinatorial interactions. Actually, compared to E. coli, Google is a very simple machine. Yet that level of complexity has not stopped us from investigating, learning, deciphering and understanding the complex molecular processes that go on within the membranes of those bacteria. And humanity has benefited greatly from such studies, advances which would not have happened if the complexity of the model organism had been taken as a reason not to investigate.
I am not comparing SEO to molecular biology here, but the principle holds true. Our nature is to poke and investigate that which is unknown. Thousands of years of human history have proven that the scientific method is the best suited and most successful in performing those investigations. If we give up the curiosity and the proven method to satisfy it, there is not much left preventing us from becoming a mindless mass of sheep governed by false claims, popularity contests and principles of form over function...
Over and out.
Hmmm
The fact that you see elements of your site (or the data contained within them) being scanned and showing in Linkscape in a direct correlation with DotBot's crawls means that on that particular crawl, Linkscape got your pages through the DotBot crawl. On the next round it may get it from some other crawler.
Did blocking DotBot completely remove your site as a link source from Linkscape?
ah now that you put it that way :)))))
they just seem to keep spilling that feces bucket into the fan opening....
Annie, this is exactly what was discussed when Linkscape came out. They didn't want to say what the user agent of the data-gathering robot was because it would reveal that the data was partially bought and that the robots were not theirs. Sh*t hit the fan and Rand came up with the metatag NODMOZ (or whatever it was). Heh, that is easy: they parse the head anyways, so they could just exclude the data with the tag from the analysis and not from indexing.
In any case, I don't think the way they are pushing it is worse in its inaccuracy than a lot of other products that are being pushed on us. How many seconds does it take for Win7 to load?
A lot has been said about the herd mentality of the people supporting Rand and SEOMoz, it would be funny if the Rand criticizing crowd started behaving that way too... Let's not go there.


Story: Pakistan Floodbait: The End of The Info-Graphic