SEOmoz released its new LDA tool on Tuesday to an enthusiastic crowd at its annual two-day seminar in Seattle. How can it help you? Patti Woloch gives 10 ideas for using LDA data to improve your site's rankings.
12 Comments


Administrator
from dannysullivan 3057 Days ago #
Votes: 3

I'm sorry. Respect to SEOmoz and Ben, but he's "reverse engineered the search ranking algorithm." The one that both Google and Bing use? No, they use different ones. Either of them? I doubt it.

Pick any site, please -- pick any one on Google. Use the tool, and if you don't come back with 100% figures, I guess that's it.

I feel like it's 1998, and WebPosition has just rolled out its tool designed to help people make the "perfect" pages for search ranking. Except even if you followed all the instructions, that didn't work.

Moderator
from Jill 3057 Days ago #
Votes: 0

To be fair, Danny, it sounds like (and I've only just read this article) it's not much different than what the BruceClay or Hubspot tools do.

I'm not a fan of any SEO tools such as those for the reasons you just stated.

from MichaelCottam 3057 Days ago #
Votes: 0

The news isn't the tool...the news is that Ben has shown that a stronger LDA score for a given search term + page content is a ranking factor as or more important than # of unique IPs (or domains as well) linking, in terms of correlation with Google search results (not Bing, just Google).

And read the article carefully, don't just skim it....what was said was "reverse engineered the search engine ranking algorithm as it applies to content relevancy".

What did we all think was "the key" to relevancy before, in the Google algo?  Page title, some sort of keyword density value, anchor text of course.

But we all can throw up many examples of pages that rank very well based almost exclusively on links.  At least I could a couple of years ago, now that I think about it.

Where this applies in real SEO work is this: LDA value explains in many searches why one page outranks another that has a slightly stronger link profile, and similar on-page content.

From just my scratching around with sample searches and looking at the tool's scores, it seems that it's about as strong a factor as # of unique domains linking, but not as strong as a well-aged exact match domain name.
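For readers who want to see what "correlation with Google search results" could mean in practice, here is a minimal sketch of a Spearman rank correlation between ranking positions and a per-page score. It is not SEOmoz's method or code: the function names and the sample numbers are hypothetical, chosen only for illustration.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: result positions 1-5 and a made-up relevancy score per page.
positions = [1, 2, 3, 4, 5]
scores = [0.9, 0.7, 0.8, 0.4, 0.3]
rho = spearman(positions, scores)
```

A strongly negative value here would mean higher scores tend to sit at better (lower-numbered) positions. It says nothing about whether the score causes the ranking.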

Administrator
from dannysullivan 3057 Days ago #
Votes: 2

I did get the "content relevancy" part. But given that links far outweigh on-the-page content, so what?

Google has over 200 different factors. Page titles. Content on the page. Use of bold. Links to a page. Quality of those links. Age of domain. Trust of that domain.

The linkage and trust of a domain continue to be what most people find trumps every other factor.

In terms of knowing whether one page has a "slightly stronger link profile" than another, according to what? Google doesn't report all the links to a site. It certainly doesn't report the quality of those links.

If you're talking about SEOmoz's tools, those are simply its own guesses about what it thinks Google might be doing. Best guesses, which remain just that.

If you find that you have 65% relevancy or whatever, according to this tool -- what, if you pump up to 70%, that will outdistance your competitor? Maybe, if all things are equal, and the tool is really that accurate.

But then again, it might not do anything at all.

What I suspect it will be useful for is the same thing these types of tools have always been useful for, helping you understand that you aren't ranking for some terms because you simply might not be talking about them in the way you thought.

And that's fine. But that's a long way from reverse engineering anything.

from randfish 3057 Days ago #
Votes: 0

I love the passion in the article, but I'd ask that we have until our public release on Tuesday to explain what it does, how it calculates, the models, math, etc. I won't try to address criticism until then.

Danny's certainly right in that our work and tools, just like all things in the SEO sphere, are a "best guess." We like to use statistics and correlation to help validate, but that doesn't mean it's exactly or even necessarily similar to what the major search engines are using.

In this case, we can be almost sure that Google's vector space models are more complex and scalable than ours. However, we think (and will try to show on Tuesday), that even having a simplistic model can be very valuable for doing SEO.
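As a rough illustration of what a "simplistic" vector space model can look like, the sketch below scores a page against a query using cosine similarity over raw term counts. This is an assumption-laden toy, not LDA and not the SEOmoz tool: `term_vector` and `cosine_similarity` are hypothetical names, and real systems would add weighting, stemming and topic modeling on top.

```python
import math
from collections import Counter

def term_vector(text):
    """Represent text as a bag of lowercase, whitespace-separated terms."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    shared_terms = a.keys() & b.keys()
    dot = sum(a[t] * b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical query and page text, for illustration only.
query = term_vector("search ranking factors")
page = term_vector("a guide to search engine ranking factors and links")
score = cosine_similarity(query, page)
```

The score falls between 0 (no shared terms) and 1 (identical term proportions), which is the kind of single relevancy figure a tool could report per page.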

from AndyBeard 3057 Days ago #
Votes: 0

People are shown a new toy, and you introduce price scarcity/urgency (last day to buy for $79) and expect people to wait on reviews?

Administrator
from dannysullivan 3057 Days ago #
Votes: 0

Look forward to reading more about it, Rand. I know people heard his talk at MozCon, but only assorted tweets really came out of that -- and the page itself is like a mystery form!

from randfish 3055 Days ago #
Votes: 0

BTW - Looks like Dana Lookadoo posted some coverage of the session here - That's not the post we've been working on internally though (and we didn't realize it was going up). Still planning to have more detail out Tuesday.

from theGypsy 3055 Days ago #
Votes: 1

Man, I take a few days away and this breaks out? Is this the LSI/A for the new generation?

While I applaud the effort, why do we need things like 'reverse engineering THE algorithm'? I have been talking/writing about the possibility of LDA over at the Goog' since 07 or so. BUT I have also discussed that the research team also seems to have an interest in HTMM and PLSA. I watched this area like a hawk and have seen no reason to believe either/any of them have been implemented. Or if indeed, some hybrid approach to semantic analysis was in play.

And also curious over the years (while the LSI bandwagoneers shilled their crap) was that NO ONE seems to have caught on to the one major technology that we actually KNOW was in play, or at least was far more likely to be: Phrase Based Information Retrieval (PaIR). This particular semantic analysis method had 9 patents out on it at last count. This tech was brought into Google back in 2004 when they hired Anna Patterson and took over her patents in the space. So, why would we look at LDA (only mentioned ONCE by Googlers as far as I know)?

I have to believe that if anything, we'd be better off looking at something like PaIR. Why? Because over the years searchers have become more experienced and PHRASES or 2-3+ word terms are used. I would also surmise that PaIR combined with PLSA/LDA would ultimately provide a better model. But who knows. I can talk to 5 different IR folks and get 5 different opinions... Yes, the IR world is a lot like ours... hee hee... no one ever agrees.

So, once more. I appreciate the effort that has gone into the tool. BUT if y'all (Rand and Co) start to pimp this as some SEO Magic Bullet for sussing out any search engine's approach, I will be less than pleased. You can pull the wool over a lot of folks' eyes because they know jack about IR - but not this geek. It is a toy to gain some insights. Which is great. Just let's not over-market it as anything else, m'kay?

For the rest 'o ya that aren't up on all this jargon... I really do suggest reading up on the phrase based IR stuff as I would likely put money on it being the core of Google's semantic analysis approaches. Want more on the other? Here;

Latent Dirichlet allocation

Hidden Topic Markov Models

Probabilistic latent semantic analysis

Phrase Based IR

There's my 2c for now... I have a feeling we'll be getting into this more during the coming week. And really, I do love this because it shows, once again, how important understanding IR can be to SEOs... We must be able to understand these concepts to ensure we can make realistic observations in such situations.

from randfish 3053 Days ago #
Votes: 0

New post is up here. Regarding your points, David: we certainly agree it's possible, and in fact likely, that Google's doing something more advanced than the naive implementation of LDA we've built. The correlation numbers, though, suggest there's potentially lots to be learned and done with SEO against tools that leverage topic modeling.

In fact, we discussed that it was likely an even more basic system, like LSI (or maybe pLSI), could produce similarly correlated numbers (though perhaps not quite as high). The primary point being: topic modeling, and getting more serious about producing software that can measure/recommend based on it, could have positive returns for practitioners.

Please do drop me a line if it's something you're interested in working on - we've got research budget for this type of thing. :-)

Made Rand's link clickable - Jill

from theGypsy 3053 Days ago #
Votes: 0

Thanks Rand... I had a good look at the tool and some other bitties from the vault this evening and shall put something up shortly. All in all, I think it's a great thing if we educate more search folks in the dark arts of the black box (you know, Google).

In short, you did well enough on not hitting the 'reverse engineering' bits and that was always, as stated above, my main concern. You know me, from Slawski University where we learn 'may be', 'could be' and 'possibly' as the cornerstones of analysis. Ya know?

All in all... just happy to see such topics arise.

As for ideas/research/refinements, I will certainly try to poke around more, talk to some of my own IR geeks, and let ye know if any ideas/questions/thoughts arise along the way.

from p1rebeccal 3053 Days ago #
Votes: 0

Without taking anything away from Patti's excitement, because frankly we are all excited about it over here since our niche is extremely competitive and in many cases a surprisingly level playing field, I think the point here was more to help answer the "Great, now what do we do with it?" question that came up during Ben's session at the Mozinar than anything else. I've been keeping tabs on the controversy all weekend, and I've attempted to address some of the more common criticisms in a follow up post here. (I've also added a link to Patti's post back to Rand's official explanation.)
