Topic Type: News Story (http://www.sitepronews.com)
Category: SEO
5 Comments
Comments
If you don't know what LSI really is but you don't mind pretending you do, then this is the kind of article you might write.
Statements such as "Google, in fact, implemented LSI into its algorithm a few years ago and has continued to use it since", "The best way to discover these semantic relationships is to perform a search of Google with the tilde (~) character in front of your query" and "This is especially true because Google uses LSI to evaluate the relevancy of your website's link profile" are just plain wrong.
LSI is a complex mathematical operation, and I agree it is difficult to understand, but let's not fuel the myth that search engines use LSI. Doing so simply encourages snake-oil salesmen to peddle their "LSI software" and other "LSI optimized" products and services to unsuspecting punters.
I have tried to redress the balance with a layman's explanation of LSI (http://www.seo-blog.com/latent-semantic-indexing-lsi-explained.php) and a follow-up post on the LSI myth (http://www.seo-blog.com/latent-semantic-index-lsi-myth.php).
Thanks for the follow-up, duz.
You might even leave a comment like this if you're a pompous and arrogant SEO clown who thinks he knows far more than he does but is generally confused about the reality of the situation. If you had done even five minutes of research before polluting the forum with your "snake-oil" diatribes, you'd have discovered that Google does use a derivative of an LSI system in its algorithm, one that considers the relationships between words within the content of the site itself (see http://www.seobook.com/archives/000657.shtml). Clearly "duz" (or "dunce", not sure which he meant) didn't, or he would have found the following, which SEO Book confirms (a far more reputable source than Mr. Duz, I might add):
- search engines such as Google do try to figure out phrase relationships when processing queries, improving the rankings of pages with related phrases even if those pages are not focused on the target terms
- pages that are too focused on one phrase tend to rank worse than one would expect (sometimes even being filtered out for what some SEOs call being over-optimized)
- pages that are focused on a wider net of related keywords tend to have more stable rankings for the core keyword and rank for a wider net of keywords
Dr. Ralph Wilson (http://www.wilsonweb.com/seo/google-lsi.htm) agrees; see his post for more information. Or you can take Google's word for it by reading the patent it applied for (http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&r=1&p=1&f=G&l=50&d=PG01&S1=20060018551.PGNR.&OS=dn/20060018551&RS=DN/20060018551), which Mr. Duz obviously didn't do (although he did claim to). It's quite long, but perhaps the most important portion is the following:
The system is further adapted to identify phrases that are
related to each other, based on a phrase's ability to predict
the presence of other phrases in a document. More specifically,
a prediction measure is used that relates the actual co-occurrence
rate of two phrases to an expected co-occurrence rate
of the two phrases. Information gain, as the ratio of actual
co-occurrence rate to expected co-occurrence rate, is one such
prediction measure. Two phrases are related where the prediction
measure exceeds a predetermined threshold. In that case, the
second phrase has significant information gain with respect to
the first phrase. Semantically, related phrases will be those
that are commonly used to discuss or describe a given topic or
concept, such as 'President of the United States' and 'White
House.' For a given phrase, the related phrases can be ordered
according to their relevance or significance based on their
respective prediction measures.
Hopefully this will clarify the issue for those of us with open minds, and maybe next time Mr. Duz will do a little research before badmouthing others and making wisecrack comments that only make him look more foolish when he is corrected.
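The information-gain measure in the quoted patent passage can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's implementation: it assumes "co-occurrence" means two phrases appearing in the same document, that the expected co-occurrence rate is the product of the two phrases' individual document rates (what you would see if they were independent), and it uses a hypothetical threshold of 1.5, since the quoted text only says "a predetermined threshold". The helper names and toy documents are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count the documents containing each phrase and each unordered phrase pair."""
    single = Counter()
    pair = Counter()
    for doc in docs:  # each doc is a set of phrases
        single.update(doc)
        pair.update(frozenset(p) for p in combinations(sorted(doc), 2))
    return single, pair


def information_gain(single, pair, n_docs, a, b):
    # actual rate   = n_ab / N
    # expected rate = (n_a / N) * (n_b / N)   (assumption: independence)
    # ratio         = n_ab * N / (n_a * n_b)
    n_a, n_b = single[a], single[b]
    if n_a == 0 or n_b == 0:
        return 0.0
    return pair[frozenset((a, b))] * n_docs / (n_a * n_b)


def related_phrases(docs, phrase, threshold=1.5):
    """Phrases whose gain with `phrase` exceeds the (hypothetical) threshold, best first."""
    single, pair = cooccurrence_counts(docs)
    scored = [
        (other, information_gain(single, pair, len(docs), phrase, other))
        for other in single
        if other != phrase
    ]
    related = [(p, g) for p, g in scored if g > threshold]
    return sorted(related, key=lambda pg: pg[1], reverse=True)


# Hypothetical toy corpus: each document reduced to the set of phrases it contains.
docs = [
    {"president of the united states", "white house"},
    {"president of the united states", "white house", "election"},
    {"election", "ballot"},
    {"garden", "flower"},
    {"garden", "flower", "election"},
]

print(related_phrases(docs, "white house"))  # [('president of the united states', 2.5)]
print(related_phrases(docs, "election"))     # [('ballot', 1.6666666666666667)]
```

With these toy documents, "white house" is related only to "president of the united states" (gain 2 × 5 / (2 × 2) = 2.5). "election" appears in three of the five documents, so its high base rate inflates the expected co-occurrence, and its gain stays below 1 with every phrase except "ballot".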
Just four simple points, nyorchak.
1. You said in your original article "Google, in fact, implemented LSI into its algorithm a few years ago and has continued to use it since". Now you are saying "Google does use a derivative of an LSI system in its algorithm". Sounds like you have changed your mind.
2. You cite Aaron Wall's article (http://www.seobook.com/archives/000657.shtml) as a reputable source of evidence. This article's introduction currently includes:
"Some of those well in the know attribute this to latent semantic indexing. Even if they are not using LSI, Google has likely been using other word relationship technologies for a while, but recently increased its weighting".
When first written, this used to read (courtesy of the Internet Archive): "Some of those well in the know attribute this to latent semantic indexing, which Google has been using for a while, but recently increased its weighting".
It seems to me that Aaron, like you, has had second thoughts; however, 'reliable' sources are not normally edited without comment.
3. The Google patent you quote and the other related patents do not use the term LSI. If you cannot understand the difference between the Google patents and LSI, then I cannot help you.
4. Invoking my surname in a derogatory way, as you have done, signifies a middle-school mentality, and hence I have no intention of communicating with you further.
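Point 3 turns on what LSI actually computes. Textbook latent semantic indexing builds a term-document matrix and keeps only the top k singular values of its singular value decomposition (SVD), so two terms that never appear together can still end up close if they share contexts. The sketch below is a minimal pure-Python illustration, not any search engine's code: it substitutes power iteration with deflation for a proper SVD routine (`numpy.linalg.svd` would be the usual choice), and the six-term, four-document matrix and helper names are invented for illustration.

```python
import math


def matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def top_singular_triplet(a, iters=500):
    """Power iteration on A^T A: returns (sigma, u, v) for the largest singular value."""
    at = transpose(a)
    v = [1.0 + 0.1 * i for i in range(len(a[0]))]  # arbitrary non-zero start vector
    for _ in range(iters):
        w = matvec(at, matvec(a, v))
        n = norm(w)
        if n == 0:
            break
        v = [x / n for x in w]
    av = matvec(a, v)
    sigma = norm(av)
    u = [x / sigma for x in av] if sigma else av
    return sigma, u, v


def truncated_svd(a, k):
    """Rank-k SVD by repeated power iteration, deflating A <- A - sigma * u v^T each time."""
    a = [list(map(float, row)) for row in a]
    triplets = []
    for _ in range(k):
        sigma, u, v = top_singular_triplet(a)
        triplets.append((sigma, u, v))
        a = [[a[i][j] - sigma * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]
    return triplets


def term_vectors(a, k):
    """Latent-space coordinates of each term: the rows of U_k * Sigma_k."""
    triplets = truncated_svd(a, k)
    return [[sigma * u[i] for sigma, u, _ in triplets] for i in range(len(a))]


def cosine(x, y):
    nx, ny = norm(x), norm(y)
    return sum(p * q for p, q in zip(x, y)) / (nx * ny) if nx and ny else 0.0


terms = ["car", "automobile", "engine", "flower", "garden", "petal"]
# Columns are hypothetical documents:
# d0 "car engine", d1 "automobile engine", d2 "flower garden", d3 "flower petal garden"
A = [
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 1],  # garden
    [0, 0, 0, 1],  # petal
]

latent = term_vectors(A, k=2)
print(cosine(A[0], A[1]))            # raw vectors: car and automobile share no document
print(cosine(latent[0], latent[1]))  # after rank-2 LSI: very close to 1
```

Here "car" and "automobile" never share a document, so their raw cosine similarity is 0, yet after the rank-2 reduction they land on essentially the same latent vector because both co-occur with "engine". The two approaches compute relatedness differently: LSI derives it from a global factorization of the whole matrix, while the patent passage compares one pair of phrases' actual and expected co-occurrence rates.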
Okay, Duz, I appreciate your four "simple" points; however, I must ask you the following:
If you do not consider the word relationship technologies of Google and other search engines a derivative of LSI, what exactly would you call them? It seems that all the evidence states that they are the same in almost every way, from form to function. Just because the moniker LSI is not used does not mean the system is not doing exactly what it states in the patent. Google has never been one to divulge its algorithmic personality or drop names, so to speak, so let's not get hung up on the wording.
Instead, let's discuss whether the strategies I've described in my article are advantageous to SEOs, considering the evidence indicating the presence of word relationship systems and indexing tools within the major engines' algorithms, whatever name they may have.
Just curious: do you not consider SEO Book a reliable source of information? Last time I checked, I had heard of Aaron, but I hadn't heard of you. I just can't help wondering what the Google patent is referring to if it's not semantically indexing the content of pages, which it clearly states it is. So if not LSI, what would you call a system like this? Either way, the content of my article is relevant and accurate considering the evidence I've seen (and you have yet to provide any evidence to the contrary other than your own opinions), so, as I said earlier, let's debate not whether the term LSI appears but whether strategies based around strengthening word relationships within page content are beneficial for SEO.