LSI is an old, tried and tested technique, but it has some important limitations. PLSI (probabilistic latent semantic indexing) is increasingly preferred over it in IR research: it builds on the ideas of LSI but addresses some of its issues, and is therefore better.
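To make the contrast a little more concrete: classic LSI factorises the term-document matrix with a truncated SVD. PLSI instead treats each document as a mixture of latent "aspects" z, with P(w | d) = Σ_z P(z | d) P(w | z), and fits those probabilities with expectation-maximisation (EM).

Below is a minimal Python sketch of that EM loop. It assumes a tiny dense count matrix, a fixed number of aspects and a fixed number of iterations. The function name `plsi_em` and the toy data are hypothetical illustrations. They are not the author's code or any search engine's implementation.

```python
import math
import random


def plsi_em(counts, n_topics, n_iter=50, seed=0):
    """Fit a PLSI aspect model to a document-term count matrix with EM.

    counts: list of documents, each a list of term counts of equal length.
    Assumes every document contains at least one term.
    Returns (p_z_d, p_w_z, history) where p_z_d[d][z] = P(z | d),
    p_w_z[z][w] = P(w | z) and history holds the log-likelihood
    sum_{d,w} n(d,w) log P(w | d) at the start of each iteration.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalise(row):
        total = sum(row)
        return [x / total for x in row]

    # Random, strictly positive starting distributions.
    p_z_d = [normalise([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalise([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]

    history = []
    for _ in range(n_iter):
        # Expected counts accumulated for the M-step.
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                p_w_d = sum(joint)  # P(w | d) under the current model
                log_lik += n * math.log(p_w_d)
                for z in range(n_topics):
                    share = n * joint[z] / p_w_d
                    exp_w_z[z][w] += share
                    exp_z_d[d][z] += share
        history.append(log_lik)
        # M-step: renormalise the expected counts into probabilities.
        p_w_z = [normalise(row) for row in exp_w_z]
        p_z_d = [normalise(row) for row in exp_z_d]
    return p_z_d, p_w_z, history


# Hypothetical toy corpus: 4 documents over a 4-term vocabulary.
counts = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
]
p_z_d, p_w_z, history = plsi_em(counts, n_topics=2, n_iter=30)
for d, row in enumerate(p_z_d):
    print(d, [round(p, 3) for p in row])
```

Unlike the SVD factors in LSI, every number here is a probability: rows sum to 1 and nothing is negative. That interpretability is the kind of LSI limitation PLSI is usually credited with addressing. EM only finds a local optimum, so results can vary with the random seed.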
Comments

from barnaseo, 1194 days ago

Hi Marie-Claire, thanks for Sphinning this. I've got a basic knowledge of IR concepts and theory and would love to read more IR-related posts on Sphinn. Unfortunately, I didn't understand much of the second half of your article, since it moves quite fast and references IR and mathematical concepts that, I think, the average SEO is no expert in. Maybe you could introduce us SEOs to some of the more advanced IR practices? :)

from theGypsy, 1194 days ago

Of interest, the gang at Google did mention PLSI about a year ago when getting excited about HTMM: http://googleresearch.blogspot.com/2007/09/openhtmm-released.html So it's certainly on the radar over there.

from Misscj, 1194 days ago

Hi Barnaseo, yes, I'll see what I can do about that. Watch the blog; more is coming up. PLSI is quite complicated anyway, so I tried to keep it brief, but I agree, it's not easy. theGypsy, nice link, thanks.

from barnaseo, 1194 days ago

@misscj: Great! Looking forward to it.

from theGypsy, 1194 days ago

Np. That was when it first really came onto my own radar. Also of interest is the phrase-based indexing and retrieval work Google was playing with (when Anna Patterson was there). Keep up the great work!

from johnandrews, 1194 days ago

Good work writing about a complex topic, but I want to question why we write about it. LSI is a processing method. It is interesting to technicians building IR (information retrieval) systems. It is important, but not practical for search engines on the web at this time. Because it is important, we should care about how it works, and specifically about what it assumes, what data it uses and what conclusions it reaches, all of which support an understanding of its limitations. All good stuff for understanding search and doing SEO.

But we must assume that LSI is not implemented as LSI right now, due to practical issues. The algo in use might be based on LSI principles, but the implemented algo will make its own assumptions, have its own sensitivities and reach its own conclusions. So practically speaking, we care more about what is actually implemented than about the details of LSI, but we care more about understanding LSI concepts than about the various labeled variants of LSI that people are playing with. In academia, they are forced to label their modified versions with a new name (like PLSI) in order to get fame and tenure etc. But if Google is not using this variant, it is irrelevant to us in the SEO world. BREAKTHROUGHS are relevant, however, because they represent major improvements that search engines will use, and so they become as important to us as LSI. Making statistical estimates to improve the efficiency of LSI is not a breakthrough; it's normal progress to try to estimate everything that takes time to calculate. Until someone shows that an estimation method is fast, accurate and reliable (i.e. robust), it is just another labeled variant useful to academics in IR.

So, good article, but I disagree with its premise: that LSI is unimportant to us compared to PLSI. I think it is more important for us to understand LSI, and then simply recognize that PLSI is a variant, not yet implemented or testable (an incremental evolution).

I find it funny that the final analogy stumbled on this as well: "Saying that a search engine is using LSI is like saying that a car is using petrol/gas". Yes, well, cars are using petrol/gas today. The vast majority. And they will, despite the existence of ethanol, hydrogen cells, biofuels, etc., all of which are worthy of recognition but not yet relevant to drivers due to practical issues.

from Misscj, 1193 days ago

Hi John, regarding "In academia, they are forced to label their modified versions with a new name (like PLSI) in order to get fame and tenure etc.": my advisor told me "if you want to be famous, make a corpus" :)

I wrote about LSI because a lot of SEO blogs, tweets, you name it, mention it. I wanted to illustrate that it is very commonly used when building a search engine, and that another variant existed anyway which was better, so if anything, Google would use that rather than plain old LSI. However, nobody uses it in exactly the same way. We academics like a bit of banter about things that are slightly different but almost the same.

Thank you for your valuable comment, John.
