Published: Nov 13, 2007 - 04:48 pm
Story Found By: DazzlinDonna 1652 Days ago
Category: Vertical Search
13 Comments
Comments
This topic has been around a while (http://sphinn.com/story.php?id=5136). I can't believe that Microsoft has been silent after that initial communication by msndude at WMW. Most don't believe it was a real dude from MSN anymore, due to the lack of communication since. Sorry for the link drop, but I have a strong opinion on this (2+ months ago and counting): http://www.exposureonline.com/2007/09/Microsoft-is-lying-and-screwing-up-your-log-files.cfm
Tim - a) you should edit the post and insert an extra space; currently the close parenthesis breaks the link, and b) it is about so much more than just screwing with the log files, although of course that's a biggie as well. Did you see my post on it by any chance?
Hi Michael, thanks, but I guess it's too late to edit that link... Yes, I've read your excellent post on this topic too. I'm glad this topic is finally getting some attention around here! Hopefully MSN has an msndude reading Sphinn if they are ignoring WMW. And I know you aren't a newcomer to this issue -- I've read your posts in the thread at WMW too. I appreciate that this is about more than just log files, but I'm not an AdSense user and I consider referral spam (which ends up in logs) abhorrent; that's my biggest complaint.
Heh. Stevewar... I'm not sure you understood the complaints here... did you read the post?
Burgo, I think it's a given that he did not. :P
Just glancing at it, I'd have to agree it does seem to be an attempt by MS to do cloaking detection. Their bot spiders a page, then another one pretending to be a browser comes along a little while later. They probably do a textual comparison of some kind (to account for slight variations such as a time/date stamp on the page). If there is a significant deviation the page may be flagged for human investigation. This is complete speculation, but it is my initial guess at what is happening. It makes perfect sense they'd want to do both page reads within close temporal proximity, btw, as that would be the only way to minimize changes that would occur on dynamic sites -- people posting blogs, articles, etc.
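To make the guess above concrete, here is a minimal Python sketch of that kind of textual comparison. It is purely illustrative and not MSN's actual method, which nobody in this thread knows: the function names, the tag-stripping step, and the 0.8 similarity threshold are all assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher


def normalize(html: str) -> str:
    """Strip tags and collapse whitespace so only the visible text is compared."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip().lower()


def looks_cloaked(crawler_html: str, browser_html: str, threshold: float = 0.8) -> bool:
    """Return True when the two fetches differ by more than small noise.

    crawler_html: page body served to the declared search bot.
    browser_html: page body served to the browser-like follow-up request.
    threshold:    assumed cut-off; a similarity ratio below it means
                  "flag for human review".
    """
    matcher = SequenceMatcher(
        None, normalize(crawler_html), normalize(browser_html), autojunk=False
    )
    return matcher.ratio() < threshold


# Same page fetched seconds apart: only the timestamp differs.
bot_view = "<h1>Widgets</h1><p>Our widgets are great. Generated 2007-11-13 16:48:01</p>"
user_view = "<h1>Widgets</h1><p>Our widgets are great. Generated 2007-11-13 16:48:09</p>"
# A page that shows the visitor something else entirely.
cloaked_view = "<p>Buy cheap pills online now</p>"
```

A fuzzy ratio rather than an exact match is what lets a changing timestamp pass while a wholesale content swap gets flagged, which is the distinction the comment above is describing.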
Oddly enough, btw, if it's messing with your stats/ad revenue etc., I think perhaps the only way to minimize the effect may be to selectively cloak the page. ;-)
I've been investigating what MSN is up to as well, and I can tell you that whatever they are doing, looking for cloaked pages and such, is apparently working quite well. Just yesterday I compared the results for (attempted) scraped content on nefarious sites, cloaked affiliates and malware sites, which both Yahoo and Google had indexed abundantly, but which were all missing from MSN. MSN had the cleanest SERPs of the 3, and it didn't use to be this way, so whatever they're up to they should just keep on going, because Yahoo and Google are way behind in this security sweep of their indexed sites.
Even if it's cloaking detection (and I think that's likely), it's easy to game. The IP addresses, the referring URL features (I mentioned those in the post + someone in the comments spotted something), and the unchanging UA all give away the hits. Any blackhatter worth his weight would be able to figure it out (as many have). It's a race I guess, but it's one MS most likely will lose in the long run. Likewise for G and Y if they try such a technique.
Pierre
I doubt anybody here serves msnbot different content than the referrer spam bot. Even if they'd change UA etc. daily, that would be detectable. So if they're really after cloaking, that doesn't fly.
It's not exactly cloaking they're after; it appears they're testing to see whether your site responds to different keyword stimuli, which the bad sites do. FWIW, how do you know they aren't running a similar test outside of MS IP addresses? If they used a random set of IPs most people would never notice. That's how magicians do their job: it's all distraction, you look where they want you to look while some sleight of hand happens elsewhere.
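As a rough illustration of the "keyword stimuli" idea in the comment above, the Python sketch below requests the same page with several different search referrers and checks whether the response changes. Everything here is hypothetical: `fetch` stands in for whatever HTTP client performs the request, `search.example.com` is a placeholder referrer host, and the exact-match comparison is a simplification (a real check would presumably tolerate minor differences such as timestamps).

```python
from typing import Callable, Iterable
from urllib.parse import urlencode


def build_referrer(keyword: str) -> str:
    """Build a fake search-results URL to send as the Referer header.

    The host name is a placeholder, not a real search engine endpoint.
    """
    return "http://search.example.com/results.aspx?" + urlencode({"q": keyword})


def responds_to_keywords(fetch: Callable[[str], str], keywords: Iterable[str]) -> bool:
    """Return True if the page body changes depending on the referrer keyword.

    fetch: hypothetical callable that takes a referrer URL, requests the
           page under test with that Referer, and returns the response body.
    """
    bodies = {fetch(build_referrer(keyword)) for keyword in keywords}
    return len(bodies) > 1


# Stand-ins for real HTTP fetches, for demonstration only.
def honest_site(referrer: str) -> str:
    return "<p>Same article for every visitor</p>"


def keyword_stuffing_site(referrer: str) -> str:
    # Echoes the incoming query string back into the page.
    return "<p>Best deals for " + referrer.split("?", 1)[1] + "</p>"
```

Passing `fetch` in as a parameter keeps the sketch free of network calls, so the same check could be driven from any set of IP addresses, which is the point raised above about tests running outside the known MS ranges.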
I thought about that, IncrediBILL. I'm designing test and analytics algos to try to detect that ;) One step at a time.
@incredibill: I'd say you're RIGHT ON about that (testing for keywords). Otherwise they'd have used some other random referer. That's the only good reason to pass along search engine keywords like that.
@ekstreme: BTW, no idea if PHP will post on sphinn, but the following probably will test against it. HMM. (NOTE: I'm just a programmer/web developer - not a blackhatter. Just guessing about this.)

    // Leading octets of the bot's IP range and its unchanging user agent
    $ipclass = '65.55.165';
    $ua = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)';

    // True only when both the user agent and the IP prefix match the bot
    if (!(isset($_SERVER['HTTP_USER_AGENT'])
        and $_SERVER['HTTP_USER_AGENT'] == $ua
        and substr($_SERVER['REMOTE_ADDR'], 0, strlen($ipclass)) == $ipclass)) {
        // stuff here is only shown when bot isn't seen
    }