Posted By: DazzlinDonna 297 days ago
Topic Type: News Story (Jump to http://ekstreme.com)
Category: Microsoft Other
13 Comments
Comments
This topic has been around a while (http://sphinn.com/story.php?id=5136). I can't believe that Microsoft has been silent after that initial communication by msndude at WMW. Most no longer believe it was a real dude from MSN, given the lack of communication since.
Sorry for the link drop, but I have a strong opinion on this (2+ months ago and counting):
http://www.exposureonline.com/2007/09/Microsoft-is-lying-and-screwing-up-your-log-files.cfm
Tim - a) you should edit the post and insert an extra space; currently the close parenthesis breaks the link, and b) it is about so much more than just screwing with the log files, although of course that's a biggie as well. Did you see my post on it by any chance?
Hi Michael, thanks but I guess it's too late to edit that link...
Yes, I've read your excellent post on this topic too. I'm glad this topic is finally getting some attention around here! Hopefully MSN has an msndude reading Sphinn if they are ignoring WMW.
And I know you aren't a newcomer to this issue -- I've read your posts in the thread at WMW too.
I appreciate that this is about more than just log files, but I'm not an AdSense user and I consider referral spam (which ends up in logs) abhorrent, so that's my biggest complaint.
Heh. Stevewar... I'm not sure you understood the complaints here... did you read the post?
Burgo, I think it's a given that he did not. :P
Just glancing at it, I'd have to agree it does seem to be an attempt by MS to do cloaking detection. Their bot spiders a page, then another one pretending to be a browser comes along a little while later. They probably do a textual comparison of some kind (one that accounts for slight variations such as a time/date stamp on the page). If there is a significant deviation, the page may be flagged for human investigation.
This is complete speculation, but it is my initial guess at what is happening. It makes perfect sense that they'd want to do both page reads within close temporal proximity, btw, as that would be the only way to minimize changes that would occur on dynamic sites -- people posting blogs, articles, etc.
Oddly enough btw, if it's messing with your stats/ad revenue etc., I think perhaps the only way to minimize the effect may be to selectively cloak the page. ;-)
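For illustration, the textual comparison guessed at above could be sketched roughly like this. It is a hypothetical sketch only: nothing here is known to be what MSN actually does, and the function names (normalize_page, page_similarity, looks_cloaked) and the 0.9 threshold are made up for the example. It strips markup, masks digits so a time/date stamp doesn't count as a difference, and scores the two copies with PHP's similar_text().

```php
<?php
// Hypothetical sketch: compare the copy served to the bot with the copy
// served to the "browser" visit and flag large differences.
// All names and the threshold are assumptions for illustration.

function normalize_page(string $html): string {
    // Drop markup so only the visible-ish text is compared
    $text = strip_tags($html);
    // Mask every run of digits so timestamps and counters compare equal
    $text = preg_replace('/\d+/', '0', $text);
    // Collapse runs of whitespace to a single space
    $text = preg_replace('/\s+/', ' ', $text);
    return strtolower(trim($text));
}

function page_similarity(string $botCopy, string $browserCopy): float {
    $a = normalize_page($botCopy);
    $b = normalize_page($browserCopy);
    if ($a === '' && $b === '') {
        return 1.0; // two empty pages are identical
    }
    // similar_text() writes a 0..100 percentage into its third argument
    similar_text($a, $b, $percent);
    return $percent / 100.0;
}

function looks_cloaked(string $botCopy, string $browserCopy, float $threshold = 0.9): bool {
    // Below the (assumed) threshold, hand the page to a human reviewer
    return page_similarity($botCopy, $browserCopy) < $threshold;
}
```

With this sketch, two copies that differ only by a timestamp score as identical and pass, while a page that serves unrelated copy to the bot falls well under the threshold and would be flagged for a human to look at.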
I've been investigating what MSN is up to as well and I can tell you that whatever they are doing, looking for cloaked pages and such, is apparently working quite well.
Just yesterday I compared the results of (attempted) scraped content on nefarious sites, cloaked affiliates and malware sites, which both Yahoo and Google had indexed abundantly, but were all missing from MSN.
MSN had the cleanest SERPs of the 3, and it didn't use to be this way, so whatever they're up to they should just keep on going, because Yahoo and Google are way behind in this security sweep of their indexed sites.
Even if it's cloaking detection (and I think that's likely), it's easy to game. The IP addresses, the referring URL features (I mentioned those in the post + someone in the comments spotted something), and the unchanging UA all give away the hits. Any blackhatter worth his weight would be able to figure it out (as many have).
It's a race I guess, but it's one MS most likely will lose in the long run. Likewise for G and Y if they try such a technique.
Pierre
I doubt anybody here serves msnbot different content than the referrer spam bot. Even if they'd change UA etc. daily, that would be detectable. So if they're really after cloaking, that doesn't fly.
It's not exactly cloaking they're after; it appears they're testing to see whether your site responds to different keyword stimuli, which the bad sites do.
FWIW, how do you know they aren't running a similar test outside of MS IP addresses? If they used a random set of IPs most people would never notice.
That's how magicians do their job: it's all distraction. You look where they want you to look while some sleight of hand happens elsewhere.
I thought about that IncrediBILL. I'm designing test and analytics algos to try to detect that ;)
One step at a time.
@incredibill: I'd say you're RIGHT ON about that (testing for keywords). Otherwise they'd have used some other random referer. That's the only good reason to pass along search engine keywords like that.
@ekstreme: BTW. No idea if PHP will post on sphinn, but the following probably will test against it.
HMM. (NOTE: I'm just a programmer/web developer - Not a blackhatter. Just guessing about this.)
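// First three octets of the addresses the suspect hits come from
$ipclass='65.55.165';
// The unchanging user agent those hits send
$ua='Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)';
if(!(
    isset($_SERVER['HTTP_USER_AGENT'])
    and $_SERVER['HTTP_USER_AGENT']==$ua
    // compare the start of the visitor's IP against the prefix above
    and substr($_SERVER['REMOTE_ADDR'],0,strlen($ipclass))==$ipclass
))
{
    // stuff here is only shown when bot isn't seen
}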