Published: Aug 16, 2007 - 11:27 am
Story Found By: skinner 1640 Days ago
Category: Vertical Search
22 Comments
Comments
Sphunn, reddited, stumbled and delicioused. This is pretty important stuff.
Yes I agree. There should be a hall of infamy somewhere where we put the details of people who do this stuff.
It's actually come up in a few places. I'm publishing because it's not some isolated problem that only happens to a few underlinked MFA sites.
Nice job with the tips on how to prevent/fix it at the end of the article.
Thanks, Michael. So far, knock on wood, the reverse cloaking has held up for several months with every site.
This has been going on for years, as the article says, but to suggest that only "a few people" knew about it "and kept quiet for the good of the community" is plain false, I think. It has been discussed in several public forums many times over the last few years. There is much prior information out there, but this article does a very good job of putting all the facts in one place in an orderly manner.
wow
Wonderful article, Dan. I'm looking forward to seeing what comes of this.
Great article. I have a question, though. The workaround is basically to validate, via IPs, that the search engine bots are in fact who they say they are, and if not, to serve a noindex to the proxies. Correct? What is to prevent the proxies from just stripping out the noindex from pages, especially if they are set up for malicious purposes? I can see it working to prevent the typical proxy, but it seems to me that the major issue is when these are used as a form of attack on a website's position in the search engines.
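For readers who want to see the workaround in concrete terms, here is a minimal Python sketch, not the article's actual implementation. It assumes the reverse-then-forward DNS check that Google documents for confirming Googlebot, and the function names (`is_verified_googlebot`, `robots_meta`) and the exact meta tag are illustrative choices. It covers IPv4 only, and it only helps when the proxy passes the Googlebot user agent through unchanged.

```python
import socket
from typing import Callable, List

# Hostname suffixes Google documents for its crawlers (assumption: re-check before relying on it).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for an IP, or '' if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ""


def forward_lookup(host: str) -> List[str]:
    """Return the IPv4 addresses a hostname resolves to, or [] on failure."""
    try:
        return socket.gethostbyname_ex(host)[2]
    except OSError:
        return []


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Reverse-resolve the IP, check the domain, then confirm the name resolves back."""
    host = reverse(ip).rstrip(".").lower()
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    # The forward lookup guards against someone faking the PTR record.
    return ip in forward(host)


def robots_meta(
    user_agent: str,
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> str:
    """Return a noindex tag when a request claims to be Googlebot but fails verification."""
    claims_googlebot = "googlebot" in user_agent.lower()
    if claims_googlebot and not is_verified_googlebot(ip, reverse, forward):
        return '<meta name="robots" content="noindex,nofollow">'
    return ""
```

The lookups are passed in as parameters so the logic can be exercised without touching DNS. In a real site you would call `robots_meta(user_agent, remote_ip)` while rendering the page head and cache the verification result per IP.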
Simple, this is addressed in comments on the post as well... but the "advantage" of this exploit for black hats is that it's hands-off. They aren't creating their own proxies, they are exploiting someone else's proxy server(s), and it appears to be a numbers game - you need to use a bunch of proxies to get the job done. To deploy even one (much less hundreds or thousands) of proxy servers that strip meta tags would mean that you need to get them hosted somewhere. Which makes it a lot less "hands off." You could get caught. A guy like Brad Fallon could send lawyers (or worse) after you.
Thanks Dan, I guess I didn't make it all the way through the comments.
Can't say I blame you - a few of us are writing mini-novels over there!
I am not sure what to make of http://www.bradfallon.com/linkrequest.html and I do wonder if any of that is part of his downfall...
Yeah... that's why he got hacked all right.
Wish I could say I did all the behind the scenes stuff, Dan -- but really, I've mainly added to the other voices that have expressed concerns over domain hijacking and the need to understand what's an "original" site.
If this has truly been ignored or even back-burnered at Google, Yahoo, etc. ... shame on them. Great write-up, Dan -- you made it easy to understand.
Danny, thanks for clearing that up. I guess you creating a platform (the Bot Obedience panels) was good enough. :D
Yup, this is all old news, as we covered this topic at Bot Obedience panels at SES San Jose 06 and SES Chicago 06, and even at PubCon 06 with Google's Vanessa Fox sitting right there. Google did make good on their promise to provide a way to accurately detect Googlebot, so we can stop the spoofing when the Googlebot user agent is passed through the proxy, but if it's filtered out you're still in trouble. I've personally not seen hard evidence of that happening until Dan's article, and I'm not sure whether the user agent was filtered out or they cached copies and then let Googlebot crawl their cache, but the net result is the same. If Google would just properly attribute content to the rightful owner, where they found it in the first place, and not give ownership to the second instance encountered, then the proxy issue and full content scraping forcing your content into duplicate penalties would no longer be a problem. The problem is how you identify that it's truly your content in such a way that someone else stealing your content couldn't also duplicate that ID. The simplest suggestion I have has always been that Google add a push/pull mechanism where you register the page and they instantly pull it, just like the AdSense mediabot does currently. Since the first person to PUSH the content wins ownership, your blog would have to wait until Google confirmed receipt of the new content before you could publish it. Guess what I'm advocating is kind of like a real-time interactive version of Google Sitemaps, which would solve this once and for all.
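To make the "first person to PUSH wins ownership" idea concrete, here is a toy Python sketch of such a registry. It is purely hypothetical: no search engine exposes this, and the class name `OwnershipRegistry`, its methods, and the choice of a SHA-256 fingerprint over whitespace-normalized text are all assumptions made for illustration. Exact-hash matching would also miss lightly altered copies, which a real system would have to handle.

```python
import hashlib
from typing import Dict, Optional


class OwnershipRegistry:
    """Toy first-push-wins registry: the first URL to register a piece of content owns it."""

    def __init__(self) -> None:
        # Maps content fingerprint -> URL that registered it first.
        self._owners: Dict[str, str] = {}

    @staticmethod
    def fingerprint(content: str) -> str:
        """Hash the content after collapsing whitespace and lowercasing it."""
        normalized = " ".join(content.split()).lower()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def push(self, url: str, content: str) -> bool:
        """Register content for a URL; return True if this URL is the recorded owner."""
        key = self.fingerprint(content)
        owner = self._owners.setdefault(key, url)
        return owner == url

    def owner_of(self, content: str) -> Optional[str]:
        """Return the URL that first pushed this content, or None if unregistered."""
        return self._owners.get(self.fingerprint(content))


registry = OwnershipRegistry()
# The publisher pushes first and waits for confirmation before going live.
confirmed = registry.push("https://example.com/post", "My original article text.")
# A proxy or scraper that pushes the same text later does not gain ownership.
hijacked = registry.push("https://proxy.example.net/?u=example.com/post", "My original   article text.")
```

In this sketch `confirmed` is true and `hijacked` is false, which mirrors the publish-only-after-confirmation flow described in the comment.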
Great article Dan, a little bit late though, anyways you have the credit... Hopefully many other SEO gurus will start revealing their secrets, ha!
not an easy decision but thanks for thinking it through so thoroughly.
Great article. Thank you for submitting it.
Wow. It's great to have this information, and...wow. Thanks to all involved.