Story Found By: JohnWeb 2120 Days ago
Category: Vertical Search
Is it just for attention? Is it so Google will close the door on it? Note that the site is indexed: minty-fresh indexing from 2 hours ago.
From the site:
GProxy hacking tool
There is an exploit in Google's algorithm, known for over a year and still very much active, whereby you can remove URLs from Google's search results. Google has so far chosen to do nothing about it, so why not automate the process?
Comments
OK, a test with a couple of Google properties has been launched.
Sebastian, I'm thinking of testing it with this site http://sebastians-pamphlets.com/ :-)
Please don't hit my youngest baby, or I'd blast your twits ;) Seriously, if this thingy works as described, Google will have to ignore a few more proxies. They said they'll react when proxy indexing influences search results. Let's see.
We all think alike...
If it worked, it would mean that automated blog comment spam works. When was the last time automated comment spam beat you in the SERPs?
With all respect, John, your analogy doesn't fit. Comment spam is dealt with; proxy indexing, OTOH, is a completely different story, especially when the proxied content gets fed enough PageRank. And when the proxy is controlled by the attacker, I can think of a lot of nasty things. I'm not saying that Google can't invent a counter-strategy, but judging from the official statements I've seen so far, they haven't yet.
Just from looking at the tool from the outside, I think it does the following: (1) it accesses several proxies with a fake user-agent to get them to cache the page (assuming they need that and don't grab it on the fly); (2) it pushes links to those "proxified URLs" to blogs, either as automated comment spam or as postings on specially crafted splogs (blog + ping), in a way that lets Google recognize and follow the links. For the proxies to take your site out of the SERPs (assuming it works the way they say), they would have to rank above your site, i.e. carry more value than your real site. I can see how that might be an issue for zero-value sites, but surely any site that is already indexed has more value than a URL fed only with comment spam or blog spam? (Or am I missing a vital element?)
The vital element is that there's a dark corner in everybody's soul. Sure, this thingy uses splogs and such, so it just produces duplication at a level Google ignores by design, because of the lack of trusted votes. The problem is not this tool; it's the exploit itself. We all know how to enhance the concept, which means people will do it. In fact, people are already doing it successfully, while Google just says "please alert us when you spot a case where we index a proxy *AND* it has a serious impact on a site's rankings". I've no clue what Google has been cooking during the year they've known about it, but I'm a little disappointed, because blasting competitors with this method still seems to work.
How would Google determine the original content owner?
badass tool!
@John Mu: The date of inception is stored in Google's database, which is why this whole proxy removal tool probably does not work. As for such a hole in Google, I doubt it exists. If you look at Sitemaps, they require verification that you own the site. Hence, if someone were to try to remove a URL, I would think there would be checks and balances in place to stop this type of malicious attack. For example: a URL is submitted for removal; the website is crawled; if the document is still available, there is no removal; if the document has been removed, the removal goes through. A very easy, algorithmically simple method to thwart such attempts. Google engineers are not stupid...
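The check-and-balance described in the comment above can be sketched as follows. This is a hypothetical illustration of the commenter's idea, not Google's actual removal pipeline: the names `http_status`, `should_remove` and `GONE_STATUSES` are made up, and treating 404/410 as "document removed" is an assumption.

```python
import urllib.error
import urllib.request
from typing import Callable

# Assumption: these HTTP statuses count as "document removed".
GONE_STATUSES = {404, 410}


def http_status(url: str, timeout: float = 10.0) -> int:
    """Hypothetical crawler hook: return the HTTP status of a GET to url."""
    req = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report their code.
        return err.code
    # Network failures (URLError) propagate, so a transient outage
    # never triggers a removal.


def should_remove(url: str, fetch_status: Callable[[str], int] = http_status) -> bool:
    """Honor a removal request only if a fresh crawl shows the document is gone."""
    status = fetch_status(url)
    # Document still available -> no removal; document removed -> removal.
    return status in GONE_STATUSES
```

Passing a stub instead of `http_status`, e.g. `should_remove(url, lambda u: 200)`, lets the rule be checked offline: a live page returns `False`, so the removal request is refused.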
John, once the proxy is detected, a lookup in Google's archives for older versions and their link profiles could easily lead to the author/content owner. It would even be possible to identify the whole artificial construct and deindex it.
Good idea, Sebastian. It could be a bit problematic, though: imagine someone accidentally sets up a proxy like that next to his normal site (or someone does it "for" him). On the other hand, that's like many other things: you're responsible for your own site, period. @Sem-Advance: the problem with the "date of inception" is that people could go around and take over new or low-value (partially unindexed) sites by re-publishing their content on a higher-value domain. That's the same problem that plagues most other technical methods of recognizing the original owner: if the creator of the copy is technically up to date, they can do whatever is necessary to register the content before the original owner has a chance to do so (or even knows to do it).
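The weakness described in the last comment shows up even in a toy version of a "date of inception" rule. The sketch below is hypothetical: the `Copy` record, the `presumed_owner` function and the "earliest indexed copy wins" policy are assumptions made for illustration, not how Google actually attributes content.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Copy:
    url: str
    first_seen: Optional[datetime]  # None = never crawled/indexed


def presumed_owner(copies: list[Copy]) -> Optional[str]:
    """Hypothetical rule: the copy with the earliest first-seen date is the original."""
    indexed = [c for c in copies if c.first_seen is not None]
    if not indexed:
        return None  # nothing indexed yet, so no owner can be inferred
    return min(indexed, key=lambda c: c.first_seen).url


# A new, not-yet-indexed original loses to a copy republished on a
# domain that gets crawled first.
original = Copy("http://new-site.example/post", None)
copy = Copy("http://big-domain.example/post", datetime(2008, 1, 2))
print(presumed_owner([original, copy]))  # http://big-domain.example/post
```

Because the rule can only see what has been crawled, whoever gets the content indexed first "owns" it, which is exactly the takeover scenario for new or partially unindexed sites.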