Sphinn
Oh my God Dmoz is Gone!!!!
Went Hot: September 24, 2007 - 11:48 pm
Posted By: Eavesy 231 days ago
Topic Type: News Story (Jump to http://www.google.co.uk)
Category: AOL
A search for Dmoz now brings up PhilC's article in the no.1 position. Things have been going downhill since they set up the 301 to the www. Has Google really had enough of directories? Or has AOL seriously messed something up?
59 Comments     

Comments

from qwerty 231 days ago #
Votes: 3 | Vote:
+ -

I get 1.37 million results for site:dmoz.org.

from Feydakin 231 days ago #
Votes: 2 | Vote:
+ -

Algo tweak? Mistake? On purpose?

The best we can hope for is an end to the "I can't get in to DMOZ" threads.. Nah, that would be asking too much..


from AndyBeard 231 days ago #
Votes: 2 | Vote:
+ -

I get 2920 results for site:JohnChow.com

Maybe they forgot to use the noODP metatag ;)


from g1smd 231 days ago #
Votes: 0 | Vote:
+ -

... and the reason that it doesn't rank is that the ODP added a set of site-wide 301 redirects from dmoz.(org|com) and from www.dmoz.com all pointing to www.dmoz.org just a few weeks ago. I expect things to be in turmoil until Google completely recalculates PR and backlinks.

 

 

I already see some interesting effects, and some patterns, here and there.  I guess this will take several, to many, months to right itself: http://www.webmasterworld.com/google/3437548.htm


from Eavesy 231 days ago #
Votes: 0 | Vote:
+ -

G1smd, you do not know what you are talking about, the 301 from the dmoz homepage was recognized weeks ago, there is no reason why this would cause the homepage to get deindexed, something is not right. BTW I never said they were banned, at least not in this post, my blog post was hyped up for digg and I never submitted it here.

from g1smd 231 days ago #
Votes: 0 | Vote:
+ -

*** you do not know what you are talking about ***

Hate to disagree.

You obviously did not try the searches posted above, or you failed to understand them.

The post at "Digg" includes the word "ban", and is wrong.


from Eavesy 231 days ago #
Votes: 0 | Vote:
+ -

Didn't I just say the blog post that was not submitted here by myself was hyped up for digg? All of the facts that I have presented in this post are correct.

from g1smd 231 days ago #
Votes: 0 | Vote:
+ -

Still, please check the search strings posted above and all will become very clear.

 

*** Didn't I just say the blog post that was not submitted here by myself was hyped up for digg? ***

Ah, that's the other Sphinn thread.... http://sphinn.com/story/6606


from Eavesy 231 days ago #
Votes: 0 | Vote:
+ -

I have checked your URLs, all I have said in this thread is that the homepage does not rank number 1. The blog post was digg bait, which, despite being more popular than any other post this weekend, got buried.

from DLPerry 230 days ago #
Votes: 0 | Vote:
+ -

Totally off-topic, and I haven't looked at the Digg post (I hate Digg) but......

Forgive the stupidity on my part, but why on Earth would you post a fabrication on Digg?  Is that the 'New Thing' - posting falsehoods on Social Sites?  I can see if it was a 'storytelling' post - but if it's about a real, true event - why the lies?


from Lyndon 230 days ago #
Votes: 1 | Vote:
+ -

"why the lies?"

DLPerry, for exactly the same reason why every media organisation lies. If you don't get that, you shouldn't be in marketing. Or even waste your time at a site that deals with marketing.

You didn't think the "Boy eats own head" headline was true did you? ;) 


from Eavesy 230 days ago #
Votes: 0 | Vote:
+ -

Even if I delete the blog post, I have had over 3500 uniques since last night according to AWStats, good for my Alexa! Also lie is a harsh word, I would say it was more of an exaggeration.

from DLPerry 230 days ago #
Votes: 0 | Vote:
+ -

I guess I was hoping for something a bit more 'positive'.  Oh well - it seems the familiar old 'because everyone else does it' and 'because that's the way it is' mentality is still the way to go.

I must've missed the "Boy eats own head" headline - sounds like a 'tabloid' piece - which I personally would never believe  - mainly because of the sensationalist tactics they employ, and the  'Liar Liar Pants on Fire' reputation they have earned.

And you know - I do keep seeing posts complaining about how SEO and online marketing have been getting a bad reputation.  Maybe this lie tactic has something to do with that?

just my old and idealistic .02  :)

 

 

 


from dannysullivan 230 days ago #
Votes: 1 | Vote:
+ -

Cross linking a few items. Also on Sphinn here:
http://sphinn.com/story/6606
http://sphinn.com/story/6644


From what I can see, the home page is NOT in Google. Nor is Dmoz ranking for things it totally should be ranking for:

http://www.google.com/search?q=open%20directory%20project
http://www.google.com/search?hl=en&q=odp

Like some others, I wondered if it had been hit by the Great Directory Ban Of Sept. 2007?
http://searchengineland.com/070920-085657.php
http://sphinn.com/story/4415
http://www.seomoz.org/blog/what-makes-a-good-web-directory-and-why-google-penalized-dozens-of-bad-ones

Nope, internal pages aren't missing. But the home page is -- and why seems a mystery to me. I don't see Dmoz itself having any blocks on it. Other thoughts?

 


from jdevalk 230 days ago #
Votes: 0 | Vote:
+ -

I think it's just duped out... This one:

http://core-n02.dmoz.aol.com:30080/ 

is in the top 10 for "dmoz". 


from DMOZ 230 days ago #
Votes: 0 | Vote:
+ -

Not sure why every single page *but* the home page is indexed.

We're looking into it.


from betweenstations 230 days ago #
Votes: 0 | Vote:
+ -

We can only hope that this is the start of a downward spiral.

from qwerty 230 days ago #
Votes: 0 | Vote:
+ -

A boy ate his own head??

I really think it's just a dupe content issue brought up by all the redirects they've set up. Further complicating the issue is the fact that while all of their internal home page links are pointing to http://dmoz.org, they're all being redirected to http://www.dmoz.org. They should update those links ASAP. You don't use a 301 to fix a link when you can just change the link without causing any harm, especially when it's on every page of your domain.
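
A minimal sketch of the link clean-up suggested above: rewrite internal links that point at a non-canonical host so they no longer depend on a 301. The host names are the ones mentioned in this thread; the function name and the alias set are illustrative assumptions, not anything the ODP actually runs.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed for illustration: the canonical host and the aliases named in this thread.
CANONICAL_HOST = "www.dmoz.org"
ALIAS_HOSTS = {"dmoz.org", "dmoz.com", "www.dmoz.com"}


def canonicalise(url: str) -> str:
    """Return `url` with an alias host swapped for the canonical host.

    URLs on any other host are returned unchanged. Replacing the netloc
    also drops any port or userinfo, which is fine for plain site links.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in ALIAS_HOSTS:
        parts = parts._replace(netloc=CANONICAL_HOST)
    return urlunsplit(parts)


print(canonicalise("http://dmoz.org/Arts/"))
```

Running something like this over the page templates once fixes the link at its source, so visitors and crawlers never hit the redirect in the first place.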


from Fridaynite 230 days ago #
Votes: 0 | Vote:
+ -

Why should Google rank the ODP?
Google Directory is the same and DMOZ is duplicate content ;-)

 

 


from justin 230 days ago #
Votes: 2 | Vote:
+ -

The sad thing is that so many people feel that dmoz is legit and think there's a lot of value in being listed in it.  It's been corrupt for a while now, it just needs to go away.

from g1smd 230 days ago #
Votes: -1 | Vote:
+ -

It is a very simple domain canonicalisation problem, just like the many hundreds that I have written about many times before.

It shouldn't take you very long to discover which URL the Root Page, and all the Top Level categories, are indexed under.

I'll also guess that it won't take Google's indexing system much more than a month to realise what is going on and fix the problem.

Heck, in one of the posts above, I even listed some Google searches that totally show the reason for the problem.

Maybe I give people far too much credit in being able to understand site: searches and what they show you.


from MattCutts 229 days ago #
Votes: 6 | Vote:
+ -

Hey all, I dug into this a little bit with the help of a couple crawl folks. It looks like when Googlebot tried to fetch http://www.dmoz.org/, we got a 301 redirect back to http://www.dmoz.org/ . It looks like that self-loop has been going on for several days. We were last able to fetch the root page successfully on Sept. 10th, but from that point on DMOZ was returning these 301-to-itself pages, and after a few days Googlebot gave up on trying to fetch the url.

 

It looks like the rest of the site is fine, so I suspect that if DMOZ gets 301/redirects for their root page sorted out on their webserver, we'll recrawl and index the page pretty quickly.

 

DLPerry, keep the faith. If you read back over the comments, several people (g1smd, jdevalk) suggested reasonable explanations instead of going right to "ZOMG! Google hatez da Moz!?!" :)
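
To make the "301-to-itself" case concrete, here is a rough sketch of how a redirect chain can be checked for a loop. It works on a hypothetical in-memory redirect map rather than live requests, and it says nothing about how Googlebot actually handles this; the function name, the map, and the hop limit are all assumptions for illustration.

```python
from typing import Dict, List, Tuple


def follow_redirects(start: str, redirects: Dict[str, str],
                     max_hops: int = 10) -> Tuple[str, List[str]]:
    """Follow 301s described by `redirects` (source URL -> target URL).

    Returns (outcome, chain), where outcome is "ok", "loop" or
    "too_many_hops" and chain lists every URL visited in order.
    """
    chain = [start]
    seen = {start}
    current = start
    for _ in range(max_hops):
        target = redirects.get(current)
        if target is None:
            return "ok", chain          # no further redirect: a normal page
        chain.append(target)
        if target in seen:
            return "loop", chain        # we have been here before
        seen.add(target)
        current = target
    return "too_many_hops", chain


# Hypothetical configuration: aliases redirect to the www host.
healthy = {
    "http://dmoz.org/": "http://www.dmoz.org/",
    "http://www.dmoz.com/": "http://www.dmoz.org/",
}
# Same map plus the self-redirect described above.
broken = dict(healthy)
broken["http://www.dmoz.org/"] = "http://www.dmoz.org/"

print(follow_redirects("http://dmoz.org/", healthy))
print(follow_redirects("http://dmoz.org/", broken))
```

With the healthy map the chain ends at the www root; with the self-redirect added, every entry point ends in a loop, which is why the root page becomes unfetchable.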


from qwerty 229 days ago #
Votes: 0 | Vote:
+ -

Ah, a voice of reason :)

from JohnWeb 229 days ago #
Votes: 1 | Vote:
+ -

Odd, they aren't redirecting currently, it shows a 200, then again I'm not Googlebot, was only Googlebot getting the redirect?

 http://oyoy.eu/page/headers/?full=1&url=http://www.dmoz.org/


from MattCutts 229 days ago #
Votes: 1 | Vote:
+ -

JohnWeb, just from a cursory glance (i.e. take this with a grain of salt), it looked like we might have been able to fetch a valid page earlier today, so they might have already made a change. It's always possible that they were doing something specific for Google's IP range, but my hope is that folks on the DMOZ side have figured it out themselves and this will sort itself out without too much extra trouble.

from JohnWeb 229 days ago #
Votes: 0 | Vote:
+ -

That makes sense, 16 hours ago DMOZ (if that's his/her real name) said they were looking into it. I've seen dozens of sites in GWHG with their homepage removed and would never have thought of "Googlebot gave up" as the reason.  Live and learn I guess.

from Halfdeck 229 days ago #
Votes: 0 | Vote:
+ -

I was working with a client whose home page poofed. In his case, he had one url 301 redirecting to the home page. He requested that redirecting URL to be removed. A few days later, the removed URL was showing 11K backlinks in Webmaster tools, as if it was the home page. And then the home page was nowhere to be found in the SERPs. So I told him to remove the redirect and have the url issue a 404 instead, and then I told him to request a URL inclusion. Few days after that, his home page came back. Can't really say what fixed the problem though, but that was the first time I saw the home page just disappear from the SERPs.

from JohnWeb 229 days ago #
Votes: 0 | Vote:
+ -

Halfdeck, what is an "URL inclusion" request?  A reconsideration request? or a URL submission?

from iBrian 229 days ago #
Votes: 0 | Vote:
+ -

I think this is a pretty good example of why people need to scream "manual penalty!" second, and explore all other reasonable options first. :)

from dannysullivan 229 days ago #
Votes: 1 | Vote:
+ -

Thanks, Matt. Wow -- lesson learned, never 301 back to the same page. Not that I would.

And remember -- never type Google into Google or you'll break the internet:
http://googlified.com/2007if-you-type-google-into-google/

 


from Sem-Advance 229 days ago #
Votes: -1 | Vote:
+ -

g1smd

 Another great find again. Canonical URLs & Google ....so many miss this fix straight off they spend years wondering why Google doesn't give them the love they think they should get.

 Keep up the good work and remember Matt C works for Google.com, so not all he posts is 100% factual.

 Peace!


from g1smd 229 days ago #
Votes: 0 | Vote:
+ -

There is a Sphinn bug if you go back and edit a post. I already reported that one over at http://sphinn.com/story/6814#c9643

from SamIWas 229 days ago #
Votes: 0 | Vote:
+ -

So essentially IF Matt's post on what Googlebot encountered was not Googlebot getting it totally wrong, then DMOZ has to have been cloaking it to serve it only to Google (and possibly other search engines). Otherwise tons more people would have noticed the redirecting issue because it effectively closes down that page - I've done it before when testing :)

DMOZ/AOL, please weigh in on this. It'd be interesting to find out if googlebot just royally stuffed up a 301 or if it really was an incorrectly implemented cloaked 301. 


from SamIWas 229 days ago #
Votes: 0 | Vote:
+ -

"What would happen if  the Webmaster Tools perferences were set to "non-www" and the site then had the non-www to www redirect implemented a few months later, without changing the Webmaster Tools setting?"

 
Interesting question. I don't think cloaking 301's like this is standard practice so it'll be interesting to find out if this was indeed a googlebot bug or not.... 


from oldschool 229 days ago #
Votes: 0 | Vote:
+ -

So who is in control of 301's at DMOZ, the Meta's? (sorry I couldn't resist).  The 301 loop sounds about on-par for how DMOZ is run in general.  I continue to stand amazed that no one touches AOL's unwanted step child, as she stands broken and abused by the "system".

For those that speculated that Google is anti-DMOZ, it's far from the truth, and in fact, on the contrary, they are "in bed" with DMOZ... and that's the only divorce I will openly support.

Matt's on this thread, so perhaps he can shed some light as to why Google would partner with DMOZ amidst all of the negative PR in the industry, and the fact that it is WAY outdated because no one is really working to keep things in order. I am just amazed that they would consider it a quality resource given that over the past few years, thousands of sites have not been added, or even worse, removed (not speculation, first hand info here - I am a former editor).


from MattCutts 229 days ago #
Votes: 3 | Vote:
+ -

JohnWeb, just to be 100% clear, "Googlebot gave up" is not the root reason. I was just introducing a bit of levity. The real reason was of course the infinite redirect loop that lasted for days. If I 301 page A to point back to page A and do that infinite loop for a week (or more), it's probably a bad user experience to return that infinite loop to users. But if the loop stops, then our system is set up to get the page again fairly quickly.

 

iBrian, well-said. Danny, it's pretty rare to see a site do an infinite redirect loop like that, but it does happen. g1smd, I'm pretty sure that I was looking at www.dmoz.org, not dmoz.com, but I was just doing a quick/lightweight check, so I won't claim to be 100% positive.


from g1smd 229 days ago #
Votes: 1 | Vote:
+ -

Matt: I see no evidence of any sort of loop being created for www.dmoz.org at any time in the last few months, so I am quite perplexed. I have been keeping a careful eye out for redirect and canonicalisation issues as the changes have been made, as you might imagine. I do see Google gobbling up alternative URLs for one of the load-balancing servers for the last couple of months though.


from JohnWeb 229 days ago #
Votes: 0 | Vote:
+ -

MattCutts, thanks once again for your clarification on my miscommunication.

from MattCutts 228 days ago #
Votes: 1 | Vote:
+ -

g1smd, I only did a cursory dig and that's what it looked like at that point. I've been asking about it more, and it looks like dmoz's 301 might have interacted badly with a heuristic on Google's side. I'm still keeping an eye on it and I'll bug the crawl team until everything looks good.

from JohnWeb 228 days ago #
Votes: 0 | Vote:
+ -

Out of curiosity, I've managed to create a page that will return a 301 but not redirect so that the browser will show the content of the page.  http://www.jlh-design.com/2007/09/googlebot-gave-up/#comment-5301  Could DMOZ screw up their code this much? I'd imagine it's possible. 

Curiously enough, although a browser shows the page content (appearing normal to a user), the online header checker I used makes an "assumption" that the page should redirect to itself.

This may not have been the exact mechanism for the DMOZ page dropping out of Google, but at least I can replicate it somewhat. 

Coming up next week on Myth Busters, Jamie and JohnWeb... 
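
For anyone who wants to reproduce the "301 that does not redirect" effect locally, the sketch below is one way it could happen: a server sends a 301 status with a normal HTML body but no Location header. That mechanism is an assumption on my part; the thread does not say how the test page was built or what DMOZ's servers were doing. It uses only the Python standard library and a throwaway server on localhost.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class OddRedirectHandler(BaseHTTPRequestHandler):
    """Answers every GET with status 301, a body, and no Location header."""

    def do_GET(self):
        body = b"<html><body>Normal-looking page content</body></html>"
        self.send_response(301)                      # redirect status...
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()                           # ...but no Location header
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once(port):
    """Fetch / without following redirects; return (status, Location, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", "/")
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("Location"), resp.read())
    conn.close()
    return result


def demo():
    server = HTTPServer(("127.0.0.1", 0), OddRedirectHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_once(server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

A client that sees the 301 status but finds no target has to decide for itself what to do, which may be why a header checker and a browser can disagree about the same response.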


from g1smd 228 days ago #
Votes: 0 | Vote:
+ -

Thanks Matt.  I'll ping the AOL server techs with your comments.

I do now see stuff like this reappearing in a site:www.dmoz.org search:

www.dmoz.org/Arts/ - 14 hours ago - Similar pages


from DMOZ 227 days ago #
Votes: -1 | Vote:
+ -

All is right with the world again.

Full story here: http://blog.dmoz.org/2007/09/26/the-search-for-dmoz/


from JohnWeb 227 days ago #
Votes: 0 | Vote:
+ -

"part of an index recognizing, adjusting and updating in real time"  You really believe the people here are going to fall for that? 

 

 


from Halfdeck 227 days ago #
Votes: 0 | Vote:
+ -

JLH, by "URL inclusion" I mean going into Webmaster Tools, going to the "Removed URLs" console and clicking on "reinclude" or something like that. I haven't tried it myself so I'm not sure what the UI is called exactly.

"part of an index recognizing, adjusting and updating in real time" You really believe the people here are going to fall for that?

What's amusing to me is the blog post is tagged "Truth" :) 


from JohnWeb 227 days ago #
Votes: 0 | Vote:
+ -

Halfdeck, Okay, thanks for the answer, I knew I was missing something.

from g1smd 227 days ago #
Votes: 0 | Vote:
+ -

Well, about 80 000 pages from www.dmoz.org reappeared overnight (UK time) in Google's index, starting about the time that Matt Cutts posted here... so how much more realtime do you wanna get with this stuff?


from Vincent 227 days ago #
Votes: 0 | Vote:
+ -

Matt, I think the issue here is bigger than what initially appears.

An incredible number of pages whose URLs have been changed to http://www.dmoz.org/... from http://dmoz.org/... during their process of canonicalisation are currently not in Google's cache, and their links are not being considered by Google. As an example:

A google search for "Academy of Canadian Cinema and Television" site:www.dmoz.com, brings no results, although these words are clearly on the page: http://www.dmoz.org/Regional/North_America/Canada/Arts_and_Entertainment/

So at the moment Google does not see this link from DMOZ, together with millions of other links in other pages on DMOZ.

This is affecting millions of websites and of course an incredible amount of Google search results, until the canonicalised http://www.dmoz.org/... pages are indexed and cached in Google.

I think that the engineers at Google should have a proper look into this, trying to index all the DMOZ pages with new URLs as soon as possible. In fact a huge number of searches and even tests at Google on new algorithms might be altered by this effect.


from g1smd 146 days ago #
Votes: 0 | Vote:
+ -

Too funny to see the amount of so-called SEOs that don't understand simple domain canonicalisation problems and mistake it for a ban.

 

This is the real reason:

http://www.google.com/search?num=100&q=site%3Anewhoo.com+-inurl:www

http://www.google.com/search?num=100&q=site%3Awww.newhoo.com

http://www.google.com/search?num=100&q=site%3Admoz.com+-inurl:www

http://www.google.com/search?num=100&q=site%3Awww.dmoz.com

http://www.google.com/search?num=100&q=site%3Acore-n02.dmoz.aol.com+-inurl:chefmoz

http://www.google.com/search?num=100&q=site%3Admoz.org+-inurl:www

http://www.google.com/search?num=100&q=site%3Awww.dmoz.org

 

Beware of rogue SPACES that Sphinn has inserted into some of those search URLs.
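
If you would rather build these site: searches than copy them by hand, here is a small sketch using Python's standard library. The helper names are made up for illustration. Note that `urlencode` also percent-encodes the colon in `-inurl:www`, which is an equivalent form of the same query.

```python
from urllib.parse import urlencode


def site_query_url(host: str, exclude_www: bool = False, num: int = 100) -> str:
    """Build a Google search URL for a site: query on `host`.

    With exclude_www=True the query also carries -inurl:www, as in the
    non-www searches listed above.
    """
    query = f"site:{host}"
    if exclude_www:
        query += " -inurl:www"
    return "http://www.google.com/search?" + urlencode({"num": num, "q": query})


def strip_rogue_spaces(url: str) -> str:
    """Remove any whitespace that got inserted into a pasted URL."""
    return "".join(url.split())


for host, no_www in [("dmoz.org", True), ("www.dmoz.org", False)]:
    print(site_query_url(host, exclude_www=no_www))
```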


from g1smd 146 days ago #
Votes: 0 | Vote:
+ -

I am not sure how an infinite loop could possibly have happened. I have looked at the canonical (www) root page several times per day in the last few weeks, and it did not ever redirect for me. There were redirects set up for (www.)newhoo.com and for (www.)dmoz.com and for dmoz.org. All of those pointed to www.dmoz.org/ . No-one would have been able to access the Root Index page at all had it been redirecting. Users would have been simply presented with a "Redirection Limit Exceeded" error message from their browser. Such a redirect would have been noticed long ago. There have been no such reports.

Matt. Are you sure you didn't misread the logs and were actually looking at requests for www.dmoz.com/ being redirected to www.dmoz.org/ perhaps?



As you know, the ODP made some (hardware) infrastructure changes almost a month ago, and as a part of that, a non-canonical URL was accidentally exposed for indexing. I gave some very big clues in the search strings above. One of them returns the root page and 105 000 categories. The Root and the whole of the directory Top Levels are now all fully indexed under that alternative domain. The URL is that of one of the load-balancing servers.



See:http://www.google.com/search?num=100&q=site%3Acore-n02.dmoz.aol.com+-inurl:chefmoz

and that is an error. Google has been picking up those URLs since at least the beginning of August.

AOL server techs are well aware of the issues and are working on various fixes. It just goes to show that when doing a large amount of work, server upgrades, and implementing various load-balancing changes, as well as starting to sort out various domain canonicalisation issues, that something can easily go wrong if you do things in the wrong order or miss out a step. One issue that is already being addressed is that some internal links (mostly on informational pages) are hard-coded to point to dmoz.org (non-www) URLs and those are now all being edited to point to the www version instead. That will be ongoing for many weeks.


from Harith 146 days ago #
Votes: 0 | Vote:
+ -

g1smd latest post looks strange in my Firefox. I have tried to gather it.

g1smd wrote:

 

I am not sure how an infinite loop could possibly have happened. I have looked at the canonical (www) root page several times per day in the last few weeks, and it did not ever redirect for me. There were redirects set up for (www.)newhoo.com and for (www.)dmoz.com and for dmoz.org. All of those pointed to www.dmoz.org/ . No-one would have been able to access the Root Index page at all had it been redirecting. Users would have been simply presented with a "Redirection Limit Exceeded" error message from their browser. Such a redirect would have been noticed long ago. There have been no such reports. Matt. Are you sure you didn't misread the logs and were actually looking at requests for www.dmoz.com/ being redirected to www.dmoz.org/ perhaps?
As you know, the ODP made some (hardware) infrastructure changes almost a month ago, and as a part of that, a non-canonical URL was accidentally exposed for indexing. I gave some very big clues in the search strings above. One of them returns the root page and 105 000 categories. The Root and the whole of the directory Top Levels are now all fully indexed under that alternative domain. The URL is that of one of the load-balancing servers. See: http://www.google.com/search?num=100&q=site%3Acore-n02.dmoz.aol.com+-inurl:chefmoz and that is an error. Google has been picking up those URLs since at least the beginning of August.

AOL server techs are well aware of the issues and are working on various fixes. It just goes to show that when doing a large amount of work, server upgrades, and implementing various load-balancing changes, as well as starting to sort out various domain canonicalisation issues, that something can easily go wrong if you do things in the wrong order or miss out a step. One issue that is already being addressed is that some internal links (mostly on informational pages) are hard-coded to point to dmoz.org (non-www) URLs and those are now all being edited to point to the www version instead. That will be ongoing for many weeks.


from g1smd 146 days ago #
Votes: 0 | Vote:
+ -

What would happen if the Webmaster Tools preferences were set to "non-www" and the site then had the non-www to www redirect implemented a few months later, without changing the Webmaster Tools setting?

Whatever was going on with (www.)?(dmoz|newhoo).(com|org) this search is the key http://www.google.com/search?num=100&q=site%3Acore-n02.dmoz.aol.com+-inurl:chefmoz to unravelling the overall effect.
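
As an aside, the host pattern written above is shorthand. Used literally as a regular expression, each unescaped dot matches any character, so a stricter version needs the dots escaped. A short sketch follows; the variable and function names are only illustrative.

```python
import re

# The shorthand from the post, taken literally: "." matches any character.
LOOSE = re.compile(r"(www.)?(dmoz|newhoo).(com|org)")

# Dots escaped, so only real host names in the family match.
STRICT = re.compile(r"(www\.)?(dmoz|newhoo)\.(com|org)")


def is_odp_alias(host: str) -> bool:
    """True if `host` is one of the (www.)?(dmoz|newhoo).(com|org) names."""
    return STRICT.fullmatch(host.lower()) is not None


for host in ["www.dmoz.org", "newhoo.com", "core-n02.dmoz.aol.com", "dmozxcom"]:
    print(host, is_odp_alias(host), LOOSE.fullmatch(host) is not None)
```

The load-balancer host core-n02.dmoz.aol.com falls outside the pattern either way, which fits the point that it needs its own search.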


from g1smd 146 days ago #
Votes: 0 | Vote:
+ -

I would give them a couple of weeks or more to discover everything.

At one page per second, they can spider 86 400 pages per day.

The ODP has about 20 times that amount of pages (also counting category descriptions, guidelines, FAQ pages, profiles, etc).

See also: http://www.google.com/search?num=100&q=site%3Acore-n02.dmoz.aol.com+-inurl:chefmoz
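
The arithmetic behind that estimate, as a quick sketch. The one-page-per-second rate and the "20 times" multiplier are the figures from the post above, not measured values, and the function names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def pages_per_day(pages_per_second: float) -> float:
    """Pages fetched in one day at a steady crawl rate."""
    return pages_per_second * SECONDS_PER_DAY


def days_to_crawl(total_pages: float, pages_per_second: float) -> float:
    """Days needed to fetch `total_pages` once at a steady crawl rate."""
    return total_pages / pages_per_day(pages_per_second)


daily = pages_per_day(1)        # one page per second
site_size = 20 * daily          # "about 20 times that amount"
print(daily, site_size, days_to_crawl(site_size, 1))
```

At that rate a single full pass takes about 20 days, which is roughly in line with the "couple of weeks or more" guess.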


from g1smd 146 days ago #
Votes: 0 | Vote:
+ -

The ODP content was previously available through more than 30 different domains and direct IP addresses. These had been hosted at various times by Netscape, Mozilla, and AOL.

A few months ago, along with some necessary hardware changes and upgrades, everything was reconfigured so that just www.dmoz.org became the canonical domain.

At first, there were a few glitches showing in the listings within Google SERPs. Several domains were missed in the canonicalisation fixes, and were rapidly indexed in preference to www.dmoz.org by Google.

Once those holes were plugged, Google began to slowly re-index the other non-canonical versions of the directory. Some of the URLs dropped into the Supplemental Index, but most of them were de-indexed.

After just a few months, there are just a few hundred incorrect URLs showing up. Most of the problem URLs have now been completely de-indexed.

The main listings for www.dmoz.org show almost one million URLs indexed in Google when using the site:www.dmoz.org search.

The job is now just about complete.

from g1smd 139 days ago #
Votes: 0 | Vote:
+ -

Some of the comments above are now displayed in the wrong order after being edited to remove some formatting issues.

The correct order can be deduced from the post number (behind the # link on each post) rather than from the post date.

from g1smd 139 days ago #
Votes: 0 | Vote:
+ -

Everything is now back on track.


See that the Duplicate Content has fallen to almost zero URLs indexed:
http://www.google.com/search?num=100&q=site%3Anewhoo.com+-inurl:www
http://www.google.com/search?num=100&q=site%3Awww.newhoo.com
http://www.google.com/search?num=100&q=site%3Admoz.com+-inurl:www
http://www.google.com/search?num=100&q=site%3Awww.dmoz.com
http://www.google.com/search?num=100&q=site%3Acore.dmoz.aol.com
http://www.google.com/search?num=100&q=site%3Adirectory.mozilla.org
http://www.google.com/search?num=100&q=site%3A207.200.81.183
http://www.google.com/search?num=100&q=site%3A207.200.81.184


The Canonical Domain now has almost a million pages indexed:
http://www.google.com/search?num=100&q=site%3Awww.dmoz.org


Some Supplemental Results can hang around for a very long time:
http://www.google.com/search?num=100&q=site%3A207.200.81.154


from g1smd 139 days ago #
Votes: 0 | Vote:
+ -


The  &  bug mashes the URLs, and stops them working.

Remove the  amp;  bit from the URL to get it to work.
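
If there are many links to repair, the same fix can be scripted. This sketch assumes the only damage is that each & in the query string was turned into its HTML entity form, and undoes it with the standard library; the function name is illustrative.

```python
import html


def repair_url(url: str) -> str:
    """Turn HTML-escaped ampersands (and other entities) back into plain characters."""
    return html.unescape(url)


mangled = "http://www.google.com/search?num=100&amp;q=site%3Awww.dmoz.org"
print(repair_url(mangled))
```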

from g1smd 128 days ago #
Votes: 0 | Vote:
+ -

Everything is now back on track for ODP site re-indexing.



See that the Duplicate Content has fallen to almost zero URLs indexed:


http://www.google.com/search?num=100&q=site%3Anewhoo.com+-inurl:www

http://www.google.com/search?num=100&q=site%3Awww.newhoo.com

http://www.google.com/search?num=100&q=site%3Anewhoo.org+-inurl:www

http://www.google.com/search?num=100&q=site%3Awww.newhoo.org

http://www.google.com/search?num=100&q=site%3Admoz.com+-inurl:www

http://www.google.com/search?num=100&q=site%3Awww.dmoz.com

http://www.google.com/search?num=100&q=site%3Acore.dmoz.aol.com

http://www.google.com/search?num=100&q=site%3Adirectory.mozilla.org

http://www.google.com/search?num=100&q=site%3Agnuhoo.com+-inurl:www

http://www.google.com/search?num=100&q=site%3Awww.gnuhoo.com

http://www.google.com/search?num=100&q=site%3Agnuhoo.org+-inurl:www

http://www.google.com/search?num=100&q=site%3Awww.gnuhoo.org

http://www.google.com/search?num=100&q=site%3A207.200.81.135

http://www.google.com/search?num=100&q=site%3A207.200.81.139

http://www.google.com/search?num=100&q=site%3A207.200.81.140

http://www.google.com/search?num=100&q=site%3A207.200.81.175

http://www.google.com/search?num=100&q=site%3A207.200.81.183

http://www.google.com/search?num=100&q=site%3A207.200.81.184

http://www.google.com/search?num=100&q=site%3A207.126.111.202

http://www.google.com/search?num=100&q=site%3A207.126.111.231




The Canonical Domain now has almost a million pages indexed:


http://www.google.com/search?num=100&q=site%3Awww.dmoz.org




Some Supplemental Results can hang around for a very long time:


http://www.google.com/search?num=100&q=site%3A207.200.81.154


That IP address has been out of use for a long time.




Including the direct IP address accesses, and various sub-domain and load-balancer URLs, there used to be ~34 ways to get to ODP content as hosted by Netscape/AOL servers. Now there is only one way.


from g1smd 128 days ago #
Votes: 0 | Vote:
+ -

The  &  bug mashes the URLs, and stops them working.

Remove the  amp;  bit from the URL to get it to work.

from g1smd 116 days ago #
Votes: 0 | Vote:
+ -

The last few hundred listed URLs have now become the last few dozen to still show.



I am guessing that the problem will be completely fixed in the next few weeks.

