Topic Type: News Story (Jump to http://ventureskills.wordpress.com)
Category: SEO
11 Comments
Comments
This one is pretty cool:
http://forum.vbulletinsetup.com/anything
Like the little animation, shame they haven't thought through the rest of the page though.
I think they got one thing right: you need to make the person forget that they have found an error. Now they just need to make it more of an opportunity.
The best 404 page is the 404 page you don't need to track because 404 traffic is managed properly before it hits the 404 page. A generic layer between the error handler and the 404 page can do the trick.
@Eavesy - that is so funny!
@ Sebastian
That's great for internal structure, but how do you cope with external sites mislinking to you?
@tnash
Actually, I developed that to avoid wasting external link juice. Meanwhile I do all kinds of things with that 404 management layer I've inserted between the .htaccess ErrorDocument directive and the actual error page.
It 301 redirects 404 errors to known canonical URLs; it can guess canonical URLs from expressions in the erroneous URL and/or the referrer, and 302 redirects to the best match until I assign a persistent (301) mapping; it knows that it has to 404 Google's probe files; it sends bogus requests to never land or to a popup hell of my choice; it shows hotlinkers how they can kiss my ass; it imports Google's 404 stats or external mapping tables to 301 redirect chunks of outdated URLs to their new locations; and what not. That works for broken internal links, external links, and type-in traffic, and it can do special things with crawlers. It can even be configured to handle on-the-fly URLs (for particular page areas) ... since I have that tool and it's somewhat trained, managing 404s is sheer fun.
Sebastian
I like your style. Our industry - at least the daily blogging, social whips of industry - lacks the technical knowledge to truly appreciate your comment.
Most are still scratching their heads wondering where the first part of the filename is on .htaccess {wink}
Personally, while I mostly understand your comment, I am confused about your mapping process that seemingly "learns" from the CGI page request URL.
Please elaborate if you should see this...
Thanks
@surftrip
There are different mappings. Persistent mappings are just assignments of canonical URLs to gathered 404s. For example
"/inDEX.htm" "/index,html" "/Index.php" ... captured by the error handler from invalid links or type-in traffic, or imported from external sources like Google's 404 reports, could all point to the canonical URL "/". The 404grabber layer sits between the error handler and the 404 page, looks at each and every 404, and when a user agent requests "/index,html" it does a 301 redirect to "/". If a faulty request of "//INDEX.HTM" is not mapped to "/" this request gets recorded, and the 404grabber executes the error page (if there is no other way to find or guess the canonical URL). I get a list of unmapped 404s, assign "//INDEX.HTM" to "/", and from now on all these requests get 301 redirected to "/".
Next there are soft mappings. That is a list of canonical URLs, each with 1...n keywords/URL parts/expressions assigned, which is used as a second try. Say "/moms-coffee-sucks.htm" has a keyword list of "mom" "-coffee-sucks" "coffee sucks" "tea is better". When a hit from a SERP like google.com/search?q=moms+tea+is+better targeting "/momscoffeesucks" 404s, it gets 302 redirected by "tea is better" to "/moms-coffee-sucks.htm" and logged. The 404grabber looks for keywords in different header lines. When I find that this request deserves a persistent redirect, I click on that item and from now on "/momscoffeesucks" gets 301 redirected to "/moms-coffee-sucks.htm".
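The soft-mapping lookup described above might look roughly like the Python sketch below. It is only an illustration, not the actual 404grabber: the names `SOFT_MAP` and `guess_canonical` are hypothetical, it assumes the search terms arrive in a `q` parameter of the referrer, and picking the longest matching keyword as the "best match" is an assumption too.

```python
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical soft-mapping table: canonical URL -> keywords/URL parts.
SOFT_MAP: Dict[str, List[str]] = {
    "/moms-coffee-sucks.htm": ["mom", "-coffee-sucks", "coffee sucks", "tea is better"],
}


def guess_canonical(path: str, referrer: str = "") -> Optional[str]:
    """Guess a canonical URL for a 404ing path, or return None.

    Searches the requested path and the referrer's search query (assumed to
    be in the "q" parameter) for any keyword from SOFT_MAP.
    """
    query = " ".join(parse_qs(urlparse(referrer).query).get("q", []))
    haystack = f"{path} {query}".lower()

    best_url: Optional[str] = None
    best_len = 0
    for url, keywords in SOFT_MAP.items():
        for keyword in keywords:
            # Assumption: the longest matching keyword wins.
            if keyword.lower() in haystack and len(keyword) > best_len:
                best_url, best_len = url, len(keyword)
    return best_url


if __name__ == "__main__":
    # A guessed target like this would be answered with a 302, not a 301.
    print(guess_canonical("/momscoffeesucks",
                          "http://google.com/search?q=moms+tea+is+better"))
```

A real layer would presumably look at more request headers than just the referrer, as the comment says.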
There are more methods to train the 404grabber, and with every refinement the thingy works better. In the beginning I spent an hour or so daily on the definitions and algo tweaks; now that's a 5-minute task. Filtering out bogus requests and hotlinking helps a lot, but these filters need "training" too. I was quite astonished how many faulty requests the script routed to the right page after a few days of "learning".
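For the persistent mappings and the manual "training" step in this comment, a minimal Python sketch could look like this. Everything here is assumed for illustration (`PERSISTENT_MAP`, `unmapped_log`, `handle_404`, `approve_mapping`); the real tool presumably keeps its tables in a database and does much more.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical persistent mapping table: faulty path -> canonical URL.
PERSISTENT_MAP: Dict[str, str] = {
    "/inDEX.htm": "/",
    "/index,html": "/",
    "/Index.php": "/",
}

# Unmapped 404s, collected so the site admin can review them later.
unmapped_log: List[str] = []


def handle_404(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, location) for a request that reached the error handler."""
    target = PERSISTENT_MAP.get(path)
    if target is not None:
        return 301, target  # approved mapping: permanent redirect
    if path not in unmapped_log:
        unmapped_log.append(path)  # record it for the "unmapped 404s" list
    return 404, None  # fall through to the real error page


def approve_mapping(path: str, canonical: str) -> None:
    """The manual step: assign a canonical URL to a recorded 404."""
    PERSISTENT_MAP[path] = canonical
    if path in unmapped_log:
        unmapped_log.remove(path)


if __name__ == "__main__":
    print(handle_404("//INDEX.HTM"))   # unmapped: recorded, error page shown
    approve_mapping("//INDEX.HTM", "/")
    print(handle_404("//INDEX.HTM"))   # now mapped to "/"
```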
Also, reports like 404s by referrer or invalid crawler requests allow me to track down invalid external links. If a faulty link sends human traffic, or that page is nicely ranked, I write a polite letter to the Webmaster and ask for a correction. If that's ignored, the 301 transfers link love and human traffic to the right URL. I try to waste nothing.
I guess that many visitors hit the back button when they see a 404 page, regardless of how nice or polite or funny it is, but stay longer when they get redirected to a landing page best matching their request. Actually, the visitors won't spot the redirect, and crawlers can get handled slightly differently, if necessary. The goal is to capture 99% of all 404 traffic without showing the error page, and I think that's doable.
BTW ... related link: http://sphinn.com/story/597
Nice idea. I have updated the post with a nod to your sign up, though using 301s is really not appropriate unless you're sure you're sending them correctly. Have you thought about using 307s and then flagging them for inspection? Once a mapping has been checked, it is moved into the whitelist (i.e. turned into a 301).
Thanks for the link :)
I do 302 redirects for "guessed" destinations and 301 redirects for approved mappings. I don't trust search engines with 307 ;) The 302s make sure that crawlers keep an eye on the nonexistent URLs, and these soft redirects are reported so that the site admin can assign a permanent redirect, fix errors or whatnot.
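The 301-versus-302 policy from this reply can be put into a few lines of Python. This is a rough sketch under assumptions: `resolve`, `approved`, `guess` and `review_queue` are made-up names, and the guessing function is passed in because the thread does not spell out how it works.

```python
from typing import Callable, Dict, List, Optional, Tuple


def resolve(
    path: str,
    approved: Dict[str, str],
    guess: Callable[[str], Optional[str]],
    review_queue: List[Tuple[str, str]],
) -> Tuple[int, Optional[str]]:
    """Pick a status code and target for a URL that would otherwise 404."""
    if path in approved:
        return 301, approved[path]  # approved mapping: permanent redirect
    candidate = guess(path)
    if candidate is not None:
        # Guessed destination: temporary redirect, reported for review.
        review_queue.append((path, candidate))
        return 302, candidate
    return 404, None  # nothing known or guessable: show the error page


if __name__ == "__main__":
    queue: List[Tuple[str, str]] = []
    approved = {"/index,html": "/"}
    # Toy guesser, for illustration only.
    guesser = lambda p: "/moms-coffee-sucks.htm" if "coffee" in p else None
    print(resolve("/index,html", approved, guesser, queue))
    print(resolve("/momscoffeesucks", approved, guesser, queue))
    print(resolve("/nothing-here", approved, guesser, queue))
```

Once an entry in the review queue is checked, moving it into `approved` turns the 302 into a 301 on the next request.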