Published: Apr 12, 2010 - 10:38 am
Story Found By: MattMcGee 775 Days ago
Category: SEO
42 Comments
Comments
So the best practices of clean code and lean extras are back on the table for those who swept them off :)
Next up: correct markup (see: HTML5 http://radar.oreilly.com/2009/05/google-bets-big-on-html-5.html)
I think they'll still brush it off Ruud. They'll find some excuse to justify writing broken code, it's the nature of the beast.
I'm sure there will be many who'll chime in and state that lean, clean, and well formed markup does not affect document performance. Watch and see, I'm almost certain of it.
There will be many here who will need to do a complete website overhaul to take advantage of this one benefit. It's all coming back to haunt them now.
To those who have continually fought against writing well formed markup, just remember all those times I told you so. Document performance and valid markup usually go hand in hand.
As a web designer, I agree this is exactly what everyone is saying: a step forward. I'm commenting, though, because I've scanned through the post and found nothing other than this YouTube video, which is a little vague. I was hoping someone here had a few article recommendations on testing site speed (other than validation methods).
Googlebot is a scraper, it doesn't care if the code is valid...
If speed is an issue, it's more of an IT/server-side issue IMHO, an area of the web I just don't wanna be involved in, as I've got my plate full as it is lol
I wonder how those who have adamantly professed for years that well formed markup / clean code has no effect on SEO, like Jill Whalen, will respond to this?
What?? But my blog/website "Linkdrop Removed" has a big page size. How do I reduce it?
Is it because of my template or something else?
pageoneresults. Super clean code will never make up for fantastic content and links. Picking at your code seems like a waste when the time can be used towards content generation. Don't get me wrong, a page that takes 10 seconds to load is ugly, but odds are their bounce rate is already through the roof. These sites will surely feel the effect. However, a site loading a second or two quicker won't, on its own, make enough of a difference in the SERPs.
At least that's my two cents.
There is a bit of irony in the fact that @stcsurabaya is asking how to reduce big page size while the home page of his site offers training in web development and design......Ooops.
I love you man..
We've been trying to explain ROI for a long time and you have shown a shining example of where to put limited time and budget..
Let's get a few things straight here.
#1. Google, IMO, implemented this only as a way to get people to update to their asynchronous analytics code, because the MAIN lag on any website using the old one is THEIR code.
#2. Gbot is a scraper, it cares not about perfect markup. At all.
#3. Google has already stated it will affect only 1% of queries and is a minimal factor, meaning it may be a tie breaker, but that is it.
#4. Jill, I, and many others have always been correct in saying validation was not and is not a factor. At the time it was stated, it was NOT.
#5. A document can have the absolute worst code and still be faster than any properly coded document if the server and connection specs are better. So those of you telling me to waste my time on minuscule things instead of making my clients' sites the best they can be for the end user looking for 'content' can jump in a lake :)
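Point #5 can be put into rough numbers. This is only a back-of-the-envelope sketch: the function name, the formula (latency plus payload over bandwidth) and every figure below are assumptions picked for illustration, not measurements of any real site.

```python
def transfer_seconds(page_kb, bandwidth_kbps, latency_ms):
    """Rough time to deliver one page: latency plus payload over bandwidth."""
    # Kilobytes to kilobits (x8), divided by kilobits per second.
    return latency_ms / 1000.0 + (page_kb * 8.0) / bandwidth_kbps


# Hypothetical numbers, chosen only to illustrate the argument.
bloated_on_fast_host = transfer_seconds(page_kb=200, bandwidth_kbps=10_000, latency_ms=40)
lean_on_slow_host = transfer_seconds(page_kb=50, bandwidth_kbps=1_000, latency_ms=300)
lean_on_fast_host = transfer_seconds(page_kb=50, bandwidth_kbps=10_000, latency_ms=40)

print(bloated_on_fast_host, lean_on_slow_host, lean_on_fast_host)
```

Under these assumed figures the bloated page on the fast host beats the lean page on the slow host, which is the point of #5. On the same host, though, the lean page still comes out ahead, which is the other side of the argument in this thread.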
Hmmm, you seem pretty sure of this. Care to put it to the test? It must be a site under your control, one that is currently broken. It MUST contain certain markup errors and not just your everyday run of the mill basic errors.
Yes, that is what they've just stated publicly. I wonder how they handled all of this when we were building and optimizing sites for 56k and earlier?
I don't think the above statements are true, never have, never will. :)
There are always exceptions to the rule, aren't there?
I hate to say it but, I don't think you've gotten IT just yet. Give it time, it will come. Patience young man. We'll let you come play in the lake when you're ready. :)
Ever watch a crawler recovering from markup errors? I haven't but, I've seen the thousands of 500 errors generated when creating a tool that does just that, crawls documents. What a nightmare that was, years of catching all the potential markup errors that may be present. We still run into them to this day. That's why you see browsers handling things improperly, they can't fully recover from the errors. I wonder how a search engine crawler, indexer, etc. reacts in those situations?
Hey, you can keep brushing this off and performing a disservice in the process or, you can just do it right to begin with. It's a pretty simple process really. The end result? High performance documents that are free of blame. :)
You should save your theorizing for non-programmers. I have been building spiders for 15 years Edward, first in Perl, then C, then PHP. Patience grasshopper, you will get there one day. :)
But please, when you learn the difference between a crawler (spider) and an indexer, please come back with something more :)
Heh, good one, oh wise one. I could use a refresher course, that's for sure.
So, if we are talking about document performance overall, and if we were to look at the end goals of a UA (User-Agent), how would broken markup affect the outcome? How would it affect the serving of that document to the user?
As you can clearly tell, I am not a programmer! I'm an SEO who sits here and guesses what his Lead Dev is up to every day and most of the time I get it right. ;)
What theories? I have links to protocols if you wish to see them. I have backup for everything I may discuss at this level. I'm prepared. :)
Ahh but you 'are' theorizing Edward. Your theory is that Google's crawler is in fact its indexer, instead of its crawler. You must be, because document performance does not come into play until the parsing and indexing stage.
As for the 'young man' bit, I think I may actually have a few years on you based on your profile picture, unless it happens to be 10 years old like mine :)
Hey, if I mixed the terminology up, please don't use that as an excuse to confuse the topic at hand. That is why I changed it to UA instead. That clears up ALL confusion.
And yes, I have a few years on you. My pic is from 2005. I'm a grizzly looking fella at the moment. Dreadlocks and stuff. Respect your elders! You know I'm kidding right? About the Respect your elders part. :)
Just for a quick tip Edward, Gbot only follows links and src so far, as far as I am aware. This simply means that it uses a very linear method of extraction and simply does not care much about the rest of the markup.
Which UserAgent specifically? Many of them have different end goals. Some are to crawl, some are to index, some are merely to translate the markup to user readable format. You seem to be lumping the end goals of all UAs into one single group, and that is not really possible. You are trying to use words which I do not think you actually understand Edward :)
I'm going to go out on a limb and disagree with you. I have my reasons. I did have a refresher course on this back in January 2007 at WebmasterWorld...
How Do Search Engine Robots Work?
I think the little threaded buggers are just a little smarter than you give them credit for. ;)
For this particular discussion, yes. If I'm incorrect in doing so, please do provide an alternative explanation that we can understand. As I've stated, I am not a programmer like you. Can you imagine my mad skills and being a programmer at the same time? I drive my Lead Dev up a wall. But, he appreciates the end results.
If I'm using words I don't fully understand, I sure hope someone like you comes along and corrects me. With your experience, maybe you can help us understand how semantic analysis is performed on documents. How does the UA determine what is a heading, what is a list, what is a link, etc. Note, I said UA, Googlebot is a UA and does follow a specific protocol for the most part, ever read that over at the W3?
Help me to understand the process, please.
Actually, not really, when it comes to spidering. When you say threaded I have to assume you mean some sort of parallelization, which is standard for any good bot coder to utilize. I have been doing so since long before threading was even implemented in PHP. But see, you said crawler, and while a crawler is indeed a form of bot, so are many other things which actually come into play long after the crawler has done its job. Again, you are confused about either the terminology or its applications.
I already have, re-read what I have posted above.
You provided the answer to your question in your own post at WMW that you so kindly linked to a couple of posts above...
This whole thread so far, you have repeatedly (and mistakenly) been describing an indexer while talking about a crawler.
Maybe you should have read your WMW post before posting it here?
Well then add something to the discussion of value William. Help us to understand how the bots (aka UAs) are responding to invalid markup and how it may affect the overall equation when discussing site speed and document performance. That is what we're discussing. Did you read the linked to WebmasterWorld topic? You'll see I had my refresher course at that time. I understand the general concepts involved. And when it comes to what I'm concerned with, I know what is involved.
So, anything of value to add at this point? Typical programmer, now I need to be on my toes and make sure I'm literal. :) Don't worry, I'll provide full definitions moving forward. And, I'll be sure to not mix up the terminology and will just refer to all entities as UAs, that is what they are User-Agents. They come in the form of Googlebots, Slurps, Firefox, IE, etc. They're all interpreting the document at some level, correct? Am I correct in at least generalizing at that level WilliamC? Is that a literal enough question for you that can be answered with something we can all understand? ;)
Interpreting a page at 'some level' is not even close to being similar Edward. Please, read your WMW post that you linked to, or the important parts #2 and #3 that I quoted from that post above.
The Google crawler follows links and src at this time only. Any other markup on the page has no bearing, broken or not, and causes no errors. URL normalization is simply part of the process of grepping the links and srcs to follow, and again it does not get broken by bad markup if it was coded even remotely well. But that has nothing to do with caring about any other markup on the page. That is an indexer's job.
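The kind of extraction described in that comment can be sketched as follows. To be clear, this is not Googlebot's code or method, which nobody in this thread has seen. It is a minimal illustration using Python's standard library, and the names LinkSrcExtractor and extract_urls, as well as the example.com URL, are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkSrcExtractor(HTMLParser):
    """Collects href and src attribute values and ignores all other markup."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Simple normalization: resolve relative URLs, drop fragments.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute not in self.urls:
                    self.urls.append(absolute)


def extract_urls(html, base_url):
    parser = LinkSrcExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


# Deliberately broken markup: unclosed <div>, <p> and <b>, plus a stray </span>.
broken = '<div><p>Hello <b>world<a href="/about">About</a></span><img src="logo.png">'
print(extract_urls(broken, "http://example.com/index.html"))
```

The tokenizer never checks whether tags are balanced, so the unclosed and stray tags do not stop the two URLs from being collected. That supports the narrow claim about link extraction only. It says nothing about what an indexer does with the same page later.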
Please try to keep up here :)
I am trying, I really want to understand this inside and out. You're the perfect candidate to pick a brain. Give me a little bit, I'm running some reports on your website, I'll have some questions pertaining to the findings in those reports. I just want to make sure we'll be discussing this topic at the same level. You the Programmer, me the SEO who knows enough to get into trouble with these types of discussions. Be back in a bit. ;)
I can save you the trouble. It does not validate, it may even have some broken code. But Google has no issues indexing all pages, and it ranks for terms such as seo forum.
Sorta shoots you in the foot does it not?
Okay, before we continue, I need to verify if I'll be speaking with the Programmer type or an SEO type? I just need to make sure that I don't bring up points that you as a Programmer wouldn't be aware of. I kind of don't want to make you look as bad as you're making me. But hey, you know I'm game for this sort of stuff. Programmer types usually get the best of me though, one of my weaknesses. But, when it comes to what is written in protocols, you can't argue with me! :)
Not really but, you surely put a smile on me face. That's usually the first thing someone coming at it from your perspective would say.
Added: I just checked, I don't see that site in the top 10 for that search term? SEOFox.com correct?
First question. Why does the www and non-www of your site return 200 OK? As an SEO, you should know that you 301 to one or the other, which is it going to be? ;)
As an SEO, I know that all 3 major engines now handle the canonicalization problems between www and non-www addresses just fine on their own, so why would I add extra time to redirect? That would go against your argument, Edward :)
Maybe if this were 2002 I would care....
Next question?
Oh, and my site is seo-shop.com, or phpproposal.com. For seofox.com you would need to speak to David Ogletree.
Shows the last time I cared enough to change my profile info here don't it :)
Well that's nice to know William. Usually folks link to their own sites or those they are responsible for from their profiles. Not to someone else's. I'm surprised Ogletree overlooked a few things over there. I really am. That is unlike him.
Usually folks try to get business from forums or blogs or things. I actually tend to just come here to just get some entertainment and share what I can. So updating my profile does not rate as a big issue, sorry. :)
Okay, so I'll ask again, am I speaking to a Programmer type or an SEO type? You think that not keeping a Social Media Profile up to date is not important? Especially one like Sphinn? I know this community has had its issues but I'll tell you what, this place is the perfect setup for the savvy SEO such as yourself.
So, did you build Ogletree's site?
Originally yes, I did build it, then sold it. He has since redesigned it completely, so that has no bearing on this conversation anyway. As for whether you are talking to an SEO or a coder, what difference does that make exactly? I wear both hats, so hit me with anything you like. I would do the same, but anyone can make a single page site perfectly. :)
As I understand it, Google is defining "page speed" as the time it takes to download all the components of the page (HTML, JavaScript, CSS, images, Flash, etc.), not the rendering time and not the time to execute JavaScript that runs on page load.
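That definition, taken at face value, amounts to timing the downloads and nothing else. Here is a minimal sketch of that reading. It follows the commenter's understanding rather than any published Google method, and total_download_seconds, fake_fetch and the component file names are hypothetical.

```python
import time


def total_download_seconds(component_urls, fetch):
    """Sum the wall-clock time spent downloading each page component.

    `fetch` is any callable that takes a URL and returns the response bytes.
    Rendering and script execution are deliberately not measured.
    """
    total = 0.0
    sizes = {}
    for url in component_urls:
        start = time.perf_counter()
        body = fetch(url)
        total += time.perf_counter() - start
        sizes[url] = len(body)
    return total, sizes


def fake_fetch(url):
    # Stand-in for a real HTTP client so the sketch runs without a network.
    time.sleep(0.01)
    return b"x" * 100


components = ["index.html", "style.css", "app.js", "logo.png"]
seconds, sizes = total_download_seconds(components, fake_fetch)
print(f"{seconds:.3f}s for {sum(sizes.values())} bytes")
```

Passing the fetch function in keeps the timing logic separate from the HTTP client, so a real client can be swapped in for fake_fetch without touching the measurement.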
There is a big difference between Google's interpretation of page speed and that of a user sat in front of a browser.
Hmmm, how many times must I change my profile data before this site actually remembers it? So far I have done so twice, seen the changes, then come back later to find them reverted. Issues, guys?
Edward, as always, fun debating with you, but I do have clients and do have to spend most of my time working for a living. It has been over 30 minutes since your last post, so I will have to pick this up in the morning mate :)
WilliamC,
Though I partially agree with the notion that 100% validation is not a "requirement" for SEO, I do want to point out that you are very mistaken regarding canonicalization, and bailing out on proper 301 redirects. At the moment, www.seo-shop.com is showing 392 pages in the Google index, while seo-shop.com is showing 821 pages.
Please explain how that is not a problem, and does not reflect on the fact that Google is clearly mis-handling canonicalization on its own, which in turn means you've got some serious leakage in regard to individual page value, which thus in turn means your entire site is proportionally suffering.
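For anyone following along, the 301 canonicalization being argued over can be sketched like this. It is an illustration of the general technique only, not the configuration of any site named in this thread. The function names and the example.com hosts are assumptions, and a real site would normally do this in the web server config rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_www(host):
    """Return the host without a leading 'www.' label."""
    return host[4:] if host.startswith("www.") else host


def canonical_redirect(url, canonical_host):
    """Return (301, location) when the URL's host is the www/non-www twin
    of canonical_host, otherwise None (serve the page normally)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == canonical_host:
        return None  # already canonical
    if strip_www(host) != strip_www(canonical_host):
        return None  # unrelated host, not handled here
    netloc = canonical_host if parts.port is None else f"{canonical_host}:{parts.port}"
    location = urlunsplit((parts.scheme, netloc, parts.path or "/", parts.query, ""))
    return 301, location


print(canonical_redirect("http://example.com/page?x=1", "www.example.com"))
print(canonical_redirect("http://www.example.com/page?x=1", "www.example.com"))
```

With a rule like this in place, only one hostname ever answers 200 OK, so the engines are not left to work out on their own which version of a page to index.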
And when I used the word "partially" above, what that means is it is my belief that 100% validation is not, necessarily, the most critical factor when facing the typical budgetary constraints of a site owner needing to allocate resources. There may be other, "bigger" fish to fry for most sites.
Yet clearly, if one has a site that is identical to another in all respects except broken validation, clearly best practices would dictate that it would be wise to choose to own the site that is, in fact, validated. Because that's an ideal world target is it not? Or as a programmer do you not practice clean code principles either? Just out of curiosity, I ask, because that's what you imply in your position...
Alan, while the two searches do indeed show differing values, look at the actual results format in either site:domain.com or site:www.domain.com and tell me how Google sees all of them, please. :)
As far as HTML validation mattering goes, when even the browsers cannot get together on standards, I will make a deal with ya... when google.com validates, I will start thinking maybe it matters. :)
Alan, remember we are only talking about validation in rankings, not usability or 'best practices' here. So let's keep our comments along those lines please. I do agree that for usability sake, in many cases, clean markup should matter.
As far as clean code principles go, they actually matter in real code. In markup tagging, not so much, as even the browsers can't get together on it, let alone engines like Google.
Where did Edward go last night? I was hoping to wake up to some activity or even a ruckus. :)
I'm here William. I needed some rest so that I could effectively communicate with you. I'll be back later most likely to opine further on Site Speed and the UA. Please do note that I'm using the term User-Agent from this point forward!
By the way, any other sites you're responsible for that I'm monitoring here? Website Validation Showdown
This has been quite Entertaining to watch! I for one, Giggle at Edward for all his validation efforts. Kudos buddy, we all have our angle.
But for the record, old school guys who still dabble on both sides of the fence...and delve into the Shades of Gray...know there is some Art and tactics that WILL NOT VALIDATE that personally...are techniques that still work for some of us.
I used to teach it under the title: "Alternative HTML" for search. A handful of 1st Gen SEOs will remember that. That threw validation out the window, while we made every effort to still render what was necessary to the end user for their experience WHILE coding what we knew would help the client campaign...on a purely relevant level.
However...spend the time...Validate away! I'm with William on this one. Hands down.
Thanks Daron, and yes, I remember those experiments quite well from almost a decade ago.
I've not seen one implementation yet that can't be validated, not one. It doesn't matter what color label you smack on it.
That's not true. What about markup that involves RDFa and/or Microformats? What about HTML5 and the heavy shift to document semantics?
To keep things simple, my personal opinion is that any sites serving up markup errors are taking many unknown and known risks. If you say that those errors have no bearing on document speed then I guess I'll have to finally give in. ;)
Edward, I will agree with RDFa, and partially about HTML5. However when you said:
You shot yourself in the foot, because our discussion relates to crawlers, which would not be bothered in the least by those errors IMO. Read andy murds comment up above. If the discussion were about browsers, then I might agree with you at least partially, but it was not.
William, at some point, you WILL provide some information of value to this topic. Who conjured you up anyway? Would you say this is an accurate description of a "Crawler" in general?
Next time you reply, instead of taking pot shots, how about providing some useful information for those following along. I don't mind being corrected in this instance and taking shots for the community at large as long as you start providing some information of value. :)
Ahh but Edward, I already have, in comment: #76247
It was YOU that first tried taking potshots, with mistaken information I might add, and from that point onwards, I have but merely corrected your every attempted point. I have done nothing but reply to your non-points and correct them. :)