Posted By: mvandemar 294 Days ago
Source: http://smackdown.blogsblogsblogs.com
Category: SEO
7 Comments
Comments
My first thought on reading that article was: I'll believe it when I see it. Cuz "letting Google decide" has always worked out so well before.
I looked into this and realized that Google isn't really as full of BS as this post suggests. The example used is this query in Google, which (currently) returns 83 results in my browser. I exported those 83 URLs into Excel and realized that they all begin the same:
mortgagecalculator.html?AI=&AT=&IR=&LA=&MI=&MP=&MT=&PI=&YR=...
but they all end with a different (word) value for the YR parameter. I extracted all the words used in the list, which are these:
afx, authored, bloomington, buzz, caution, circumstances, clinton, condo, conmen, convertibles, creditors, declining, dubai, faire, friedman, gardner, garfield, gazette, hegemony, hometrack, housebuilding, iafrica, immigrants, intelligencer, investors, irs, jacksonville, jumped, kpmg, kroszner, lifeline, luring, machinations, mathematical, menino, monthly, mortgage, napier, oversee, overseer, oversight, pencil, plea, ponders, postwar, proposing, rattled, reckoning, reliance, renters, republicans, restraint, riverside, sentinel, sharply, slowing, spiralled, squeal, stearns, stimulus, supervisors, swamp, telegraph, terra, thawing, thunderclouds, trades, treasurys, tribune, trough, unchallenged, unravelling, unveil, urging, vow, warned, warsaw, widow, wiped, withdrew, woes, wriggle
I checked a couple of the cached text pages that correspond to these parameters, but I didn't find any that had these words in their content. In other words, if Google is in fact "testing" URL parameters, it is not pulling the test word values from the page content. Also, I noticed that the YR input field is the first input element in the source code.
More importantly, these 83 URLs are NOT all duplicate pages. The server that builds those pages is "randomly" inserting different content each time the page loads. You don't even need to compare the HTML source of 2 different pages; you can simply open one of those URLs, keep hitting "refresh", and see that it changes each time.
So my guess is that Google crawled this page, saw the source for the form, and started "testing" values for the first input box, "Years." Every random word that Google tried resulted in a new page of content, so they kept indexing more and more pages.
And I don't entirely agree with this statement from the post:
"Google (and all of the search engines, as a matter of fact, as well as the visitors) have no idea where the content is pulled from. It makes no difference (as in Zero. Zip. Zilch. Nadda. None At All) whether content is pulled from a database or served from a static file. Google can’t tell which is which just by looking at the url."
I think the more important issue here is "what is referencing the dynamic URL?" If you have a link on your page that points to a specific dynamic URL, then Google will treat that differently than it would if it found a form element with several possible parameters in input fields. For the URL in the anchor tag, Google might start removing parameters and see if the content changes. For the URL that is "crawled" from a form, Google might start from the opposite direction: add parameter values and test them. So depending on where Google finds the URL, they CAN make educated guesses about the best way to proceed.
In this particular case, the form code wasn't necessary, since all the calculations were done on the client side, through JavaScript. Since the form tag doesn't include the necessary action attribute, Google seems to have assumed this:
action=""
...which might explain why all the URL variations start with "mortgagecalculator.html."
Takeaways:
Remember that Google tries to crawl forms now. If you don't want that happening, make sure your form code is something Google will understand (i.e. W3C-compliant) and use robots.txt to Disallow the URL from your "action" attribute. Other options include using captchas, password fields, and the post method (I think).
Don't set up dynamic (random) content feeds on the server side, unless you're using static URLs without parameters.
Just my $0.02 anyway.
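For anyone who would rather not do the Excel step by hand, here is a rough Python sketch of the same check: take a list of indexed URLs and report which query parameters actually differ between them. The example.com host and the three sample words are placeholders (the real host and full URL list are not reproduced here), and the function name is made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def varying_parameters(urls):
    """Map each query parameter whose value differs across urls to its set of values."""
    values = defaultdict(set)
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values=True so empty parameters such as "AI=" are not dropped
        for name, value in parse_qsl(query, keep_blank_values=True):
            values[name].add(value)
    # A parameter "varies" if more than one distinct value was seen for it
    return {name: seen for name, seen in values.items() if len(seen) > 1}


# Hypothetical sample: example.com stands in for the real host, and only
# three of the YR words from the list above are used.
base = ("http://example.com/mortgagecalculator.html"
        "?AI=&AT=&IR=&LA=&MI=&MP=&MT=&PI=&YR=")
sample_urls = [base + word for word in ("buzz", "condo", "widow")]

for name, seen in varying_parameters(sample_urls).items():
    print(name, sorted(seen))
```

On a list like the one described above, only YR should show up as varying, since every other parameter is empty in all of the URLs.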
Hey, there are two things here:
1) The definitions of static and dynamic URLs are wrong. A static URL is a URL that typically points to a static file on the server. A dynamic URL is one that pulls content from a database and serves it to the person who asked for it. But you have rightly mentioned that Google cannot know from where the content is served.
The Google article has created more confusion than anything else. They probably intended to say that they now handle parameterized URLs well, unlike earlier. But your example out there (for mortgagecalculator) shows that they are now indexing all duplicate content. It is funny that their new crawl algorithm to recognize parameterized URLs is triggering their duplicate content algorithm to fail.
I saw similar examples some time back (2 weeks ago) and was wondering how a site had two different URLs pointing to the same content and still had both of them rank on the first page for a search phrase. This article by Google now makes it clear that they are messing up their algorithms...
Yes, they are not duplicates... I do agree with darrenslatten... and Google is recognizing them as different content correctly... but their definitions of static and dynamic URLs are wrong, and the author of this article is right in saying that Google cannot know from where the content is served. Googlebot would not be making as many calculations as darren suggests...
Some SEO people are now recommending that your WordPress posts should end with a .html in order to confuse the hell out of people as to exactly what they are looking at. Perhaps this is Google's beef? Forcing a .html extension on a dynamic site is going a bit too far in my opinion. Your site is what it is.
I think the main intention was simply to dissuade beginners from doing rewrites, as done badly they can make things worse… There are plenty of live examples out on the web that amply demonstrate that point.
Even a great many of the various forum, blog, and CMS packages - even the most well known and popular ones - are full of these sorts of problems, and various “SEF URL packages” don’t always fully address the problems.
Maybe in those cases Google would have preferred that the designers had left things well alone, as by implementing rewrites badly they have made things far worse for crawlers, not better.
For those who really do know what they are doing, are aware of all the risks, and properly test their work with a wide range of expected and *unexpected* URL requests: carry on as before.
Spun for the simple fact that the comments listed here already eloquently detail the technical aspects of why the on-page factors are influencing Google's results. What Google has posted about using static files vs rewritten dynamic URLs is a bit of BS for anyone with any experience and skill, but in the hands of those who do not understand such issues, it would be best to leave well enough alone.