Published: Apr 11, 2008 - 08:48 pm
Story Found By: jeffquipp 1868 Days ago
Category: SEO
13 Comments
Comments
Wow - this has lots of implications. Looks like everyone should take a good look at their robots.txt and make sure Google doesn't go crazy indexing their site.
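For anyone who wants to do the robots.txt check suggested above, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and paths (`/search`, `/contact`) are made-up examples, not taken from any site in this thread, and the standard-library parser only approximates how a real crawler matches rules, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers away from form result pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /contact
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text disallows this URL for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/search?q=seo", "/contact", "/about"):
        status = "blocked" if is_blocked(ROBOTS_TXT, "Googlebot", path) else "allowed"
        print(path, status)
```

Swap in your own robots.txt text and the URLs your forms submit to in order to see which ones a well-behaved crawler should skip.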
Mark, Google's doing this in a limited fashion. You can do a site:seoroi.com or site:brianchappell.com search for examples. Here's the official Google quote: "We are also mindful of the impact we can have on web sites and limit ourselves to a very small number of fetches for a given site." If you'd rather not have it at all, I believe that Mike vanDeMar (smackdown.blogsblogsblogs.com) will be sharing his code as to how to block this dynamically - it's a pretty smart solution. Also, my apologies to Google for falsely accusing them of leaking Google Analytics data into the index.
Also, here's Matt's announcement on the issue: http://sphinn.com/story/40122
Seriously, this is scary. From an SEO point of view it is unnecessary - if the site owner wants stuff indexed they should make it available. From all other points of view I see this as a real problem.
1. Marketing uses forms all the time to capture lead information. And they measure it. Let's say Google tries to fill out the form and get to the inside information once a day for a month. That is 30 hits on the form that were completed but did not convert. Kind of messes with the data.
2. No matter what they say, would you, as a security/risk manager, be satisfied that Google isn't going to try and guess the login or password just because the button or form element has something familiar on it like "password"? A lot of personal information (e.g. healthcare) and business information (e.g. intranet access) lies behind forms.
3. What if the form is about gathering demographic information? I am not sure that the demographics of a search engine, or the demographics of some bot with keywords that are randomly chosen, is information an advertiser is trying to capture.
So now we get GoogleBot instead of Donald Duck as the false name on our forms. Bad news.
It's a very good improvement.
Way too many qualifiers there to count on it happening and hope for useful crawls of important pages, rather than converting to CSS dropdowns to replace the form fields you want crawled. Here are some excerpts from the Google Blog post:
"... when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form ..."
"... If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index ..."
"... Only a small number of particularly useful sites receive this treatment, and our crawl agent ..."
"... limit ourselves to a very small number of fetches for a given site ..."
Seriously, with all of those qualifiers, it almost sounds as though it would affect only a handful of sites at best.
I'm curious if this means we should eventually look at optimizing our forms. Not to get carried away and turn our forms into spam. But perhaps to write "I'm interested in your SEO work" instead of just writing, "I'm interested in your services."
On the face of it as a search engine user I might appreciate it.. but it sort of seems unnecessary .. it is not innovative in the larger sense of the word..
itravin... what, you mean like innovative..?
Sounds like this explains how Google's been indexing WordPress search results.
It's just another autobot, so if you don't want to accept auto-submitted forms, add a form field and hide it from users via CSS. If it comes in completed, trust it's a bot and should not be accepted (or better, should be answered with the appropriate response, whatever your SEO brain decides that response should be).
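A rough sketch of the hidden-field check described above, in Python. The field name `website_url` and the status codes returned are assumptions for illustration only; the form itself would include that field and hide it with CSS, so a human visitor never fills it in. Whether any particular crawler fills in hidden fields is not something this thread confirms, so treat this as one possible filter, not a guaranteed one.

```python
from typing import Mapping, Tuple

# Hypothetical name of the extra field that is hidden from users via CSS.
HONEYPOT_FIELD = "website_url"


def looks_like_bot(form_data: Mapping[str, str],
                   honeypot_field: str = HONEYPOT_FIELD) -> bool:
    """Return True when the hidden field came back with a value in it."""
    return bool(form_data.get(honeypot_field, "").strip())


def handle_submission(form_data: Mapping[str, str]) -> Tuple[int, str]:
    """Reject submissions that filled the hidden field, accept the rest.

    The (status, message) pairs are placeholders; pick whatever response
    suits your site.
    """
    if looks_like_bot(form_data):
        return 400, "rejected: honeypot field was filled in"
    return 200, "accepted"


if __name__ == "__main__":
    print(handle_submission({"name": "Ann", "website_url": ""}))
    print(handle_submission({"name": "Bot", "website_url": "keyword"}))
```

Rejected submissions can also be logged separately, which keeps them out of the conversion numbers for the form.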
One of the websites where Google does this is Lyrics.net, owned by one of my coworkers. I have published some findings on my blog: http://www.lunchpauze.com/2008/04/googlebot-wtf-are-you-doing.html
iBrian, I think that's the explanation for several sites, yes.