Google Crawls The Deep Web

On its official blog, Google has announced that it is experimenting with technologies to index the Deep Web, i.e. the sites hidden behind forms, in order to be ‘the gateway to large volumes of data beyond the normal scope of search engines’.
For that purpose, the engine tries to get past the forms automatically: ‘For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML’. Nevertheless, directives like ‘nofollow’ and ‘noindex’ are still respected, so sites can still be excluded from this type of crawl.
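
To make the opt-out concrete, here is a minimal sketch of how a crawler might read the robots meta tag of a page before indexing it or following its links. Google has not published its implementation; the class name `RobotsMetaParser`, the helper `robots_directives`, and the sample page are all hypothetical and only illustrate the idea.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives.update(
                token.strip().lower() for token in content.split(",") if token.strip()
            )


def robots_directives(html):
    """Return the set of robots directives declared in the page (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Sample page invented for illustration.
page = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'
directives = robots_directives(page)
may_index = "noindex" not in directives
may_follow = "nofollow" not in directives
```

A crawler built along these lines would skip indexing when `may_index` is false and would not queue the page's links when `may_follow` is false.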

Among the possible wins for Google users is the ability to find pages within a site that were previously reachable only through that site’s own search form. As the Google Webmaster blog explains:

‘For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made.’
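
The URL generation described in that quote can be pictured roughly as follows. This is only a sketch under stated assumptions, not Google's actual method: it assumes the form submits via GET, and the function `candidate_urls`, the field names, the candidate values, and the example.com address are all invented for illustration.

```python
from itertools import islice, product
from urllib.parse import urlencode, urljoin


def candidate_urls(page_url, action, fields, limit=10):
    """Build up to `limit` GET URLs, one per combination of candidate input values.

    `fields` maps each form input name to the values a crawler might try.
    """
    names = list(fields)
    target = urljoin(page_url, action)  # resolve the form action against the page URL
    combos = product(*(fields[name] for name in names))
    return [
        target + "?" + urlencode(list(zip(names, combo)))
        for combo in islice(combos, limit)  # cap the number of generated URLs
    ]


# Hypothetical form: a text box "q" and a select menu "category".
form_fields = {
    "q": ["crawler", "index"],       # words assumed to be taken from the site's own pages
    "category": ["books", "music"],  # option values assumed to be read from the HTML
}
urls = candidate_urls("http://example.com/catalog/", "search", form_fields)
for url in urls:
    print(url)
```

The `limit` parameter matters because the number of combinations grows multiplicatively with every extra input, so any real crawler would need some way to cap or prioritise the URLs it tries.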
The results of those crawls would then show up in your Google search results, potentially offering a faster, more direct way to reach the information you’re searching for.
