How Do Search Engines Work with the Hidden Web?
General-purpose search engines will not find material in the Invisible Web, for two reasons. First, as powerful as many search engines are, they have technological limitations that prevent them from reaching the Invisible Web.
Second, it is extremely expensive to develop and maintain a comprehensive, general-purpose search engine. Such search engines look at literally billions of Web sites and attempt to organize them into some manageable whole. It is therefore just not financially practical for a comprehensive general-purpose search engine to delve into the Invisible Web, even if it could (Sherman and Price, 2007, p. xiii).
Let’s now reexamine the types of files that are typically not found by standard search engines and are therefore part of the Invisible Web. In general, search engines do not “play well” with material that is not text-based. Web pages that are primarily video, audio, or images (in other words, non-text-based material) are rarely accessible through a standard search engine. Beyond these general file types, there are some specific file formats that search engines cannot handle:
- PDF or PostScript formats, with Google as an exception.
- Flash. Apple devices such as the iPad also do not support this file format.
- Programs, and in fact all executable files.
- Compressed files such as .zip files.
The difficulty with indexing these types of files is that they are not HTML text, and standard search engines generally choose not to index them, mostly for financial reasons.
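The filtering described above can be pictured as a simple check that a text-oriented crawler might run before indexing a URL. This is only a minimal sketch: the function name `is_indexable`, the skip list, and the specific extensions (for example `.swf` for Flash and `.ps` for PostScript) are assumptions chosen for illustration, not the behavior of any particular search engine, and the Google exception for PDF is not modeled.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical skip list based on the file types named in the list above.
# The extensions themselves are illustrative assumptions.
SKIPPED_EXTENSIONS = {".pdf", ".ps", ".swf", ".exe", ".zip"}


def is_indexable(url: str) -> bool:
    """Return True if this sketch of a crawler would try to index the URL."""
    path = urlparse(url).path                      # drops the query string and fragment
    extension = posixpath.splitext(path)[1].lower()
    return extension not in SKIPPED_EXTENSIONS


if __name__ == "__main__":
    for url in (
        "http://example.com/index.html",
        "http://example.com/report.pdf",
        "http://example.com/archive.zip",
    ):
        status = "index" if is_indexable(url) else "skip"
        print(f"{status}: {url}")
```

A real engine would more likely inspect the content type reported by the server than trust the file extension, but the outcome is the same: anything that lands on the skip list never enters the index.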
In the larger picture, these types of files make up a small percentage of the material found in the Invisible Web. The much larger share of that material is one of two types: (1) single Web pages or (2) database information.
Single Web pages are generally Web pages created by individual users. The information on these pages is sometimes valuable, but standard search engines cannot locate them because no other pages link to them, and the Web crawlers at the heart of search engines find new pages by following links.
The second large category of material on the Invisible Web is database information, which can generally be divided into three further categories. The first is database material tailored to the needs of individual users. This data is often generated by forms and is stored in relational databases. Standard Web search engines cannot fill out the information that interactive forms require, so even if you have the exact URL of the search, the search engine will not return the data. The second type is streaming or real-time content. Because there is so much of it and it changes so rapidly, standard Web search engines just cannot keep up with it. The third type is dynamically generated content, which is similar to the first type in that the pages are produced on request rather than stored as fixed pages.
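To make the reachability problem concrete, here is a minimal sketch of a link-following crawler run over a tiny, made-up Web. The page names, the `LINKS` table, and the `crawl` function are all hypothetical and exist only for illustration. The crawler starts from a seed page and follows links, so a page that nothing links to, or a results page that only exists after a form is submitted, is never discovered.

```python
from collections import deque

# Hypothetical miniature Web: each page maps to the pages it links to.
LINKS = {
    "home": ["about", "search-form"],
    "about": ["home"],
    "search-form": [],           # results appear only after a form is submitted
    "personal-page": ["home"],   # links out, but no page links to it
    "results?q=whales": [],      # generated on demand from a database
}


def crawl(seed, links):
    """Return every page reachable from the seed by following links."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


visible = crawl("home", LINKS)
invisible = set(LINKS) - visible

if __name__ == "__main__":
    print("Crawled:", sorted(visible))
    print("Never reached:", sorted(invisible))
```

In this toy example the crawler reaches the search form itself but not the results behind it, because the sketch, like the standard search engines described above, only follows links and never fills in a form.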