Search Engine Spiders
A search engine spider, also known as a web crawler or web spider, browses the World Wide Web in a methodical, automated manner. Search engine spiders are mainly used to create a copy of all the visited pages for later processing by a search engine, which indexes the downloaded pages to provide fast searches. A search engine spider is one type of bot, or software agent. In general, it starts with a list of URLs to visit. As it visits these URLs, it identifies all the hyperlinks in each page and adds them to the list of URLs to visit, recursively browsing the Web according to a set of policies. (Source: Wikipedia)

Search engine crawlers work best with a site composed wholly of HTML. The less clutter in your page code, the easier it is for a spider to harvest the available information. As web design turns more and more to dynamic pages built with technologies like ColdFusion, Active Server Pages, and JavaScript, you may discover that search spiders either have difficulty gathering the data from your site or do not visit at all. However, it is possible to maintain good SEO on a dynamic site if you know how to work around these problems.

According to Web Developer's Journal:

A dynamic Web page is a template that displays specific information in response to queries. Most of the page content comes from the database connected to the Web site. Visitors love them since they get quick access to the information they want. These sites are easy for webmasters to update: as product offerings or prices change, just edit your database instead of hundreds of individual Web pages. Search engine spiders have a much tougher time with dynamic sites. Some get stuck because they can't supply the information the site needs to generate the page. Other spiders deliberately stay away from dynamic pages to avoid getting trapped in the site.

It is highly recommended, then, to format dynamic URLs as simple, HTML-style URLs for search.
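As a rough illustration of that recommendation, the Python sketch below turns a query-string URL into a static-looking path. The function name static_style_path, the example.com URL, and the key/value path layout are assumptions made for this example, not a scheme prescribed by any search engine. The web server would still need a matching rewrite rule to map the static-style path back to the dynamic page.

```python
import posixpath
from urllib.parse import urlsplit, parse_qsl, quote


def static_style_path(dynamic_url):
    """Map a query-string URL to a static-looking path (hypothetical scheme).

    Each query parameter becomes a pair of path segments (key, then value),
    and the script extension (.cfm, .asp, ...) is replaced with .html.
    """
    parts = urlsplit(dynamic_url)
    # "/catalog/product.cfm" -> "/catalog/product"
    base, _extension = posixpath.splitext(parts.path)
    segments = [base.rstrip("/")]
    for key, value in parse_qsl(parts.query):
        # Percent-encode so that a "/" inside a value cannot add extra segments.
        segments.append(quote(key, safe=""))
        segments.append(quote(value, safe=""))
    return "/".join(segments) + ".html"


if __name__ == "__main__":
    url = "http://www.example.com/catalog/product.cfm?category=5&id=42"
    print(static_style_path(url))
```

The resulting path contains no "?" or "&" characters, which is the property that makes it look like an ordinary HTML page to a spider.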
The SearchTools.com site provides a tutorial on formatting dynamic URLs for search.

The robots.txt File

A robots.txt file is good to have on your site's server, as a search engine spider will find it and know which URLs on your site to crawl and which to ignore. Search Engine World's tutorial on robots.txt files explains how to create and implement an effective file for spiders to find, so no time is wasted and no outdated information is cached. It is helpful, too, to keep such a file uploaded so only fresh content is cached. This can help your site's relevance in search.

Useful Search Engine Spider Links

Current search engine robots
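
To illustrate how a well-behaved spider applies the robots.txt rules described earlier, the following Python sketch uses the standard library's urllib.robotparser module. The rules shown, the ExampleBot user agent name, and the example.com URLs are hypothetical and chosen only for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every spider is asked to skip two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /cgi-bin/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/archive/2004.html",
    ):
        print(url, allowed(ROBOTS_TXT, "ExampleBot", url))
```

A spider that honors the file would crawl the first URL and skip the second, which is how outdated pages under a disallowed directory stay out of the cache.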