Three Parts Of A Crawler-Based Search Engine
By Danny Sullivan, Editor (October 14, 2002)

 
Crawler-based search engines have three major elements:
1. The spider
2. The index
3. Search engine software

1. The spider
The spider, also called the crawler:
  • visits a web page,
  • reads it,
  • follows links to other pages within the site,
  • returns to the site on a regular basis to look for changes.
Everything the spider finds goes into the second part of the search engine, the index.
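The spider's steps can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search engine works: the names `LinkExtractor`, `crawl_site`, `fetch` and `max_pages` are hypothetical, and `fetch` is assumed to be a function you supply that returns a page's HTML text (or None if the page cannot be read).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_site(start_url, fetch, max_pages=100):
    """Visit pages breadth-first, staying on the start URL's host.

    `fetch` (hypothetical) maps a URL to its HTML text, or None on failure.
    Returns a dict of {url: html} for every page that was read.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)              # visit the page
        if html is None:
            continue
        pages[url] = html              # read it and keep what was found
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:      # follow links within the same site
            target = urldefrag(urljoin(url, href)).url
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages


# A tiny made-up site held in memory, so the sketch runs without a network.
site = {
    "http://example.com/": '<a href="/about.html">About</a> '
                           '<a href="http://other.example/">Away</a>',
    "http://example.com/about.html": '<a href="/">Home</a>',
}
pages = crawl_site("http://example.com/", site.get)
```

Running `crawl_site` again at a later date would model the spider's return visits: comparing the new result with the old one shows which pages have changed.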

2. The index
  • The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds.
  • If a web page changes, the book is updated with the new information.
  • Indexing takes time, so a web page may have been "spidered" but not yet "indexed."
  • Until a page is indexed, it is not available to anyone searching with the search engine.
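As a rough sketch of the "giant book" idea, the class below keeps a copy of each page and replaces that copy when the page changes. The word lookup table is an assumption added for illustration (a common way to make stored pages searchable); the article does not describe how any real index is organized, and the names `Index`, `add` and `lookup` are hypothetical.

```python
import re
from collections import defaultdict


class Index:
    """Keeps a copy of each page plus a word -> URLs lookup table."""

    def __init__(self):
        self.pages = {}                   # url -> stored copy of the page text
        self.postings = defaultdict(set)  # word -> URLs whose copy contains it

    @staticmethod
    def words(text):
        """Split text into lowercase words made of letters and digits."""
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, url, text):
        """Store a page, replacing any older copy of the same URL."""
        if url in self.pages:
            # The page changed: forget the words from the old copy first.
            for word in set(self.words(self.pages[url])):
                self.postings[word].discard(url)
        self.pages[url] = text
        for word in self.words(text):
            self.postings[word].add(url)

    def lookup(self, word):
        """Return the URLs of indexed pages containing the word."""
        return sorted(self.postings.get(word.lower(), set()))


index = Index()
index.add("http://example.com/", "Spiders crawl the web")
index.add("http://example.com/", "Crawlers read pages")  # the page changed
old_hits = index.lookup("spiders")  # word from the old copy only
new_hits = index.lookup("pages")    # word from the new copy
```

A page that the spider has fetched but that has not yet been passed to `add` cannot be returned by `lookup`, which mirrors the gap between being "spidered" and being "indexed."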

3. Search engine software
This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant.
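
To make "find matches and rank them" concrete, here is a small sketch. The scoring rule, counting how often the query words appear on a page, is only a stand-in chosen for illustration; the article does not say how relevance is actually judged, and the names `search` and `words` are hypothetical.

```python
import re


def words(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(pages, query):
    """Return the URLs of matching pages, most relevant first.

    `pages` maps each URL to its indexed text. Relevance here is simply
    the number of times the query words occur on the page (an assumption).
    """
    terms = set(words(query))
    scored = []
    for url, text in pages.items():
        score = sum(1 for word in words(text) if word in terms)
        if score > 0:                  # pages with no match are left out
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored]


indexed_pages = {
    "a.html": "search engine tips",
    "b.html": "search engine search software",
    "c.html": "cooking recipes",
}
results = search(indexed_pages, "search engine")
```

In this made-up example, `b.html` is listed before `a.html` because the query words appear on it three times rather than twice, and `c.html` is not returned at all.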
    
You can learn more about how search engine software ranks web pages.
    
    

