Difference between revisions of "Web spider"

From BC$ MobileTV Wiki
Line 7: Line 7:
- == Resources ==
- * Mini Bots PHP Class: http://www.barattalo.it/mini-bots-php-class/
+ == Tools ==
+ * Apache Nutch: https://nutch.apache.org/ (open source large-scale web crawler with distributed computing support)
  === CommonCrawl ===
Line 24: Line 28:
  * '''Sphider''' - PHP Search Engine Spider: http://www.sphider.eu/
+ == Resources ==
+ * Mini Bots PHP Class: http://www.barattalo.it/mini-bots-php-class/
  == External Links ==
Line 33: Line 40:
  * My Web Spider: http://php4fun.blogspot.com/2007/11/my-web-spider.html
  * Finding What People Want -- Experiences with the WebCrawler (PhD thesis that produced WebCrawler.com): http://www.thinkpink.com/bp/WebCrawler/WWW94.html
  == See Also ==
  [[Scraper]]

Revision as of 18:00, 26 August 2015

A Web Spider (also commonly called a Web Crawler) is a program that runs continuously, following links from page to page in order to collect information about the underlying link structure of the web. [1]
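The link-following behaviour described above can be illustrated with a minimal sketch in Python. It is not taken from any of the tools listed on this page. The names `crawl`, `LinkExtractor`, `fetch` and `max_pages` are hypothetical, and `fetch` is assumed to be a caller-supplied function that returns a page's HTML (or `None` on failure), so the example runs against an in-memory set of pages instead of the real web.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; returns {url: [outgoing links]}.

    fetch is a hypothetical caller-supplied function mapping a URL to its
    HTML text, or to None when the page cannot be retrieved.
    """
    graph = {}              # the link structure discovered so far
    queue = deque([seed])   # URLs waiting to be visited
    seen = {seed}           # URLs already queued, to avoid revisiting
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        # Resolve relative links against the current page and drop #fragments.
        links = [urldefrag(urljoin(url, href)).url for href in parser.links]
        graph[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return graph


# Usage with a tiny in-memory "web" (made-up URLs, no network access).
pages = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "http://example.test/a": '<a href="/">home</a>',
    "http://example.test/b": "",
}
graph = crawl("http://example.test/", pages.get)
```

The sketch stops after `max_pages` pages rather than running indefinitely. A crawler pointed at real sites would also need to honour robots.txt and limit its request rate, which is left out here.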





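Tools

* Apache Nutch: https://nutch.apache.org/ (open source large-scale web crawler with distributed computing support)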

CommonCrawl

CommonCrawl is an attempt to create an open and accessible crawl of the web for education, research and other non-commercial innovation.

Sphider

Sphider is a lightweight web spider and search engine written in PHP, using MySQL as its back-end database. It is suited to adding search functionality to a website or building a custom search engine. Sphider is small, easy to set up and modify, and is used on thousands of websites across the world.


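Resources

* Mini Bots PHP Class: http://www.barattalo.it/mini-bots-php-class/

External Links

* My Web Spider: http://php4fun.blogspot.com/2007/11/my-web-spider.html
* Finding What People Want -- Experiences with the WebCrawler (PhD thesis that produced WebCrawler.com): http://www.thinkpink.com/bp/WebCrawler/WWW94.html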

See Also

Scraper
  1. wikipedia:Web spider