LinkChecker [for MMBase]

Websites contain links; it is in their nature. Over time links may become invalid; that is also in their nature. People who create links do not want to check by hand whether those links have become invalid, so there is a market for programs that check links. Such a program does more than check whether a link is valid: it first has to acquire a list of links to check, and afterwards it has to give feedback to the user.

This is what a link checker does:

  • get a list of links to check
  • check the links (if the link checker is a crawler, the content behind each link is parsed for other links)
  • generate a report
  • give feedback to the user or program
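The four steps above can be sketched as one small function. This is only an illustration of the general shape of a link checker, not LinkChecker's own code. The names run_link_check, get_links, check and notify are made up for this example.

```python
from typing import Callable, Iterable


def run_link_check(
    get_links: Callable[[], Iterable[str]],
    check: Callable[[str], bool],
    notify: Callable[[str], None],
) -> str:
    """Run the four steps in order and return the report text.

    All three callables are hypothetical hooks for this sketch:
    get_links returns the URLs, check returns True for a valid URL,
    notify delivers the finished report to a user or program.
    """
    links = list(get_links())                      # 1. get a list of links
    results = {url: check(url) for url in links}   # 2. check the links
    broken = [url for url, ok in results.items() if not ok]

    # 3. generate a report: one summary line, then one line per broken link
    lines = [f"checked {len(links)} link(s), {len(broken)} broken"]
    lines += [f"BROKEN {url}" for url in broken]
    report = "\n".join(lines)

    notify(report)                                 # 4. give feedback
    return report
```

Passing print as notify writes the report to the terminal; a function that sends mail would fit in the same slot.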

Most link checkers found on the web are crawlers. Because a crawler gets its information directly from the web site, its report can only show on which page an invalid link was found. For dynamic websites, crawlers pose at least two problems.

First, the links shown on a dynamic website are stored in a database. To fix an invalid link, the user should not edit the page where the broken link was found but the link in the database. Because the crawler never gets this information (where the link is stored in the database), it cannot tell the user where to fix the invalid link (garbage in, garbage out).

Second, a crawler visits every page of the website, which puts unnecessary load on the website's server. And that is the lucky case: it assumes that all your pages are reachable from each other via links, that the link checker finds links hidden in JavaScript, that your website is not Flash-based, and so on.

If you are not using a lame content management system, internal links should be guaranteed by the system, and all that needs checking are the external links.

LinkChecker is not a crawler. It acquires a list of URLs with meta-data from a configured URL. The meta-data is typically information about how to edit the URL [in MMBase terms this is the object number].
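To make the idea of "a list of URLs with meta-data" concrete, here is a small sketch that parses such a list. The actual format LinkChecker expects is not described here, so the layout below (one link per line, object number and URL separated by a tab, "#" for comments) is purely an assumption, as are the names LinkEntry and parse_link_list.

```python
from dataclasses import dataclass


@dataclass
class LinkEntry:
    url: str
    object_number: str  # meta-data: tells the user which object to edit


def parse_link_list(text: str) -> list[LinkEntry]:
    """Parse lines of the ASSUMED form '<object number><TAB><url>'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        object_number, _, url = line.partition("\t")
        if url:  # lines without a tab-separated URL are ignored
            entries.append(
                LinkEntry(url=url.strip(), object_number=object_number.strip())
            )
    return entries
```

Because each entry keeps its object number, a report built from these entries can point the user at the object to edit rather than at the page where the link happened to be displayed.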

LinkChecker then processes the URLs in the list one by one. For each link, LinkChecker checks its validity by trying to access the URL. The result of the validity check consists of a list of encountered errors, information about the time spent checking the link, and information about the probability that the link is invalid. All this raw information is stored in one file.
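A validity check that yields these three pieces of information could look roughly like the sketch below. How LinkChecker itself estimates the probability is not stated here. This example simply assumes the URL is tried a few times and takes the fraction of failed attempts as the estimate. The names CheckResult, open_url and check_link are hypothetical.

```python
import time
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CheckResult:
    url: str
    errors: List[str] = field(default_factory=list)  # encountered errors
    seconds: float = 0.0                             # time spent checking
    invalid_probability: float = 0.0                 # ASSUMED: failed / attempts


def open_url(url: str, timeout: float = 10.0) -> None:
    """Default opener: raises on HTTP errors, unreachable hosts and timeouts."""
    with urllib.request.urlopen(url, timeout=timeout):
        pass


def check_link(
    url: str,
    attempts: int = 3,
    opener: Callable[[str], None] = open_url,
) -> CheckResult:
    """Try to access the URL several times and record what happened."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    result = CheckResult(url=url)
    start = time.monotonic()
    failures = 0
    for _ in range(attempts):
        try:
            opener(url)
        except Exception as exc:  # any failure counts against the link
            failures += 1
            result.errors.append(f"{type(exc).__name__}: {exc}")
    result.seconds = time.monotonic() - start
    result.invalid_probability = failures / attempts
    return result
```

The opener parameter exists so the check can be tried out without network access, for example by passing a function that always raises.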

LinkChecker works with plugins to implement functionality.

Install

  • download LinkChecker
  • unpack the file (tar jxvf ~/linkchecker-0.95.tar.bz)
  • go into the directory
  • do a test run (./linkchecker.sh config/mmbase.org.config)
  • start customizing