setup

The Linkchecker is configured via a property file that is given on the command line at startup. The configuration is split into two parts: a "global configuration" located in net/sf/mmapps/modules/linkchecker/global.properties and a local configuration given on the command line (example: config/localhost.config). At startup the global configuration is read first, and the local configuration is then added to it. This means that the local configuration can override the global configuration. Unless specified otherwise, all properties are required. This documentation is kept in sync by hand, so also look at the property names defined in the Java source code.
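The override behaviour described above can be illustrated with a minimal sketch. It is not the Linkchecker's own loading code: the class name LayeredConfigSketch, the method merge and the property values are assumptions made for the example. It only shows that loading a second property set into the same java.util.Properties object replaces any key that is already present.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of the Linkchecker source.
class LayeredConfigSketch {

    // Loads the global text first and the local text second, so a key
    // that occurs in both ends up with the local value.
    static Properties merge(String globalText, String localText) {
        Properties merged = new Properties();
        try {
            merged.load(new StringReader(globalText));
            merged.load(new StringReader(localText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Example values only; the real defaults live in global.properties.
        String global = "linkchecker.numberOfThreads=5\nlinkchecker.useHeadMethod=true\n";
        String local = "linkchecker.numberOfThreads=2\n";

        Properties config = merge(global, local);
        System.out.println(config.getProperty("linkchecker.numberOfThreads")); // overridden by local
        System.out.println(config.getProperty("linkchecker.useHeadMethod"));   // kept from global
    }
}
```

In a real setup the two texts would come from global.properties on the classpath and from the file named on the command line.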

reactor configuration

These properties define which plugins the reactor uses.

property             description
plugin.names         A comma-separated list of plugins to run. For each plugin, a property plugin.[name].class must be defined; plugins must implement ReactorPlugin. The plugins are first registered and then processed in the order in which they appear here. This is why it is handy to put the fetchlinks plugin first in the list.
plugin.[name].class  The class to which [name] must point.
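As an illustration of how plugin.names and plugin.[name].class relate, the following sketch resolves the configured class names in list order. It is an assumption-laden example, not the reactor's implementation: PluginListSketch, pluginClasses and the example.* class names are hypothetical, and the sketch only looks the names up without instantiating any ReactorPlugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of the Linkchecker source.
class PluginListSketch {

    // Returns the class names in the order given by plugin.names and
    // fails if a listed plugin has no plugin.[name].class property.
    static List<String> pluginClasses(Properties config) {
        List<String> classes = new ArrayList<>();
        String names = config.getProperty("plugin.names", "");
        for (String raw : names.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue; // tolerate empty entries such as a trailing comma
            }
            String className = config.getProperty("plugin." + name + ".class");
            if (className == null) {
                throw new IllegalStateException("missing property plugin." + name + ".class");
            }
            classes.add(className);
        }
        return classes;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        // The class names below are placeholders, not the real package names.
        config.setProperty("plugin.names", "fetchlinks, maxlimit");
        config.setProperty("plugin.fetchlinks.class", "example.FetchLinksPlugin");
        config.setProperty("plugin.maxlimit.class", "example.MaxLimitPlugin");

        for (String className : pluginClasses(config)) {
            System.out.println(className);
        }
    }
}
```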

fetchlinks plugin

The FetchLinksFromHttpPlugins acquires a list of URLs (this is currently a required plugin).

property                 description
linkchecker.urllist.url  Set the source URL from which to get a list of URLs.

maxlimit plugin

The MaxLimitPlugin is an optional plugin that limits the number of links that are checked.

property              description
linkchecker.maxlimit  If more than maxlimit links are found, the remaining links are removed.

internal reference plugin

The InternalReferencePlugin adds a comment to a link when it matches the linkchecker.internalreference.host property. The comment states that internal URLs should not be handled as plain URLs but must be handled by the CMS.

property                            description
linkchecker.internalreference.host  The URL to match against the host whose links are currently being checked.

xmlfilter plugin

The XmlFilterPlugin is configured via XML and can add comments to links or skip checking them, based on matching string literals or regular expressions.

property                          description
linkchecker.xmlfilter.configfile  Must point to the xmlfilter.xml file where the filter is configured.

http linkcheck plugin

The HTTPLinkCheckerPlugin checks the validity of HTTP links by sending a HEAD or GET request to the received link.

property                                        description
linkchecker.userAgent                           Set the user agent used when checking links.
linkchecker.connectionTimeoutMillis             Sets the socket timeout in milliseconds (SO_TIMEOUT) when checking links.
linkchecker.httpConnectionFactoryTimeoutMillis  Sets the timeout in milliseconds used when retrieving an HTTP connection from the HTTP connection manager.
linkchecker.numberOfThreads                     Sets the number of threads that check links concurrently.
linkchecker.httpstatus.[HTTP_STATUS].badness    Adds "bad" points based on the received HTTP status code; [HTTP_STATUS] must be replaced with a status code.
linkchecker.HttpRecoverableException.badness    Sets the badness for when a recoverable exception occurs.
linkchecker.UnknownHostException.badness        Sets the badness for when an UnknownHostException is thrown.
linkchecker.ConnectException.badness            Sets the badness for when a ConnectException is thrown.
linkchecker.IOException.badness                 Sets the badness for when an IOException occurs that is not one of the other exceptions mentioned. If this happens, it might be a good idea to let a programmer look at what happened.
linkchecker.RuntimeException.badness            Same as IOException, but worse. This might be something that should be mailed to the administrator and not to the end user.
linkchecker.useHeadMethod                       The best way to check a link is to really do a GET request, but often a HEAD request is enough, which saves a lot of traffic (and possibly out-of-memory errors).
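The badness properties follow a naming pattern, so the lookup for a given result can be sketched as below. This is an illustration under assumptions, not the plugin's actual code: BadnessSketch and its methods are hypothetical, the numeric values are made up, and returning 0 for an unconfigured key is a choice made only for this example.

```java
import java.util.Properties;

// Hypothetical helper, not part of the Linkchecker source.
class BadnessSketch {

    // Builds the key linkchecker.httpstatus.[HTTP_STATUS].badness and reads it.
    static int badnessForStatus(Properties config, int status) {
        return read(config, "linkchecker.httpstatus." + status + ".badness");
    }

    // Builds the key linkchecker.[ExceptionName].badness and reads it.
    static int badnessForException(Properties config, String exceptionName) {
        return read(config, "linkchecker." + exceptionName + ".badness");
    }

    // Assumption for this sketch: an unconfigured key counts as 0 bad points.
    private static int read(Properties config, String key) {
        String value = config.getProperty(key);
        return value == null ? 0 : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        // Example values only.
        config.setProperty("linkchecker.httpstatus.404.badness", "10");
        config.setProperty("linkchecker.UnknownHostException.badness", "5");

        System.out.println(badnessForStatus(config, 404));
        System.out.println(badnessForException(config, "UnknownHostException"));
        System.out.println(badnessForStatus(config, 200));
    }
}
```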

report plugin

The ReportPlugin creates a report and mails it to the configured email address, using an XSL stylesheet.

property                        description
linkchecker.userAgent           Set the user agent used when checking links.
linkchecker.email.smtp.host     Set the host used for sending mail (via the SMTP protocol).
linkchecker.email.from.address  Set the email address used as "from" when sending an email (example: linkchecker@mycompany.com).
linkchecker.email.from.name     Set the name used as "from" when sending an email (example: Joop Foei).
linkchecker.email.bcclist       Set the addresses used as "bcc" when sending an email (example: nospam@users.sf.net). Multiple addresses can be added, separated by semicolons (;).
linkchecker.report.xsl          The XSL stylesheet used to transform the XML report to HTML.
linkchecker.report.title        Title of the linkcheck report (used in the email subject).

logging

Logging can be configured by editing net/sf/mmapps/modules/linkchecker/log4j.properties. This is a standard log4j configuration (see the log4j manual). The last two lines are the httpclient configuration.

Here is a sample configuration (also look at the current log4j.properties):

log4j.rootLogger=WARN, A1
log4j.appender.A1=org.apache.log4j.ConsoleAppender
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%d [%t] %-9p %c{2} - %m%n
log4j.logger.net.sf=INFO
log4j.logger.httpclient=OFF
log4j.logger.org.apache=OFF
      

Input file format

The input format is quite simple: each line contains the URL to check, followed by the metadata, separated by a pipe character (|). For MMBase, the metadata is a link to the editors.

URL|META_DATA
URL2|META_DATA_2
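A line in this format could be split as in the following sketch. It is not the Linkchecker's own parser: InputLineSketch and parse are hypothetical names, the example URLs are placeholders, and splitting at the first pipe character (with empty metadata when no pipe is present) is an assumption made for the example.

```java
// Hypothetical helper, not part of the Linkchecker source.
class InputLineSketch {

    // Splits "URL|META_DATA" at the first '|' and returns {url, metaData}.
    // A line without '|' is treated as a URL with empty metadata.
    static String[] parse(String line) {
        int separator = line.indexOf('|');
        if (separator < 0) {
            return new String[] { line.trim(), "" };
        }
        String url = line.substring(0, separator).trim();
        String metaData = line.substring(separator + 1).trim();
        return new String[] { url, metaData };
    }

    public static void main(String[] args) {
        // Placeholder URLs; for MMBase the second field would link to the editors.
        String[] parts = parse("http://example.org/page|http://example.org/editors?node=1");
        System.out.println("url:      " + parts[0]);
        System.out.println("metadata: " + parts[1]);
    }
}
```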