The Filter Factory settings are loaded through the FilterFactory.default.xml configuration file.
The default filters in the configuration file are:
Default HTML filter
Default EML filter
SearchML export multi-filter
SearchML filter
Each filter specifies a number of settings, which determine which class is loaded for the filter, which paragraph extractor is used, and the MIME types to which the filter applies. Table 2-12 shows the filter setting parameters.
Parameter | Default | Description |
---|---|---|
className | None | The Java class that defines the filter. |
extractorClassName | None | The Java class used for extracting paragraphs from the filtered text. |
mimeTypes | None | The list of MIME types that are associated with the filter. |
timeout | 45,000 | Indicates the time in milliseconds the filter waits while filtering a document. If the filter exceeds the given time, the filter aborts. This parameter is used mainly by the Stellent filter. |
keepTempFiles | false | If set to true, the filter keeps any temporary files produced during the filtering process. This is used mainly by the Stellent filter. |
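The timeout behavior described in Table 2-12 can be pictured with a short Java sketch. This is an illustration only, not the product's implementation: the class `FilterTimeoutSketch`, the method `runWithTimeout`, and the use of a `Callable<String>` to stand in for a filter are all hypothetical names chosen for the example. Only the 45,000 millisecond default comes from the table.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: abort a filtering task that runs longer than a timeout. */
final class FilterTimeoutSketch {
    // Mirrors the documented default of the "timeout" parameter (milliseconds).
    static final long DEFAULT_TIMEOUT_MS = 45_000;

    /**
     * Runs the given task (a stand-in for a document filter) on a worker thread
     * and gives up if it has not finished within timeoutMs milliseconds.
     */
    static String runWithTimeout(Callable<String> filterTask, long timeoutMs) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> result = executor.submit(filterTask);
        try {
            // Blocks until the task finishes or the timeout elapses.
            return result.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Ask the worker thread to stop, then report the timeout to the caller.
            result.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        String text = runWithTimeout(() -> "filtered text", DEFAULT_TIMEOUT_MS);
        System.out.println(text);
    }
}
```

A task that finishes in time returns its result normally; a task that overruns causes a `TimeoutException`, which corresponds to the filter aborting.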
In addition to the filter-specific settings, there are a number of general filter settings that help the extractors determine the paragraphs. The filter ensures that each paragraph is between the minimum and maximum lengths and aims for the ideal paragraph length.
Table 2-13 shows the paragraph length settings.
Parameter | Default | Description |
---|---|---|
default.minParaLen | 250 | The minimum number of characters in a paragraph |
default.idealParaLen | 500 | The ideal number of characters in a paragraph |
default.maxParaLen | 1,000 | The maximum number of characters in a paragraph |
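To make the interaction of the three length settings concrete, the following Java sketch splits text into paragraphs that stay between a minimum and a maximum length while cutting as close to the ideal length as possible. It is a minimal illustration under stated assumptions, not the extractor shipped with the product: the class `ParagraphLengthSketch`, its methods, and the rule of cutting at the whitespace nearest the ideal length are all hypothetical. Only the default values 250, 500, and 1,000 are taken from Table 2-13.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split text into paragraphs using min/ideal/max lengths. */
final class ParagraphLengthSketch {
    static final int MIN_PARA_LEN = 250;    // default.minParaLen
    static final int IDEAL_PARA_LEN = 500;  // default.idealParaLen
    static final int MAX_PARA_LEN = 1000;   // default.maxParaLen

    /** Splits text so that joining the returned pieces reproduces the input exactly. */
    static List<String> split(String text, int min, int ideal, int max) {
        if (min <= 0 || min > ideal || ideal > max) {
            throw new IllegalArgumentException("expected 0 < min <= ideal <= max");
        }
        List<String> paragraphs = new ArrayList<>();
        int start = 0;
        // Keep cutting while the remaining text is too long for one paragraph.
        while (text.length() - start > max) {
            int cut = chooseCut(text, start, min, ideal, max);
            paragraphs.add(text.substring(start, cut));
            start = cut;
        }
        // The remainder fits in one paragraph; it is shorter than min only
        // when the input itself is too short to allow anything else.
        if (start < text.length()) {
            paragraphs.add(text.substring(start));
        }
        return paragraphs;
    }

    /** Picks the cut position nearest the ideal length that follows whitespace. */
    private static int chooseCut(String text, int start, int min, int ideal, int max) {
        int lo = start + min;
        // Leave at least min characters for the next paragraph when possible.
        int hi = Math.max(lo, Math.min(start + max, text.length() - min));
        int target = Math.max(lo, Math.min(start + ideal, hi));
        for (int d = 0; target - d >= lo || target + d <= hi; d++) {
            int before = target - d;
            if (before >= lo && Character.isWhitespace(text.charAt(before - 1))) {
                return before;
            }
            int after = target + d;
            if (after <= hi && Character.isWhitespace(text.charAt(after - 1))) {
                return after;
            }
        }
        return target; // No whitespace in range: cut mid-word at the target.
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 460; i++) {
            sb.append("word ");
        }
        for (String p : split(sb.toString(), MIN_PARA_LEN, IDEAL_PARA_LEN, MAX_PARA_LEN)) {
            System.out.println(p.length());
        }
    }
}
```

With the default values, every paragraph produced from a sufficiently long input falls between 250 and 1,000 characters, and most land at or near 500.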
Copyright © 2005. Sybase Inc. All rights reserved.