Setting text tokenizer parameters

The text tokenizer parameters are loaded from the TextModule.default.xml configuration file in the hub container. Table 3-9 lists the configurable attributes of the TextProcessor tag in this file.

Table 3-9: Text tokenizer parameters

Parameter       Default                                     Description
TextTokenizer   com.isdduk.text.parsing.StdTextTokenizer    Class name of the text tokenizer used throughout the system.
TermStemmer     com.isdduk.text.Porter2Stemmer              Class name of the term stemmer used throughout the system.
Param           None                                        Parameters used by the local text tokenizer.
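The following Python sketch shows how the attributes in Table 3-9 might resolve to effective values. It assumes that the settings are written as XML attributes of a TextProcessor element and that an omitted attribute takes the default listed in the table. The enclosing TextModule root element, the com.example.MyStemmer class, and the read_text_processor function are hypothetical and only illustrate the lookup; they are not part of the product.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Defaults from Table 3-9 (Param has no default).
DEFAULTS: Dict[str, Optional[str]] = {
    "TextTokenizer": "com.isdduk.text.parsing.StdTextTokenizer",
    "TermStemmer": "com.isdduk.text.Porter2Stemmer",
    "Param": None,
}


def read_text_processor(xml_text: str) -> Dict[str, Optional[str]]:
    """Return effective settings from a TextProcessor tag (hypothetical helper).

    Assumes the settings are XML attributes of the TextProcessor element and
    that an omitted attribute takes its Table 3-9 default.
    """
    root = ET.fromstring(xml_text)
    # The TextProcessor tag may be the root element or nested below it.
    node = root if root.tag == "TextProcessor" else root.find(".//TextProcessor")
    if node is None:
        raise ValueError("no TextProcessor tag found")
    return {name: node.get(name, default) for name, default in DEFAULTS.items()}


# Hypothetical file content: only the stemmer is overridden.
SAMPLE = (
    "<TextModule>"
    '<TextProcessor TermStemmer="com.example.MyStemmer"/>'
    "</TextModule>"
)
settings = read_text_processor(SAMPLE)
# settings["TermStemmer"] is "com.example.MyStemmer"; the other two keep their defaults.
```

Looking up each attribute with a fallback keeps the defaults in one place, so an empty TextProcessor tag yields exactly the values in Table 3-9.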

Satellite containers use the text tokenizer class defined in the hub's TextModule.default.xml configuration file, so you do not need to define the tokenizer class name for them.
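A satellite container's effective settings can be pictured as the hub's settings with any local values layered on top. The hypothetical helper satellite_settings below always keeps the hub's class names and takes other attributes, such as Param, from the satellite. Treating TermStemmer as hub-only and letting a satellite supply its own Param are assumptions drawn from the table's "whole system" and "local text tokenizer" wording. The manual does not say what happens if a satellite does set a class name; this sketch simply ignores it.

```python
from typing import Mapping, Optional

# Assumption: both class names are system-wide (Table 3-9 says they are used
# throughout the system), so a satellite always takes them from the hub.
HUB_ONLY = ("TextTokenizer", "TermStemmer")


def satellite_settings(hub: Mapping[str, Optional[str]],
                       local: Mapping[str, Optional[str]]) -> dict:
    """Hypothetical helper: effective tokenizer settings for a satellite."""
    settings = dict(hub)  # copy, so the hub's settings are left unchanged
    for key, value in local.items():
        if key in HUB_ONLY:
            continue  # class names come from the hub; a local value is ignored
        settings[key] = value  # e.g. Param for the local text tokenizer
    return settings


hub = {
    "TextTokenizer": "com.isdduk.text.parsing.StdTextTokenizer",
    "TermStemmer": "com.isdduk.text.Porter2Stemmer",
    "Param": None,
}
# The satellite defines only Param (the value is hypothetical).
sat = satellite_settings(hub, {"Param": "example-param"})
```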