The text tokenizer parameters are loaded from the TextModule.default.xml configuration file in the hub container. Table 3-9 lists the configurable attributes of the TextProcessor tag in this file.
Parameter | Default | Description |
---|---|---|
TextTokenizer | com.isdduk.text.parsing.StdTextTokenizer | Defines the class name of the text tokenizer used throughout the system. |
TermStemmer | com.isdduk.text.Porter2Stemmer | Defines the class name of the term stemmer used throughout the system. |
Param | None | Defines the parameters used by the local text tokenizer. |
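The following Python sketch shows how the defaults in Table 3-9 apply when an attribute is omitted. It assumes the parameters are written as XML attributes on a `TextProcessor` element. The real layout of TextModule.default.xml may differ. The function name `effective_text_processor_settings` and the class `com.example.MyStemmer` are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Defaults taken from Table 3-9. Reading them as XML attributes of a
# <TextProcessor> element is an assumption made for this sketch.
DEFAULTS = {
    "TextTokenizer": "com.isdduk.text.parsing.StdTextTokenizer",
    "TermStemmer": "com.isdduk.text.Porter2Stemmer",
    "Param": None,
}


def effective_text_processor_settings(xml_text):
    """Return TextProcessor settings, falling back to the Table 3-9 defaults."""
    root = ET.fromstring(xml_text)
    # Accept either a bare <TextProcessor> element or a document that contains one.
    elem = root if root.tag == "TextProcessor" else root.find(".//TextProcessor")
    settings = dict(DEFAULTS)
    if elem is None:
        return settings
    for name in DEFAULTS:
        value = elem.get(name)
        if value is not None:
            # An explicitly set attribute overrides the default.
            settings[name] = value
    return settings


if __name__ == "__main__":
    # Hypothetical fragment that overrides only the term stemmer.
    sample = '<TextProcessor TermStemmer="com.example.MyStemmer"/>'
    print(effective_text_processor_settings(sample))
```

In this sketch, omitting TextTokenizer keeps com.isdduk.text.parsing.StdTextTokenizer, and Param stays unset (None) unless it is given explicitly.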
Satellite containers use the tokenizer class defined in the hub container's TextModule.default.xml configuration file, so you do not need to define the text tokenizer class name in a satellite container.
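The rule that satellite containers take the tokenizer class from the hub can be modeled as a simple merge. This is a minimal sketch under stated assumptions. It assumes the term stemmer class is also shared from the hub, since Table 3-9 describes both classes as system-wide. It also assumes Param is the only value a satellite supplies locally. The names `HUB_TEXT_PROCESSOR` and `satellite_text_processor` are hypothetical.

```python
# Hypothetical model of the hub's TextProcessor class settings (Table 3-9 defaults).
HUB_TEXT_PROCESSOR = {
    "TextTokenizer": "com.isdduk.text.parsing.StdTextTokenizer",
    "TermStemmer": "com.isdduk.text.Porter2Stemmer",
}


def satellite_text_processor(hub, local_param=None):
    """Return the settings a satellite container would effectively use.

    Class names are copied from the hub configuration, so the satellite never
    defines them. Param is assumed to be local to the container (None if unset).
    """
    return {
        "TextTokenizer": hub["TextTokenizer"],
        "TermStemmer": hub["TermStemmer"],
        "Param": local_param,
    }


if __name__ == "__main__":
    print(satellite_text_processor(HUB_TEXT_PROCESSOR, local_param="hypothetical-value"))
```

Keeping the class names in one place (the hub) means every container tokenizes and stems text the same way, which the sketch reflects by never reading a class name from the satellite.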