Chapter 2: Configuring OmniQ Enterprise

Text Manager

The Text Manager module settings are loaded through the TextModule.default.xml configuration file. Table 2-7 shows the parameters in this file.

**Table 2-7: TextModule.default.xml parameters**
Parameter	Default	Description
min.term.length	2	The minimum term length considered valid for indexing. This limit does not apply to terms in the preserved terms list or to single-digit terms.
max.term.length	20	The maximum term length deemed valid for indexing. This value must match the Term Lexicon Manager parameter term.length.max.
stopwords.filename	stopwords_en.txt	This file contains a list of stopwords to remove during the indexing and querying processes to improve system performance. See “Stopwords”.
preserved.terms.filename	preserved_terms_en.txt	This file contains the list of preserved terms that are not stemmed during indexing. The list can also include terms shorter than the minimum term length defined by the min.term.length parameter. See “Preserved terms”.
term.splitter.class	com.isdduk.text.BreakIteratorSplitter	Specifies the Java class used for breaking text into separate words. The default BreakIteratorSplitter handles all double-byte character sets.
term.stemmer.class	com.isdduk.text.Porter2Stemmer	Specifies the Java class used for term stemming. The default Porter2Stemmer is for English text.
query.augmentor.filename	query_aug_en.txt	This file contains a list of synonyms and acronyms. See “Synonyms and acronyms (query augmentation)”.
parsers.filename	Parsers.xml	The name of the file in the config folder that contains the list of text parsers.
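
The interaction between min.term.length, max.term.length, the preserved terms list, and single-digit terms can be sketched as follows. This is an illustration of the rules described in Table 2-7 only, not OmniQ Enterprise code: the class name TermLengthFilter and the method isIndexable are hypothetical, and the sketch assumes that preserved terms are still subject to the maximum length, which the table does not state.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the length rules in Table 2-7.
// Not part of the OmniQ Enterprise API.
final class TermLengthFilter {
    private final int minLength;              // mirrors min.term.length
    private final int maxLength;              // mirrors max.term.length
    private final Set<String> preservedTerms; // mirrors the preserved terms file

    TermLengthFilter(int minLength, int maxLength, Set<String> preservedTerms) {
        this.minLength = minLength;
        this.maxLength = maxLength;
        this.preservedTerms = preservedTerms;
    }

    // Returns true if the term passes the length checks for indexing.
    boolean isIndexable(String term) {
        int length = term.length();
        // Assumption: the maximum length applies to every term.
        if (length > maxLength) {
            return false;
        }
        // Preserved terms and single-digit terms are exempt from the minimum.
        boolean singleDigit = length == 1 && Character.isDigit(term.charAt(0));
        boolean exemptFromMinimum = preservedTerms.contains(term) || singleDigit;
        return exemptFromMinimum || length >= minLength;
    }

    public static void main(String[] args) {
        Set<String> preserved = new HashSet<>(Arrays.asList("c"));
        TermLengthFilter filter = new TermLengthFilter(2, 20, preserved);

        System.out.println(filter.isIndexable("a"));     // false: shorter than the minimum
        System.out.println(filter.isIndexable("7"));     // true: single-digit term
        System.out.println(filter.isIndexable("c"));     // true: preserved term
        System.out.println(filter.isIndexable("index")); // true: within both limits
    }
}
```

With the default values of 2 and 20, a one-letter term is rejected unless it appears in the preserved terms list, while a single digit is always accepted.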

You can set the term splitter and stemmer classes to language-independent classes or to language-specific classes. Language-specific stemmers can improve system performance when OmniQ Enterprise indexes documents in only one language.
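
To give a sense of what a locale-aware term splitter does, the following minimal sketch splits text into words with the standard java.text.BreakIterator class. It is an assumption that the default BreakIteratorSplitter works along these lines; the class WordSplitSketch and its split method are hypothetical and do not show the interface that OmniQ Enterprise expects a splitter class to implement.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of locale-aware word splitting.
// Not the OmniQ Enterprise splitter implementation.
final class WordSplitSketch {

    // Splits text at word boundaries and keeps only segments that
    // start with a letter or digit (drops spaces and punctuation).
    static List<String> split(String text, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);

        List<String> terms = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String candidate = text.substring(start, end);
            if (Character.isLetterOrDigit(candidate.codePointAt(0))) {
                terms.add(candidate);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Prints [Indexing, text, quickly]
        System.out.println(split("Indexing text, quickly.", Locale.ENGLISH));
    }
}
```

Passing a different Locale changes the boundary rules that BreakIterator applies, which is the kind of choice a language-specific splitter class would fix in advance.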
