Stopwords are common words such as “I,” “a,” “an,” “the,” and so on, that are ignored during the indexing or querying process. Removing the most common words during the indexing process keeps index sizes smaller, which enhances performance.
You can change the list of stopwords in one of two ways:
Edit the list of words in the default stopwords file located in %OmniQ_3.0%\OmniQ\config\stopwords_en.txt, or
Create a new stopwords file and configure the Text Manager to read from the new file, by editing %OmniQ_3.0%\OmniQ\config\TextModule.default.xml and changing the value of the stopwords.filename parameter to point to the new file.
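As an illustration of the second option, the following Python sketch writes a custom stopwords file in UTF-8. It is a sketch under assumptions, not a documented procedure: the file name my_stopwords.txt, the helper names, and the one-word-per-line layout are all hypothetical. Check the default stopwords_en.txt for the layout the Text Manager actually expects.

```python
from pathlib import Path


def write_stopwords(path, words):
    """Write stopwords to `path` as UTF-8 and return how many were written.

    Assumption: one word per line. Verify against the default stopwords file.
    """
    seen = set()
    cleaned = []
    for word in words:
        word = word.strip()
        # Skip blank entries and duplicates, keeping first-seen order.
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    # Encode explicitly so the result is UTF-8 regardless of platform defaults.
    Path(path).write_bytes(("\n".join(cleaned) + "\n").encode("utf-8"))
    return len(cleaned)


def read_stopwords(path):
    """Read the file back as UTF-8, ignoring blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical file name; point stopwords.filename at wherever you save it.
    count = write_stopwords("my_stopwords.txt", ["I", "a", "an", "the"])
    print(f"wrote {count} stopwords")
```

After writing the file, the stopwords.filename parameter in TextModule.default.xml still has to be changed to point to it, as described above.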
The stopword list must be UTF-8 encoded. Because words on the stopword list are ignored when you index documents (that is, each document is indexed as if those words did not exist), make any changes to the list before you index. If you add stopwords after you have already indexed your documents, the new stopwords are dropped from your queries, but the disk space consumed by their associated index data is not reclaimed until you reindex your documents.
Removing stopwords after you have already indexed your documents has no effect until you reindex your documents.
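The timing rules above can be illustrated with a toy inverted index in Python. This is a minimal sketch, not OmniQ's implementation: the TinyIndex class and the sample documents are hypothetical, and the index is assumed to apply the stopword list both when indexing and when querying.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index; a hypothetical stand-in, not the OmniQ engine."""

    def __init__(self, stopwords):
        self.stopwords = set(stopwords)
        self.postings = defaultdict(set)  # term -> ids of documents containing it

    def index(self, docs):
        """Rebuild the postings from scratch, skipping the current stopwords."""
        self.postings.clear()
        for doc_id, text in docs.items():
            for term in text.lower().split():
                if term not in self.stopwords:
                    self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every non-stopword query term."""
        terms = [t for t in query.lower().split() if t not in self.stopwords]
        if not terms:
            return set()
        result = None
        for term in terms:
            ids = self.postings.get(term, set())
            result = set(ids) if result is None else result & ids
        return result


docs = {1: "the cat sat", 2: "the dog sat", 3: "a cat ran"}
idx = TinyIndex(stopwords={"the", "a"})
idx.index(docs)
size_before = len(idx.postings)   # terms stored: cat, sat, dog, ran

# Add a stopword after indexing: queries drop it, but its postings remain.
idx.stopwords.add("sat")
hits = idx.search("cat sat")      # "sat" is ignored, so this matches on "cat" only
size_stale = len(idx.postings)    # unchanged: space is not reclaimed yet

idx.index(docs)                   # reindexing removes the stale "sat" entry
size_after = len(idx.postings)

# Remove a stopword after indexing: nothing changes until the next reindex.
idx.stopwords.discard("the")
hits_the_before = idx.search("the")   # "the" was never indexed
idx.index(docs)
hits_the_after = idx.search("the")    # now found in the rebuilt index
```

In this sketch the entry for "sat" stays in the index after it becomes a stopword and disappears only when index() runs again, while "the" becomes searchable only after the reindex that follows its removal from the list.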
Copyright © 2005. Sybase Inc. All rights reserved.