This section provides information for Java developers about developing, configuring, and using custom text splitters.
All document body text and all textual metadata values (excluding file paths) are passed through the configured term splitter, which breaks them into individual terms. Each term that is not preserved (see “Preserved terms”), is not a stopword (see “Stopwords”), and is neither too short nor too long is then passed to the configured term stemmer, which reduces it to its root form. Both the term splitter and the term stemmer can be reimplemented and reconfigured where necessary.
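The processing order described above can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the product's API: the names `TermPipeline`, `TermSplitter`, `TermStemmer`, `NonAlphanumericSplitter`, and `SuffixStemmer` are hypothetical. The splitting rule and the suffix stemmer are toy stand-ins. Because the text above only says which terms reach the stemmer, the sketch returns all other terms unchanged rather than guessing what the indexer does with them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only: every type and method name here is hypothetical,
// not the product's actual term splitter / term stemmer interface.
final class TermPipeline {

    /** Breaks a text value into individual terms. */
    interface TermSplitter {
        List<String> split(String text);
    }

    /** Reduces a single term to its root form. */
    interface TermStemmer {
        String stem(String term);
    }

    /** Splits on runs of characters that are neither letters nor digits and lower-cases each term. */
    static final class NonAlphanumericSplitter implements TermSplitter {
        @Override
        public List<String> split(String text) {
            List<String> terms = new ArrayList<>();
            for (String token : text.split("[^\\p{L}\\p{Nd}]+")) {
                if (!token.isEmpty()) { // a leading separator produces one empty token
                    terms.add(token.toLowerCase(Locale.ROOT));
                }
            }
            return terms;
        }
    }

    /** Toy stemmer that strips one common English suffix; not a real algorithm such as Porter. */
    static final class SuffixStemmer implements TermStemmer {
        private static final String[] SUFFIXES = {"ing", "ed", "s"};

        @Override
        public String stem(String term) {
            for (String suffix : SUFFIXES) {
                // Only strip when at least three characters of the root remain.
                if (term.endsWith(suffix) && term.length() >= suffix.length() + 3) {
                    return term.substring(0, term.length() - suffix.length());
                }
            }
            return term;
        }
    }

    private final TermSplitter splitter;
    private final TermStemmer stemmer;
    private final Set<String> preserved;
    private final Set<String> stopwords;
    private final int minLength;
    private final int maxLength;

    TermPipeline(TermSplitter splitter, TermStemmer stemmer, Set<String> preserved,
                 Set<String> stopwords, int minLength, int maxLength) {
        this.splitter = splitter;
        this.stemmer = stemmer;
        this.preserved = preserved;
        this.stopwords = stopwords;
        this.minLength = minLength;
        this.maxLength = maxLength;
    }

    /** A term is stemmed only if it is not preserved, not a stopword, and within the length limits. */
    boolean shouldStem(String term) {
        return !preserved.contains(term)
                && !stopwords.contains(term)
                && term.length() >= minLength
                && term.length() <= maxLength;
    }

    /** Splits the text, stems each eligible term, and returns ineligible terms unchanged. */
    List<String> process(String text) {
        List<String> result = new ArrayList<>();
        for (String term : splitter.split(text)) {
            result.add(shouldStem(term) ? stemmer.stem(term) : term);
        }
        return result;
    }
}
```

With `windows` preserved, `the` as a stopword, and length limits of 2 to 9, the text `"The indexing of Windows files processing"` becomes `[the, index, of, windows, file, processing]`. Keeping the splitter and stemmer behind separate interfaces reflects the point above that each can be reimplemented and reconfigured independently.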
Copyright © 2005. Sybase Inc. All rights reserved.