The term stemmer interface is much simpler than its splitter counterpart. It defines only three methods:
com.isdduk.text.TermStemmer

- stem(com.isdduk.text.Term term) : com.isdduk.text.Term
- hasNormalize() : boolean
- normalize(com.isdduk.text.Term term) : com.isdduk.text.Term
The stem method takes a term and returns a stemmed version of it. In many cases the returned term is the same object, although perhaps with a different length. The stem method should incorporate normalization as part of its routine. The normalize method caters for terms that are not sent through the stem method: it ensures the term conforms to a single standard of representation (for example, a German stemmer may normalize the sharp S “ß” to its equivalent “ss”, or vice versa). Terms occasionally bypass the stem method when their lengths exceed the maximum allowed; such terms are instead “force stemmed” to fit.
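The following is a minimal sketch of how an implementation of these three methods might look, not the library's actual code. The `Term` class here is a hypothetical stand-in for com.isdduk.text.Term (its real fields are not documented in this section), `ToyGermanStemmer` and `StemmingPipeline` are invented for illustration, and two behaviours are assumptions: that hasNormalize reports whether normalize does any work, and that “force stemming” amounts to cutting the term to the maximum length.

```java
// Hypothetical stand-in for com.isdduk.text.Term: a character buffer plus a length.
final class Term {
    char[] chars;
    int length;

    Term(String text) {
        this.chars = text.toCharArray();
        this.length = chars.length;
    }

    @Override
    public String toString() {
        return new String(chars, 0, length);
    }
}

// Mirrors the three methods listed for TermStemmer.
interface TermStemmer {
    Term stem(Term term);
    boolean hasNormalize();
    Term normalize(Term term);
}

// Toy stemmer (illustrative only): folds the sharp S to "ss", then strips a trailing "en".
final class ToyGermanStemmer implements TermStemmer {
    @Override
    public Term stem(Term term) {
        normalize(term); // stemming incorporates normalization
        int n = term.length;
        if (n > 4 && term.chars[n - 2] == 'e' && term.chars[n - 1] == 'n') {
            term.length = n - 2; // same object, shorter length
        }
        return term;
    }

    @Override
    public boolean hasNormalize() {
        return true; // assumption: signals that normalize() does real work
    }

    @Override
    public Term normalize(Term term) {
        String text = term.toString();
        if (text.indexOf('\u00DF') >= 0) {
            char[] folded = text.replace("\u00DF", "ss").toCharArray();
            term.chars = folded;
            term.length = folded.length;
        }
        return term;
    }
}

// Hypothetical caller showing when a term bypasses stem().
final class StemmingPipeline {
    static Term process(TermStemmer stemmer, Term term, int maxLength) {
        if (term.length <= maxLength) {
            return stemmer.stem(term);
        }
        term.length = maxLength; // assumption: "force stemming" is plain truncation
        return stemmer.hasNormalize() ? stemmer.normalize(term) : term;
    }
}
```

In this sketch, a term within the length limit goes through stem, which normalizes as part of its work, while an over-long term is cut to size and then passed through normalize so that both paths produce the same representation of “ß”.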
Copyright © 2005. Sybase Inc. All rights reserved.