Use the TXT filter to parse plain text files. Plain text files seldom contain any information about how they are encoded, so the text filter must often fall back on a default decoder when the encoding is not known.
Configure the default decoder by defining the Java system property com.filter.txt.defaultCharset with the name of the character set to use. The easiest way to do this is to add a new XML SystemProperty tag to the relevant container configuration file.
If you do not define this property, the filter uses the encoding identified by the standard Java system property file.encoding. Table 3-16 lists the parameter settings used by the text filter.
Parameter | Value |
---|---|
className | com.omniq.filter.txt.PlainTextFilter |
extractorClassName | com.omniq.filter.StandardExtractor |
mimeTypes | text/html |
timeout | N/A |
keepTempFiles | N/A |
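
The fallback order described above (the filter-specific property first, then file.encoding) can be illustrated with a short Java sketch. This is not the filter's actual implementation: the class name DefaultCharsetSketch and the method resolveDefaultCharset are hypothetical, and the last-resort use of the JVM default charset for missing or unrecognized names is an assumption added only so the example always returns a value.

```java
import java.nio.charset.Charset;

// Hypothetical illustration of the lookup order; not the filter's own code.
class DefaultCharsetSketch {

    // Property name taken from the text above.
    static final String FILTER_PROPERTY = "com.filter.txt.defaultCharset";

    static Charset resolveDefaultCharset() {
        // 1. Prefer the filter-specific system property, if defined.
        String name = System.getProperty(FILTER_PROPERTY);

        // 2. Otherwise fall back to the standard file.encoding property.
        if (name == null || name.trim().isEmpty()) {
            name = System.getProperty("file.encoding");
        }

        if (name != null) {
            try {
                return Charset.forName(name.trim());
            } catch (IllegalArgumentException e) {
                // Covers both illegal and unsupported charset names.
                // Assumption: fall through to the JVM default below.
            }
        }

        // 3. Assumed last resort so the sketch never returns null.
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.setProperty(FILTER_PROPERTY, "ISO-8859-1");
        System.out.println(resolveDefaultCharset().name());
    }
}
```

When the JVM is launched directly rather than through a container configuration file, a system property of this kind can generally also be supplied on the command line with the standard -D option, for example -Dcom.filter.txt.defaultCharset=UTF-8.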