Parsers process metadata values, which are generally received as string key/value pairs. Document body text is processed by the system term splitter and stemmer, but metadata often must be handled differently, because metadata values can be numbers and dates as well as strings. The parsers loaded by the Text Manager are referenced in the metadata field parser and query parser XML configuration files.
There are four different types of parsers:
String
Numeric decimal
Numeric integer
Date (time)
A string parser is always handled by internal classes. You can build custom numeric and date parsers and plug them into the system if necessary. Table 2-18 shows the attributes for the Parser tag.
Attribute | Default | Description |
---|---|---|
identifier | None | The Parser instance’s identifier. This must be a name and a unique ID separated by an underscore (_). |
class | None | The Java implementation class. |
Table 2-19 shows the attributes for the Param tag.
Attribute | Default | Description |
---|---|---|
name | None | The name of the parameter to pass to the parser. |
value | None | The string value to associate with the parameter name. |
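As an illustration of how the Parser and Param attributes fit together, the following sketch reads a hypothetical configuration fragment with the standard Java DOM API. The nesting of Param inside Parser, the tag casing, the `day` value, and the identifier pattern (a name, an underscore, then a numeric ID, modeled on names such as `float_1`) are assumptions made for this example; the class `ParserConfigSketch` and its methods are not part of OmniQ Enterprise.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ParserConfigSketch {
    // Hypothetical fragment: the element layout is assumed, not taken from a real configuration file.
    static final String SAMPLE =
        "<Parser identifier=\"dateMs1970_4\" class=\"com.isdduk.text.Ms1970DateParser\">"
      + "<Param name=\"roundTo\" value=\"day\"/>"
      + "</Parser>";

    // Assumed identifier shape: a name, an underscore, then a numeric ID.
    static boolean isValidIdentifier(String id) {
        return id != null && id.matches("[A-Za-z][A-Za-z0-9]*_\\d+");
    }

    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse configuration fragment", e);
        }
    }

    // Collects the name/value pairs of all Param elements under a Parser element.
    static Map<String, String> readParams(Element parser) {
        Map<String, String> params = new LinkedHashMap<>();
        NodeList nodes = parser.getElementsByTagName("Param");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element param = (Element) nodes.item(i);
            params.put(param.getAttribute("name"), param.getAttribute("value"));
        }
        return params;
    }

    public static void main(String[] args) {
        Element parser = parse(SAMPLE);
        String id = parser.getAttribute("identifier");
        System.out.println("identifier: " + id + " (valid: " + isValidIdentifier(id) + ")");
        System.out.println("class:      " + parser.getAttribute("class"));
        System.out.println("params:     " + readParams(parser));
    }
}
```

The sketch only reads the attributes; it does not load the implementation class, which the Text Manager would be expected to do itself.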
OmniQ Enterprise comes with the preconfigured parsers shown in Table 2-20, which are adequate for most common metadata types.
Name | Class | Parameter | Description |
---|---|---|---|
float_1 | com.isdduk.text.SimpleFloatParser | | Parses strings representing decimal numbers into actual decimal numbers. For instance, the string “3.142” is parsed into Java float 3.142. |
integer_2 | com.isdduk.text.IntegerParser | | Parses strings representing an integer number into an actual integer number; any floating-point information is discarded. For instance, both “3” and “3.142” are parsed into Java int 3. |
dateUK_3 | com.isdduk.text.DateFormatParser | | |
dateMs1970_4 | com.isdduk.text.Ms1970DateParser | roundTo: year, month, day, hour, minute, or second; any other value denotes that no rounding should take place. | A date parser that parses strings representing long (64-bit) integers, which in turn represent dates as the number of milliseconds since 1 January 1970. The preconfigured instance rounds dates to the nearest day (UTC). |
intB2KB_5 | com.isdduk.text.B2KBIntParser | | Parses strings representing byte-size numbers and converts them into kilobyte-size numbers. For instance, the string “2048” (bytes) is parsed as Java int 2 (kilobytes). |
datePDF_6 | com.isdduk.text.PDFDateParser | roundTo: year, month, day, hour, minute, or second; any other value denotes that no rounding should take place. | Handles the PDF date format, in which dates are formatted “D:20030602143803+01'00'”. The preconfigured instance rounds dates to the nearest day (UTC). |
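The conversions described in Table 2-20 can be approximated in plain Java as follows. This is a minimal sketch, not the com.isdduk.text implementation: the class `PreconfiguredParserSketch` and its method names are hypothetical, the byte-to-kilobyte conversion is assumed to truncate, "nearest day" is assumed to mean rounding half a day upward, and only the full PDF date form shown in the table is recognized.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PreconfiguredParserSketch {
    static final long MS_PER_DAY = 24L * 60 * 60 * 1000;

    // Matches only the full form, for example D:20030602143803+01'00'
    static final Pattern PDF_DATE = Pattern.compile(
        "D:(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})([+-])(\\d{2})'(\\d{2})'");

    // integer_2 behavior: the fractional part is discarded.
    static int parseInteger(String s) {
        return new BigDecimal(s.trim()).intValue();
    }

    // intB2KB_5 behavior: bytes to kilobytes (truncating division is an assumption).
    static int bytesToKilobytes(String s) {
        return (int) (Long.parseLong(s.trim()) / 1024);
    }

    // Rounds an epoch-millisecond value to the nearest UTC midnight (half a day rounds up).
    static long roundToNearestDayUtc(long epochMs) {
        return Math.floorDiv(epochMs + MS_PER_DAY / 2, MS_PER_DAY) * MS_PER_DAY;
    }

    // dateMs1970_4 behavior: milliseconds since 1 January 1970, rounded to the day.
    static long parseMs1970(String s) {
        return roundToNearestDayUtc(Long.parseLong(s.trim()));
    }

    // datePDF_6 behavior: PDF date string to epoch milliseconds, rounded to the day.
    static long parsePdfDate(String s) {
        Matcher m = PDF_DATE.matcher(s.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported PDF date: " + s);
        }
        int sign = m.group(7).equals("-") ? -1 : 1;
        ZoneOffset offset = ZoneOffset.ofHoursMinutes(
            sign * Integer.parseInt(m.group(8)), sign * Integer.parseInt(m.group(9)));
        OffsetDateTime dateTime = OffsetDateTime.of(
            Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)),
            Integer.parseInt(m.group(5)), Integer.parseInt(m.group(6)), 0, offset);
        return roundToNearestDayUtc(dateTime.toInstant().toEpochMilli());
    }

    public static void main(String[] args) {
        System.out.println(parseInteger("3.142"));    // prints 3
        System.out.println(bytesToKilobytes("2048")); // prints 2
        System.out.println(Instant.ofEpochMilli(parseMs1970("46800000")));
        System.out.println(Instant.ofEpochMilli(parsePdfDate("D:20030602143803+01'00'")));
    }
}
```

With these assumptions, the sample PDF date (13:38:03 UTC on 2 June 2003) lies past midday and therefore rounds forward to midnight on 3 June. A parser that truncates instead of rounding would yield 2 June.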
Copyright © 2005. Sybase Inc. All rights reserved.