Metadata parsers process metadata values, which are received as strings. Document body text is processed by the system text tokenizer and stemmer, but metadata must often be handled differently, because a metadata string value can represent a number or a date.
There are four types of metadata parsers:
String – supports TEXT type metadata fields
Numeric decimal – supports FLOAT type metadata fields
Numeric integer – supports INT type metadata fields
Date (time) – supports FLOAT type metadata fields
Sybase Search includes these preconfigured metadata parsers. Each requires an identifier that consists of two parts: a name and a unique ID.
Item | Description |
---|---|
Name | float_1 |
Class | com.isdduk.text.SimpleFloatParser. This class parses strings representing decimal numbers into actual decimal numbers. For example, this parser processes the string “3.142” into Java float 3.142. |
Name | integer_2 |
Class | com.isdduk.text.IntegerParser. This class parses strings representing an integer number into an actual integer number; any fractional part is discarded. For example, this parser processes both “3” and “3.142” into Java int 3. |
Name | dateUK_3 |
Class | com.isdduk.text.DateFormatParser |
Name | dateMs1970_4 |
Class | com.isdduk.text.Ms1970DateParser. This class is a date parser that parses strings representing long integer (64-bit) numbers, which themselves represent dates as the number of milliseconds since 1 January 1970. The preconfigured instance rounds dates to the nearest day (Coordinated Universal Time). |
Parameter | roundTo. Value: choose year, month, day, hour, minute, or second; any other value indicates that no rounding should take place. |
Name | intB2KB_5 |
Class | com.isdduk.text.B2KBIntParser. This class parses strings representing byte-size numbers and converts them into kilobyte-sized numbers. For example, the string “2048” (bytes) is parsed as Java int 2 (kilobytes). |
Name | datePDF_6 |
Class | com.isdduk.text.PDFDateParser. This class handles the PDF date format, in which dates are formatted “D:20030602143803+01'00'”. The preconfigured instance rounds dates to the nearest day (UTC). |
Parameter | roundTo. Value: choose year, month, day, hour, minute, or second; any other value indicates that no rounding should take place. |
Name | url_7 |
Class | com.isdduk.text.URLTermParser. This class splits URL strings into their constituent elements, namely protocol, host, port, path, extension, and query. Optionally, each element can be indexed separately. The options parameter determines the elements that the parser returns. By default, not all elements are indexed. For example, the values for the protocol and port elements, http and 80 respectively, are usually the same for all URLs and hence are not indexed by default. |
Parameter | options. Value: the sum of the bits that represent the elements the URL parser should return. For example, for the parser to return the path and extension URL elements, set the options parameter to 24 (8+16). If you then use this parser on, for example, http://www.mysite.com/about/jobs.html, it returns “about”, “jobs”, and “html”. |
Name | int2int |
Class | com.isdduk.text.Int2IntParser. This class parses strings representing integer numbers and factors the integer value using operators. |
Parameter | |
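The conversions described for the preconfigured parsers can be sketched in plain Java. This is only an illustration of the documented input and output behavior, not the com.isdduk.text implementation: the class name ParserBehaviourSketch and all of its methods are hypothetical, the bit values 8 (path) and 16 (extension) are inferred from the "24 (8+16)" example, and truncation in the byte-to-kilobyte conversion is an assumption.

```java
import java.math.BigDecimal;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these helpers are NOT the com.isdduk.text classes.
final class ParserBehaviourSketch {

    // Assumed bit values, inferred from the documented "24 (8+16)" example.
    static final int PATH = 8;
    static final int EXTENSION = 16;

    static final long MS_PER_DAY = 24L * 60 * 60 * 1000;

    // Decimal parsing: "3.142" becomes the float 3.142.
    static float parseDecimal(String value) {
        return Float.parseFloat(value.trim());
    }

    // Integer parsing: the fractional part is discarded, so "3" and "3.142" both give 3.
    static int parseInteger(String value) {
        return new BigDecimal(value.trim()).intValue();
    }

    // Bytes to kilobytes: "2048" gives 2. Truncating division is an assumption.
    static int bytesToKilobytes(String value) {
        return (int) (Long.parseLong(value.trim()) / 1024);
    }

    // Milliseconds since 1 January 1970, rounded to the nearest UTC day boundary.
    static long roundToNearestDay(String value) {
        long ms = Long.parseLong(value.trim());
        return Math.floorDiv(ms + MS_PER_DAY / 2, MS_PER_DAY) * MS_PER_DAY;
    }

    // Returns only the URL elements selected by the options bitmask (path and extension here).
    static List<String> selectUrlTerms(String url, int options) {
        List<String> terms = new ArrayList<>();
        String path = URI.create(url).getPath();
        if (path == null || path.isEmpty()) {
            return terms;
        }
        String extension = "";
        int dot = path.lastIndexOf('.');
        if (dot > path.lastIndexOf('/')) {
            // The dot belongs to the last path segment, so split off the extension.
            extension = path.substring(dot + 1);
            path = path.substring(0, dot);
        }
        if ((options & PATH) != 0) {
            for (String segment : path.split("/")) {
                if (!segment.isEmpty()) {
                    terms.add(segment);
                }
            }
        }
        if ((options & EXTENSION) != 0 && !extension.isEmpty()) {
            terms.add(extension);
        }
        return terms;
    }
}
```

With options set to 24, selectUrlTerms applied to http://www.mysite.com/about/jobs.html yields the terms about, jobs, and html, matching the example in the table. With options set to 8, only the path segments are returned.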
Adding new metadata parsers
You can create new metadata parsers. The system generates a unique integer ID for each new parser; this ID forms part of the parser identifier.
1. Click Configuration.
2. Click Metadata Parsers.
3. Click Add a new metadata parser.
4. Complete these fields:
Name | Description |
---|---|
Parser Name | Name of the parser instance. |
Implement Class | Java implementation class. |
5. If your metadata parser requires special parameters, click Add and complete these fields; otherwise, proceed to step 6.
Name | Description |
---|---|
Name | The name of the parameter to pass to the parser. |
Value | The string value to associate with the parameter name. |
6. Click Create.
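The Implement Class field expects a Java class, and each parameter is a name paired with a string value. The interface that Sybase Search actually requires of a parser class is not described in this section, so the following is only a rough sketch of that shape: the MetadataValueParser interface, the ScaledIntParserSketch class, and the divisor parameter are all hypothetical names invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract: the real Sybase Search parser interface is not shown here.
interface MetadataValueParser {
    // Receives one name/value pair as entered in the parameter fields.
    void setParameter(String name, String value);

    // Converts the raw metadata string into a typed value.
    Object parse(String rawValue);
}

// Hypothetical custom parser: divides an integer string by a configurable divisor.
final class ScaledIntParserSketch implements MetadataValueParser {
    private final Map<String, String> parameters = new HashMap<>();

    @Override
    public void setParameter(String name, String value) {
        // Parameter values arrive as strings and are converted when used.
        parameters.put(name, value);
    }

    @Override
    public Object parse(String rawValue) {
        // "divisor" is an invented parameter name; it defaults to 1 (no scaling).
        long divisor = Long.parseLong(parameters.getOrDefault("divisor", "1"));
        long value = Long.parseLong(rawValue.trim());
        return (int) (value / divisor);
    }
}
```

Under these assumptions, a parser instance configured with the parameter name divisor and value 60 would turn the string "150" into the integer 2. Storing parameters as strings and converting them at parse time mirrors the form, where every parameter value is entered as a string.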
Editing metadata parsers
You can edit a metadata parser only if it is not in use anywhere, including by query parsers and metadata fields that reference it.
1. From the Metadata Configuration Summary page, click Metadata Parsers.
2. Click Edit for the parser that you want to change.
3. Make the changes and click Save Changes.
Removing metadata parsers
You can remove a metadata parser only if it is not in use anywhere, including by query parsers and metadata fields that reference it.
1. From the Metadata Configuration Summary page, click Metadata Parsers.
2. Click Remove for the parser that you want to delete.
3. Click OK to confirm the removal.