
Chapter 6: Developing and configuring custom files

Developing and configuring custom filters

OmniQ Enterprise uses a third-party solution, Stellent, to parse many document formats. The Stellent document filter is a multi-filter: a single filter instance handles all supported MIME types. The Stellent filter is therefore configured to handle the MIME type */*, indicating that it can filter text from documents of any MIME type presented to it.

When OmniQ Enterprise obtains a filter for a document, it first identifies the document's MIME type from the file extension. For example, C:\document.pdf has the type “application” and the subtype “pdf”, which together form the MIME type application/pdf. OmniQ Enterprise then requests a filter from the Filter Factory to handle documents with the identified MIME type.
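The extension-based identification described above can be sketched roughly as follows. This is an illustration only: the function name `mime_type_from_path` and the small extension table are hypothetical and are not part of the OmniQ Enterprise API, and the mapping the product actually uses is not described in this section.

```python
import ntpath

# Hypothetical extension table for illustration; not OmniQ's actual mapping.
EXTENSION_TO_MIME = {
    ".pdf": "application/pdf",
    ".txt": "text/plain",
    ".html": "text/html",
}


def mime_type_from_path(path):
    """Return the MIME type implied by the file extension, or None if unknown."""
    # ntpath parses Windows-style paths regardless of the host platform.
    _, extension = ntpath.splitext(path)
    return EXTENSION_TO_MIME.get(extension.lower())


print(mime_type_from_path(r"C:\document.pdf"))
```

An extension that is missing from the table yields `None` in this sketch, leaving the caller to decide how to treat an unidentified document.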

The filter lookup is performed in this order:

  1. If a filter is configured to handle a specific MIME type, that filter instance is returned.

  2. If a multi-filter (*/*) is configured, that filter instance is returned.

  3. Otherwise, no filter is returned, denoting that the document is “not indexable.”
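The lookup order above can be expressed as a short sketch. The `FilterFactory` class below is a hypothetical stand-in written for illustration, not the OmniQ Enterprise Filter Factory itself; the method names `register` and `get_filter` and the string placeholders used as filter instances are assumptions.

```python
class FilterFactory:
    """Hypothetical stand-in for the Filter Factory; not the OmniQ API."""

    MULTI_FILTER_TYPE = "*/*"

    def __init__(self):
        # Maps a MIME type (or "*/*") to the filter instance configured for it.
        self._filters = {}

    def register(self, mime_type, filter_instance):
        self._filters[mime_type] = filter_instance

    def get_filter(self, mime_type):
        # 1. A filter configured for the specific MIME type wins.
        specific = self._filters.get(mime_type)
        if specific is not None:
            return specific
        # 2. Otherwise fall back to the multi-filter, if one is configured.
        # 3. If neither exists, None is returned: the document is not indexable.
        return self._filters.get(self.MULTI_FILTER_TYPE)


factory = FilterFactory()
factory.register("*/*", "stellent-multi-filter")
factory.register("application/pdf", "custom-pdf-filter")

print(factory.get_filter("application/pdf"))
print(factory.get_filter("text/plain"))
```

In this sketch, a custom filter registered for application/pdf takes precedence over the multi-filter for PDF documents, while every other MIME type still falls through to the multi-filter.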

You can add filters by editing the XML configuration file %OMNIQ_3.0%\OmniQ\config\FilterFactory.default.xml. See “Modules” for information about the FilterFactory.default.xml file.





Copyright © 2005. Sybase Inc. All rights reserved.
