Indexing document stores

Indexing involves collecting data about the documents owned by a document store and storing that data in proprietary data structures, generically called indexes. Once the documents in a document store have been indexed, they are available for searching.

Data for all documents is collected during the first indexing session; subsequent indexing sessions collect data only for new, modified, and deleted documents. Thus, the amount of data collected can vary dramatically from one indexing session to the next.
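The difference between the first session and later sessions can be illustrated with a minimal sketch. It is not Sybase Search code: the function name plan_session and the use of a last-modified stamp to detect changes are assumptions made for illustration, since this section does not say how changes are detected.

```python
def plan_session(indexed, current):
    """Classify documents for one indexing session.

    indexed: dict of document reference -> stamp recorded at the previous session
    current: dict of document reference -> stamp observed now
    (Both structures are hypothetical; they only model the idea.)
    """
    new = sorted(ref for ref in current if ref not in indexed)
    modified = sorted(
        ref for ref in current if ref in indexed and current[ref] != indexed[ref]
    )
    deleted = sorted(ref for ref in indexed if ref not in current)
    return {"new": new, "modified": modified, "deleted": deleted}


# First session: nothing is indexed yet, so every document is collected.
first = plan_session({}, {"a.txt": 1, "b.txt": 1})

# Later session: only the differences are collected.
later = plan_session({"a.txt": 1, "b.txt": 1}, {"a.txt": 2, "c.txt": 1})
```

In this sketch the first session reports both documents as new, while the later session reports one new, one modified, and one deleted document, which is why the amount of work differs so much between sessions.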

When you create a document store, you can choose to index it immediately. After you have created a document store, you can go to the Document Store Information page to perform these types of indexing:

Incremental Index – re-run the indexing process over the saved document store configuration. All new documents are indexed; all updated documents are reindexed; all deleted documents are removed from the indexes.
Part Index – define specific documents to add to a document store. There are two types of part indexes:
- File System Part Index – index only those documents that exist in any of the document store’s document roots. If a document is already indexed, Sybase Search checks it for changes and reindexes it if necessary. If a specified document is in the Sybase Search indexes but no longer exists on the file system, it is deleted from the indexes. The part index process does not check directory trees for new, modified, or deleted documents, which can save a significant amount of time for large document stores.
  
  Documents that are not available in a valid root directory are ignored.
- Database Part Index – the original SQL statement is re-run to find new, updated, and deleted rows. For large databases, this can be time-consuming. However, you can tailor the Database Part Index to fetch only the new and updated rows, or only the document references of the rows that should be removed. It can also accept a delimited list of document references to remove.
The part index processes are primarily for use within OEM applications.
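
The File System Part Index rules described above can be sketched as follows. This is an illustrative model only, not the Sybase Search API: the function part_index, its parameters, the stamp comparison, and the dictionary that stands in for the file system are all hypothetical, and root membership is checked with a simple string prefix test.

```python
def part_index(requested, roots, index, filesystem):
    """Apply a file system part index to `index` in place.

    requested:  document paths named by the caller
    roots:      document root directories of the document store
    index:      dict of path -> stamp recorded when the document was indexed
    filesystem: dict of path -> current stamp (stand-in for the real file system)
    Returns a dict of path -> action taken.
    """
    actions = {}
    for path in requested:
        in_root = any(path.startswith(root.rstrip("/") + "/") for root in roots)
        if not in_root:
            # Not under a valid document root: ignored.
            actions[path] = "ignored"
        elif path not in filesystem:
            if path in index:
                # Indexed earlier but gone from the file system: remove it.
                del index[path]
                actions[path] = "deleted"
            else:
                actions[path] = "ignored"
        elif path not in index:
            index[path] = filesystem[path]
            actions[path] = "indexed"
        elif index[path] != filesystem[path]:
            index[path] = filesystem[path]
            actions[path] = "reindexed"
        else:
            actions[path] = "unchanged"
    return actions


index = {"/docs/a.txt": 1, "/docs/b.txt": 1, "/docs/gone.txt": 1}
filesystem = {"/docs/a.txt": 1, "/docs/b.txt": 2, "/docs/new.txt": 1, "/tmp/x.txt": 1}
requested = ["/docs/a.txt", "/docs/b.txt", "/docs/gone.txt", "/docs/new.txt", "/tmp/x.txt"]
result = part_index(requested, ["/docs"], index, filesystem)
```

Only the requested paths are examined; no directory tree is walked, which is the source of the time savings for large document stores.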

All data collected during an indexing session is stored in a data buffer. The data buffer is a RAM-oriented data structure, where data is aggregated, ready to be written to an index stripe. This buffer is flushed when the maximum memory threshold (specified in the system property omniq.index.buffer.maxMemory) is exceeded. The buffer shares this memory allocation with the document store’s active index stripe. See “Striping index data”.
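
The flush-on-threshold behavior can be modeled with a small sketch. It is a simplification under stated assumptions: the class DataBuffer and its methods are hypothetical, the threshold is treated as a byte count (this section does not state the unit of omniq.index.buffer.maxMemory), the final flush at the end of a session is assumed, and the memory shared with the active index stripe is not modeled.

```python
class DataBuffer:
    """Toy in-memory buffer that flushes once a size threshold is exceeded."""

    def __init__(self, max_memory, flush_to):
        self.max_memory = max_memory  # stand-in for omniq.index.buffer.maxMemory (bytes assumed)
        self.flush_to = flush_to      # callable that receives one batch of records
        self.records = []
        self.used = 0

    def add(self, record: bytes):
        self.records.append(record)
        self.used += len(record)
        if self.used > self.max_memory:
            self.flush()

    def flush(self):
        # Hand the aggregated batch over, then start a fresh buffer.
        if self.records:
            self.flush_to(self.records)
        self.records = []
        self.used = 0


stripe = []  # each entry represents one batch written to an index stripe
buf = DataBuffer(max_memory=10, flush_to=stripe.append)
for rec in [b"aaaa", b"bbbb", b"cccc", b"dd"]:
    buf.add(rec)
buf.flush()  # assumed end-of-session flush of whatever remains
```

With a threshold of 10 bytes, the third record pushes the buffer to 12 bytes and triggers a flush of the first three records; the last record is written by the closing flush.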