Indexing document stores

Indexing is the process of collecting data about the documents in a document store and storing that data in proprietary data structures, generically called indexes. After the documents in a document store are indexed, they are available for search.

An indexing session comprises all data collected during a single pass of a document store’s indexer. The first indexing session collects data for all documents; subsequent indexing sessions collect data only for new, modified, and deleted documents. Thus, the amount of data collected can vary dramatically from one indexing session to the next.
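The difference between a first session and a later one can be illustrated with a minimal sketch. It is not Sybase Search code: the function name `classify_changes` and the idea of comparing per-document fingerprints (for example, a modification time or checksum) are assumptions made purely for illustration.

```python
def classify_changes(previous, current):
    """Compare two {document_reference: fingerprint} snapshots.

    Hypothetical helper: 'previous' is what the last session recorded,
    'current' is what the store contains now.
    """
    new = sorted(ref for ref in current if ref not in previous)
    deleted = sorted(ref for ref in previous if ref not in current)
    modified = sorted(
        ref for ref in current
        if ref in previous and current[ref] != previous[ref]
    )
    return {"new": new, "modified": modified, "deleted": deleted}


# First session: nothing was indexed before, so every document is new.
first = classify_changes({}, {"a.txt": 1, "b.txt": 1})

# Later session: only the differences need to be collected.
later = classify_changes(
    {"a.txt": 1, "b.txt": 1},
    {"a.txt": 2, "c.txt": 1},
)
```

In this sketch the first session reports both documents as new, while the later session touches only one new, one modified, and one deleted document, which is why session sizes can differ so much.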

When creating a document store, you can direct Sybase Search to index the document store immediately. After creating a document store, you can also perform the following types of indexing from the Document Store Information page:

Incremental Index – click to rerun the indexing process over the saved document store configuration. All new documents are indexed; all updated documents are indexed again; and all deleted documents are removed from the indexes.
Part Index – click to define specific documents that you want to add to a document store. There are two types of part indexes:
- File System Part Index – when processing a File System Part Index, Sybase Search indexes only the specified documents that exist in one of the document store’s document roots. If a document is already indexed, Sybase Search checks whether it has been modified and re-indexes it if necessary. If a specified document is in the Sybase Search indexes but no longer exists on the file system, it is removed from the indexes. The Part Index process does not check the directory trees for new, modified, or deleted documents, which can save a significant amount of time for large document stores.
  
  The document must exist within one of the document store’s root directories. Documents that are not available in a valid root directory are ignored.
- Database Part Index – during a Database Incremental Index, the original SQL statement is rerun to find new, updated, and deleted rows, which can be time-consuming for large databases. A Database Part Index avoids this cost: it can run SQL statements tailored to fetch only the new and updated rows, or only the document references of the rows that should be removed. It can also accept a delimited list of document references to remove.
The Part Index processes are primarily for use within OEM applications.
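
The File System Part Index rules described above can be sketched as follows. This is an illustrative model only, not the Sybase Search API: the function `part_index`, the dictionary standing in for the indexes, and the use of modification times to detect changes are all assumptions.

```python
import os
from pathlib import Path


def part_index(requested, roots, index):
    """Process only the requested paths; never walk the directory trees.

    'index' is a hypothetical stand-in for the real indexes: it maps a
    resolved path to the modification time recorded at the last indexing.
    Returns {requested_path: action}.
    """
    resolved_roots = [Path(os.path.realpath(r)) for r in roots]
    actions = {}
    for requested_path in requested:
        doc = Path(os.path.realpath(requested_path))
        key = str(doc)
        if not any(root in doc.parents for root in resolved_roots):
            # Not under any document root: ignored.
            actions[requested_path] = "ignored"
        elif not doc.is_file():
            # Gone from the file system: remove it if it was indexed.
            was_indexed = index.pop(key, None) is not None
            actions[requested_path] = "removed" if was_indexed else "ignored"
        else:
            mtime = doc.stat().st_mtime
            if key not in index:
                actions[requested_path] = "indexed"
            elif index[key] != mtime:
                actions[requested_path] = "reindexed"
            else:
                actions[requested_path] = "unchanged"
            index[key] = mtime
    return actions
```

Because the loop visits only the paths it is given, its cost depends on the number of requested documents rather than on the size of the document store, which is the saving the Part Index is meant to provide.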

All data collected during an indexing session is stored in the indexing session’s data buffer. The data buffer is a RAM-oriented data structure in which data is aggregated until it is ready to be written to an index stripe. The buffer is flushed when its memory use exceeds the maximum threshold specified in the system property omniq.index.buffer.maxMemory. The buffer shares this memory allocation with the document store’s active index stripe. See “Striping index data”.
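
The flush-on-threshold behavior can be modeled with a short sketch. The class `IndexBuffer`, its `max_bytes` argument (standing in for the value of omniq.index.buffer.maxMemory), and the `write_stripe` callback are hypothetical names; the sketch also ignores the memory that the real buffer shares with the active index stripe.

```python
class IndexBuffer:
    """Toy in-memory buffer that flushes once a size threshold is exceeded."""

    def __init__(self, max_bytes, write_stripe):
        self.max_bytes = max_bytes        # assumed stand-in for maxMemory
        self.write_stripe = write_stripe  # hypothetical callback that persists entries
        self.entries = []
        self.used = 0

    def add(self, doc_ref, data):
        self.entries.append((doc_ref, data))
        self.used += len(data)
        if self.used > self.max_bytes:    # threshold exceeded: flush now
            self.flush()

    def flush(self):
        if self.entries:
            self.write_stripe(self.entries)
            self.entries = []
            self.used = 0


stripes = []                              # each flush appends one batch here
buffer = IndexBuffer(max_bytes=10, write_stripe=stripes.append)
buffer.add("doc1", b"aaaa")               # 4 bytes buffered
buffer.add("doc2", b"bbbb")               # 8 bytes buffered
buffer.add("doc3", b"cccc")               # 12 bytes > 10, so the buffer flushes
buffer.add("doc4", b"dd")                 # 2 bytes buffered
buffer.flush()                            # end of session: write what remains
```

With these inputs the callback receives two batches: the first holds three documents (written when the threshold was crossed) and the second holds the single document left at the end of the session.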