MPF classes compress all the paragraphs from all documents, favoring those of average length (where the average length is implied by the MPF configuration). Each paragraph is written to disk in one of two ways:
The paragraph is added to a paragraph group, which is compressed and written to disk.
The paragraph is compressed individually and written to disk.
The first technique is employed initially, as the compression scheme works better with more data; as a result, paragraphs take up less space on disk. The second technique is used when the paragraph group allocation is exhausted.
Paragraphs are not all written together, as it is often necessary to read individual paragraphs from disk. Compressing all the paragraphs together would force the system to read and decompress every paragraph just to access the single paragraph required. Grouping provides a balance between data compression and disk I/O.
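The trade-off described above can be illustrated with a small sketch. It assumes zlib as a stand-in compressor (the compression scheme MPF actually uses is not stated here) and uses made-up sample paragraphs; the function names are hypothetical.

```python
import zlib

# Hypothetical sample data: short paragraphs that share vocabulary.
paragraphs = [
    f"Paragraph {i}: the index stores compressed paragraph data on disk.".encode("utf-8")
    for i in range(20)
]

def individual_size(paras):
    """Total bytes when each paragraph is compressed on its own."""
    return sum(len(zlib.compress(p)) for p in paras)

def grouped_size(paras):
    """Bytes when all paragraphs are compressed together as one group."""
    return len(zlib.compress(b"\n".join(paras)))

individual_total = individual_size(paragraphs)
grouped_total = grouped_size(paragraphs)

# Reading one paragraph from a group means decompressing the whole group first.
group_blob = zlib.compress(b"\n".join(paragraphs))
fourth_paragraph = zlib.decompress(group_blob).split(b"\n")[3]
```

With data like this, the grouped total is smaller than the individual total, but retrieving a single paragraph costs a full decompression of its group. Smaller groups limit that cost.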
The number of paragraphs in any one paragraph group is not fixed; groups accept new paragraphs until the data buffer’s soft limit is reached. “Soft” indicates that the limit can be exceeded, but the group is closed as soon as that happens. The ideal scenario is when all paragraphs from a document fit exactly within the allocated number of paragraph groups.
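The grouping behavior can be sketched roughly as follows. This is an illustration, not the product's implementation: zlib stands in for the unspecified compressor, `pack_paragraphs` and its arguments are hypothetical names modeled on the MPF parameters, and the per-group entry limit is left out for brevity.

```python
import zlib

def pack_paragraphs(paragraphs, buffer_soft_limit=8192, max_paragraph_groups=5):
    """Return (groups, singles) for one document's paragraphs.

    groups  -- compressed paragraph groups
    singles -- paragraphs compressed individually after groups run out
    """
    groups = []
    singles = []
    current = []        # paragraphs in the group being filled
    current_size = 0    # uncompressed bytes in the current group
    for para in paragraphs:
        if len(groups) >= max_paragraph_groups:
            # Group allocation exhausted: fall back to individual compression.
            singles.append(zlib.compress(para))
            continue
        # The paragraph is always accepted, so the soft limit may be exceeded.
        current.append(para)
        current_size += len(para)
        if current_size >= buffer_soft_limit:
            # Limit reached or exceeded: close, compress, and store the group.
            groups.append(zlib.compress(b"\n".join(current)))
            current, current_size = [], 0
    if current:
        # Flush a final, partially filled group.
        groups.append(zlib.compress(b"\n".join(current)))
    return groups, singles

# Twenty 3000-byte paragraphs: each group closes after three of them.
groups, singles = pack_paragraphs([b"x" * 3000] * 20)
```

In this example each group closes after three paragraphs (9000 bytes, just past the soft limit), so five groups hold fifteen paragraphs and the remaining five are compressed individually.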
Configure paragraph grouping using the MPF parameters shown in Table 3-46. The MPF parameters are defined for all document stores in a container and are set in the main container file Container.ID.xml.
Parameter | Default | Description |
---|---|---|
omniq.index.mpf.docsPerFile | 20 | The number of documents stored in each MPF. |
omniq.index.mpf.filesPerFolder | 250 | The number of MPFs stored in each directory. |
omniq.index.mpf.foldersPerFolder | 50 | The number of MPF directories stored per directory. |
omniq.index.mpf.maxParagraphGroups | 5 | The maximum number of paragraph groups to allocate per document. |
omniq.index.mpf.maxTotalGroupEntries | 50 | The maximum number of paragraphs from any one document that can be in a paragraph group. |
omniq.index.mpf.bufferSoftLimit | 8192 | The ideal number of bytes an uncompressed paragraph group can consume before it is closed, compressed, and written to disk. By design, Sybase Search usually slightly exceeds this limit. |
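
As a rough guide to what the defaults imply, the arithmetic below multiplies them out. The variable names are illustrative, and the sketch assumes a single level of nesting (one parent directory holding directories full of MPFs), which the table does not state explicitly.

```python
# Default values copied from the parameter table above.
docs_per_file = 20           # omniq.index.mpf.docsPerFile
files_per_folder = 250       # omniq.index.mpf.filesPerFolder
folders_per_folder = 50      # omniq.index.mpf.foldersPerFolder
max_paragraph_groups = 5     # omniq.index.mpf.maxParagraphGroups
buffer_soft_limit = 8192     # omniq.index.mpf.bufferSoftLimit (bytes)

# Documents held by one directory full of MPFs.
docs_per_folder = docs_per_file * files_per_folder

# Documents under one parent directory (assumes one level of nesting).
docs_per_parent_folder = docs_per_folder * folders_per_folder

# Approximate uncompressed bytes per document that fit into paragraph groups
# before individual compression takes over; the real figure runs slightly
# higher because the soft limit is usually exceeded.
grouped_bytes_per_doc = max_paragraph_groups * buffer_soft_limit
```

Under these assumptions, one directory holds 5,000 documents, one parent directory holds 250,000, and roughly 40 KB of uncompressed paragraph text per document is stored in groups.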