MPF indexes store compressed text paragraphs from the indexed documents, which are stored separately from the main calculation data. Unlike the main term and metadata indexes, MPF files are not kept open all the time, which helps keep the total number of file handles used by the container’s JVM within its limit.
These settings control how many documents are stored per MPF file, how many MPF files are stored per folder, and the maximum number of folders per folder:
<SystemProperty name="omniq.index.mpf.docsPerFile" value="20"/>
<SystemProperty name="omniq.index.mpf.filesPerFolder" value="250"/>
<SystemProperty name="omniq.index.mpf.foldersPerFolder" value="50"/>
In most cases, the filesPerFolder and foldersPerFolder settings do not affect indexing or querying performance, unless Sybase Search is running on an operating system whose performance is limited by how files and folders are distributed.
The docsPerFile parameter does affect performance. When you index small documents, the default setting generates small MPF files. If you increase docsPerFile to 100, for example, the number of MPF files generated drops by a factor of 5, which makes paragraph storage more efficient.
If you mostly index large documents, the default setting generates large MPF files, which can lead to long file seek times when retrieving paragraphs. Decrease docsPerFile to 15 or 10 to reduce the average MPF file size and shorten seek times.
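As a sketch of the two adjustments described above, the fragments below show docsPerFile raised for small documents and lowered for large ones. The values 100 and 10 are taken from the examples in the text and are illustrative rather than tuned recommendations; the best value depends on the size of your documents.

```xml
<!-- Use only ONE of the following settings. -->

<!-- Mostly small documents: fewer, fuller MPF files (default is 20) -->
<SystemProperty name="omniq.index.mpf.docsPerFile" value="100"/>

<!-- Mostly large documents: smaller MPF files, shorter seek times -->
<SystemProperty name="omniq.index.mpf.docsPerFile" value="10"/>
```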
These settings determine how many paragraphs are grouped together for reading and writing and the maximum number of group entries allowed:
<SystemProperty name="omniq.index.mpf.maxParagraphGroups" value="5"/>
<SystemProperty name="omniq.index.mpf.maxTotalGroupEntries" value="50"/>
A higher value for maxParagraphGroups improves compression, because paragraphs compress better as a group than they do separately. Change the default value according to the minimum number of paragraphs you need grouped together.
The first paragraph returned for a given relevant document is not necessarily the first paragraph stored in the MPF file. The most relevant paragraph from the first few paragraphs is returned first. Setting maxParagraphGroups to 1 does not mean that only one paragraph is read from the MPF files; in other words, it does not retrieve one paragraph per page.
Index unification combines multiple index stripes into a single stripe and keeps the underlying indexes efficient.
Index unification reads the data for each item from the various stripes, combines the data, and writes it to a new index stripe. As with standard document indexing, a buffer stores the data before it is written to disk. Unlike standard document indexing, data is read sequentially, so increasing the unifier buffer settings yields a smaller performance improvement.
For example, the item “java” is read only once, unlike in main document indexing, where more data for “java” could be obtained from additional documents and combined with the data for “java” already in the buffer.
Increase these buffer sizes for slightly faster unification:
<SystemProperty name="omniq.unifier.termMapSizeSoftLimit" value="40K"/>
<SystemProperty name="omniq.unifier.termMapSizeInBytesSoftLimit" value="32MB"/>
<SystemProperty name="omniq.unifier.metadataMapSizeSoftLimit" value="40K"/>
<SystemProperty name="omniq.unifier.metadataMapSizeInBytesSoftLimit" value="32MB"/>
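As a rough illustration of increasing the unifier buffers, the fragment below doubles each of the values shown above. The doubled values are assumptions chosen only for illustration, not tested recommendations, and larger buffers require correspondingly more JVM memory.

```xml
<!-- Illustrative values only: each limit is double the value shown above -->
<SystemProperty name="omniq.unifier.termMapSizeSoftLimit" value="80K"/>
<SystemProperty name="omniq.unifier.termMapSizeInBytesSoftLimit" value="64MB"/>
<SystemProperty name="omniq.unifier.metadataMapSizeSoftLimit" value="80K"/>
<SystemProperty name="omniq.unifier.metadataMapSizeInBytesSoftLimit" value="64MB"/>
```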
These parameters have a larger impact on index unification performance:
<SystemProperty name="omniq.unifier.sleepDurationMillis" value="20"/>
<SystemProperty name="omniq.unifier.sleepFrequency" value="100"/>
The sleep frequency and sleep duration parameters work the same way in index unification as they do in main document indexing. Setting both parameters to 0 (zero) effectively disables sleeping for the index unification process.
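For instance, disabling sleeping during unification, as described above, would look like this:

```xml
<!-- Both values set to 0: the unifier does not sleep -->
<SystemProperty name="omniq.unifier.sleepDurationMillis" value="0"/>
<SystemProperty name="omniq.unifier.sleepFrequency" value="0"/>
```

With sleeping disabled, unification competes for disk I/O with any other tasks running at the same time.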
If you have I/O-intensive tasks running at the same time as index unification, consider increasing the sleep time parameters so that index unification runs more as a background process.
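A minimal sketch of such an adjustment follows. The value 50 is a hypothetical choice, not a recommendation. Only the sleep duration is raised here, and sleepFrequency is left at the value shown earlier, because this section does not state in which direction a change to the frequency value increases total sleep time.

```xml
<!-- Hypothetical value: longer pauses give other I/O tasks more room -->
<SystemProperty name="omniq.unifier.sleepDurationMillis" value="50"/>
<!-- Unchanged from the value shown earlier -->
<SystemProperty name="omniq.unifier.sleepFrequency" value="100"/>
```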