Import parameters

This section describes the import parameters. There is one table for each import parameter type.

Table C-6: Input parameters

Key

Default

Value

Description 

TextImportExtns

None

String

Any extra extensions for text import.

HTMLImportExtns

None

String

Any extra extensions for HTML import.

ImportDefaultExtension

None

String

If the import module finds a file with an unknown extension, it treats it as the type you enter here, for example, .txt or .html.

ImportTempDir

.\

String

A temporary directory for the importer to use when importing nonstandard formats; in other words, documents that are not .txt or .html.

ImportRecursiveDirectoryImporting

FALSE

TRUE/ FALSE

When set to TRUE, recursively imports subdirectories.

Table C-7: Content parameters

Key

Default

Value

Description 

ImportMinLength

0

Integer

Any document with size smaller than this number of bytes is not imported. The size of the document is measured according to the plain text content.

ImportMaxLength

100000000

Integer

Any document with size greater than this number of bytes is not imported. The size of the document is measured according to the plain text content.

ImportMinLengthWords

0

Integer

Any document with less than this number of words is not imported.

ImportMaxLengthWords

1000000

Integer

Any document with more than this number of words is not imported.

ImportStoreContent

TRUE

TRUE/ FALSE

Specifies whether to store the content of the document in the destination DRE.

ImportSummary

FALSE

TRUE/ FALSE

Imports a field called “summary” that contains the quick summary information generated by the importing module.

ImportSummarySize

3

Integer

Number of sentences to be used in the quick summary generated by the importing module.

ImportBreaking

FALSE

TRUE/ FALSE

Specifies whether to break the document into sections when importing. To enable, you must also set the value of combine to 1 in the PortalSearchqueryh.cfg or DRE.INI file of the engine into which the data is being entered.

ImportBreakingMinParagraphWords

160

Integer

The smallest section that can be created when attempting to break the document at paragraph boundaries.

ImportBreakingMaxParagraphWords

360

Integer

The largest section that can be created when attempting to break the document at paragraph boundaries.

ImportBreakingMinDocWords

600

Integer

The size above which document breaking is implemented. Smaller documents remain whole.

ImportIntelligentTitleSummary

FALSE

TRUE/ FALSE

When set to TRUE, the import module attempts to find unique titles and summaries by comparing the title or summary of each document with the next. For example, if the first document has the title “News Today,” then the import module sets this as its title. If a second document has the same title, the import module attempts to find a unique title either from the headings (<H1>, <H2>, and so on) or failing that, from the content.
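
The four length limits in Table C-7 can be read as one combined filter. The following Python sketch illustrates that reading; it is not the import module's code. The names LengthLimits and passes_length_limits are hypothetical, and the sketch assumes that size means the UTF-8 byte count of the plain text and that words are whitespace-separated tokens.

```python
from dataclasses import dataclass


@dataclass
class LengthLimits:
    # Defaults copied from Table C-7.
    min_bytes: int = 0            # ImportMinLength
    max_bytes: int = 100_000_000  # ImportMaxLength
    min_words: int = 0            # ImportMinLengthWords
    max_words: int = 1_000_000    # ImportMaxLengthWords


def passes_length_limits(plain_text: str, limits: LengthLimits) -> bool:
    """Return True if the document passes all four length checks."""
    # Assumption: size is the UTF-8 byte count of the plain text content.
    size = len(plain_text.encode("utf-8"))
    # Assumption: words are whitespace-separated tokens.
    words = len(plain_text.split())
    return (limits.min_bytes <= size <= limits.max_bytes
            and limits.min_words <= words <= limits.max_words)


# A two-word, 11-byte document passes the defaults but fails a 3-word minimum.
print(passes_length_limits("hello world", LengthLimits()))             # True
print(passes_length_limits("hello world", LengthLimits(min_words=3)))  # False
```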

Table C-8: Field parameters

Key

Default

Value

Description 

FixedFieldNamen

None

String

The name of the field in PortalSearchqueryh.cfg or DRE.INI that is used to store the fixed field value.

FixedFieldValuen

None

String

The value that is stored in the DRE field denoted by FixedFieldNamen.

FieldNamen

None

String

The name of the field in PortalSearchqueryh.cfg or DRE.INI that stores the dynamic field value.

FieldStartn

None

String

Specifies the start of the value to be stored in the DRE field named in FieldNamen.

FieldStopn

None

String

Specifies the end of the value to be stored in the DRE field named in FieldNamen.

ImportRemapFieldn

None

String

The name of the field whose value should be used as the value of an alternative field specified by ImportRemapFieldTon.

ImportRemapFieldTon

None

String

The name of the field that takes its value from an alternative field specified by ImportRemapFieldn.

ImportFieldHTMLConvertChars

FALSE

TRUE/ FALSE

Set this parameter to TRUE if you want HTML entities in fields to be replaced with the equivalent character. For example, &nbsp; is replaced with a space character.

ImportMetaToFields

TRUE

TRUE/ FALSE

Specifies whether to use HTML meta tags for DRE fields.

ImportChecksum

FALSE

TRUE/ FALSE

If set to TRUE, a value is added to the Checksum field in the [Field] section in PortalSearchqueryh.cfg or DRE.INI. This field value is used to determine whether to show a document result in the front end.

HTMLFieldNamen

None

String

The name of the field in PortalSearchqueryh.cfg or DRE.INI that is used to store the dynamic field value in an HTML document. The value stored is not HTML-stripped.

HTMLFieldStartn

None

String

Specifies the start of the value to be stored in the DRE field named in HTMLFieldNamen.

HTMLFieldStopn

None

String

Specifies the end of the value to be stored in the DRE field named in HTMLFieldNamen.
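
The FieldStartn and FieldStopn pair (and the HTMLFieldStartn and HTMLFieldStopn pair) delimit the value stored in the named field. The following Python sketch shows one plausible reading of that behavior. The function name extract_field is hypothetical, and the sketch assumes that the first occurrence of each marker is used, that the markers themselves are excluded from the value, that surrounding whitespace is trimmed, and that nothing is stored when either marker is missing.

```python
from typing import Optional


def extract_field(text: str, start: str, stop: str) -> Optional[str]:
    """Return the text between the first start marker and the next stop marker."""
    begin = text.find(start)
    if begin == -1:
        return None               # assumption: no start marker, no field value
    begin += len(start)           # assumption: the marker is not part of the value
    end = text.find(stop, begin)
    if end == -1:
        return None               # assumption: no stop marker, no field value
    return text[begin:end].strip()


page = "<html><span class='author'>J. Smith</span></html>"
print(extract_field(page, "<span class='author'>", "</span>"))  # J. Smith
```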

Table C-9: DRE parameter

Key

Default

Value

Description 

Database

String

The database into which documents are indexed.

Table C-10: Data parameters

Key

Default

Value

Description

ImportExtractDateFrom

0

Integer *

The date to be extracted from the document.

0 – nothing. 1 – current time. 2 – last accessed date. 3 – last modified date. 16 – from DRE field. 32 – from content. 64 – from file name.

ImportExtractDateFromField

None

String

The name of a date field in the DRE whose value is to be extracted.

ImportExtractDateToField

DREDATE

String

The name of a date field in the DRE whose value is taken from ImportExtractDateFromField.

ImportExtractDateFormatCSVs

None

String

A comma-separated list of the formats in which dates are to be extracted. Build each format string, in the style of DDMMYY, from the following specifiers:

  • YY – 2-digit year, for example, 99 or 00 or 01.

  • YYYY – 4-digit year, for example, 1999, 2000, 2001.

  • LONGMONTH – January, March, August.

  • SHORTMONTH – Jan, Mar, Aug.

  • MM – 2-digit month, for example, 01, 10, 12.

  • M+ – 1- or 2-digit month, for example, 1, 2, 3, 10.

  • DD – 2-digit day, for example, 01, 02, 03, 12, 23.

  • D+ – 1- or 2-digit day, for example, 1, 2, 12, 13, 31.

  • HH – 2-digit hour, for example, 01, 12, 13.

  • H+ – 1- or 2-digit hour, for example, 1, 2, 12, 13.

  • NN – 2-digit minute, for example, 01, 12, 13.

  • N+ – 1- or 2-digit minute, for example, 1, 2, 12, 13, 31.

  • SS – 2-digit second, for example, 01, 12, 13.

  • S+ – 1- or 2-digit second, for example, 1, 2, 12, 13, 31.

  • ZZZ – time zone, for example, GMT, EST, PST.

The correct syntax is illustrated in the following examples:

  • Dates with 1- or 2-digit days:

    DateFormats=D+/SHORTMONTH YYYY, DDMMYY
    
  • Quoted string to allow spaces, commas, and so on, within the format:

    DateFormats="D+SHORTMONTH YYYY", "Date: D+ LONGMONTH, YYYY"
    
  • Directory-style dates:

    DateFormats=D+/M+/YY, MM/DD/YYYY
    

ImportExtractDateToFormat

None

String

Specifies the format in which extracted dates are stored in the DRE. If the value of ImportExtractDateToField is set to DREDATE, then ImportExtractDateToFormat is set to the current date in this format: yyyy/mm/dd.

ImportExtractLength

FALSE

TRUE/ FALSE

Specifies whether to allow the extraction of the file length.

ImportExtractLengthToField

FILELENGTH

String

If ImportExtractLength is set to TRUE, the file length is extracted into the DRE field specified by this parameter; for example:

ImportExtractLengthToField=FileLength
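
The date format specifiers listed for ImportExtractDateFormatCSVs map naturally onto regular expressions. The following Python sketch shows one way to split a comma-separated format list and turn each format into a pattern. It is an illustration only, not the importer's parser. The names SPECIFIERS, split_formats, and format_to_regex are hypothetical, and the sketch assumes straight double quotes around formats that contain commas, and that any text that is not a specifier must match literally.

```python
import csv
import re

# Specifier -> regular expression fragment, following the list in Table C-10.
SPECIFIERS = {
    "LONGMONTH": r"[A-Za-z]+", "SHORTMONTH": r"[A-Za-z]{3}",
    "YYYY": r"\d{4}", "ZZZ": r"[A-Z]{3}",
    "YY": r"\d{2}", "MM": r"\d{2}", "DD": r"\d{2}",
    "HH": r"\d{2}", "NN": r"\d{2}", "SS": r"\d{2}",
    "M+": r"\d{1,2}", "D+": r"\d{1,2}", "H+": r"\d{1,2}",
    "N+": r"\d{1,2}", "S+": r"\d{1,2}",
}

# Longest specifiers first, so that YYYY is not read as YY followed by YY.
_TOKEN = re.compile(
    "|".join(re.escape(s) for s in sorted(SPECIFIERS, key=len, reverse=True))
)


def split_formats(value: str) -> list:
    """Split a format list on commas, honoring double-quoted entries."""
    row = next(csv.reader([value], skipinitialspace=True))
    return [entry.strip() for entry in row]


def format_to_regex(fmt: str) -> "re.Pattern":
    """Translate one format string into a compiled regular expression."""
    parts = []
    pos = 0
    for match in _TOKEN.finditer(fmt):
        parts.append(re.escape(fmt[pos:match.start()]))  # literal text
        parts.append(SPECIFIERS[match.group()])          # specifier
        pos = match.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile("".join(parts))


formats = split_formats('D+/M+/YY, "Date: D+ LONGMONTH, YYYY"')
print(formats)  # ['D+/M+/YY', 'Date: D+ LONGMONTH, YYYY']
print(bool(format_to_regex(formats[0]).fullmatch("3/11/99")))  # True
```

The sketch only recognizes where a date is; converting the matched text into a date value is left out.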

Table C-11: Page layout parameters

Key

Default

Value

Description 

ImportStartDefCSVs

None

String

Comma-separated list of strings that marks the beginning of the text in a document to be indexed into the DRE.

ImportEndDefCSVs

None

String

Comma-separated list of strings that marks the end of the text in a document to be indexed into the DRE.

HTMLImportStartDefCSVs

None

String

Comma-separated list of strings that marks the beginning of text in an HTML document to be indexed into the DRE.

HTMLImportEndDefCSVs

None

String

Comma-separated list of strings that marks the end of text in an HTML document to be indexed into the DRE.

ImportPageBreakDefs

None

String

Specifies a string used to mark a document break. Use this parameter to split documents into segments, with each segment contained in one idx file. After the idx files have been indexed into the DRE, the individual segments are joined back together to produce the original document.

ImportStartSkipWords

0

Integer

Specifies the number of words to skip from the beginning of a document when importing a document.

ImportStartSkipSentences

0

Integer

Specifies the number of sentences to skip from the beginning of a document when importing a document.
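
The start and end definitions in Table C-11 bound the text that is indexed, and ImportStartSkipWords trims the beginning further. The following Python sketch shows one plausible reading of both. The function names clip_content and skip_words are hypothetical. The sketch assumes that the earliest start marker found in the text wins, that the markers are excluded from the indexed text, that a document with no matching marker is kept whole, and that words are whitespace-separated.

```python
from typing import List


def clip_content(text: str, start_defs: List[str], end_defs: List[str]) -> str:
    """Keep the text after the earliest start marker and before the next end marker."""
    starts = [(text.find(s), s) for s in start_defs if s in text]
    if starts:
        pos, marker = min(starts)          # earliest marker in the text
        text = text[pos + len(marker):]
    ends = [text.find(e) for e in end_defs if e in text]
    if ends:
        text = text[:min(ends)]
    return text


def skip_words(text: str, count: int) -> str:
    """Drop the first `count` words. Note: this also collapses runs of whitespace."""
    return " ".join(text.split()[count:])


raw = "NAV MENU <!--start-->Body text here<!--end--> FOOTER"
print(clip_content(raw, ["<!--start-->"], ["<!--end-->"]))     # Body text here
print(skip_words("Breaking news: the body starts here", 2))    # the body starts here
```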

Table C-12: Path parameters

Key

Default

Value

Description 

ImportPathReplaceUpToSlash

-1

Integer

Specifies the slash through which the string is replaced with the string specified in ImportPathReplaceString. You can use this parameter to replace a single portion of many strings that contain different substrings, as opposed to replacing a single word.

For example, if the files in the queue are c:\a\b\hello, c:\a\c\hello, and c:\b\a\hello, setting ImportPathReplaceUpToSlash to 3 replaces the portion of each string up to the third backslash. By default, nothing is replaced.

ImportPathReplaceString

None

String

Specifies a string to replace the string through the slash indicated by the ImportPathReplaceUpToSlash parameter.

ImportRefTruncateAfter

-1

Integer

Truncates the reference after the nth occurrence of the string specified in ImportRefTruncateString. By default, no truncation occurs.

ImportRefTruncateString

None

String

Specifies the string used to truncate the reference after the nth occurrence.

ImportRefReplaceCSVs

None

String

Specifies the string to be replaced.

ImportRefReplaceWithCSVs

None

String

Specifies the string to replace the original reference of the file.

ImportRegisterExtnCSVsN=

None

String

Specifies the files with certain extensions that will be imported; for example:

ImportRegisterExtnCSVs0=HTML, XML

ImportRegisterExecutableN=

None

String

This specifies the import slave that imports the corresponding files (see ImportRegisterExtnCSVs); for example:

ImportRegisterExecutable0=testslave.exe

In this example, testslave.exe imports HTML and XML files.
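
The path parameters ImportPathReplaceUpToSlash and ImportRefTruncateAfter both count occurrences of a string from the left. The following Python sketch reproduces the c:\a\b\hello example from Table C-12 under stated assumptions; it is not the importer's implementation. The function names, the replacement string http://server/, and the sample URL are hypothetical. The sketch assumes that the nth backslash itself is part of the replaced portion, that the truncation string is kept at the end of the truncated reference, and that a path or reference with too few occurrences is left unchanged.

```python
def replace_up_to_slash(path: str, slash_number: int, replacement: str) -> str:
    """Replace everything through the nth backslash with `replacement`."""
    if slash_number < 1:      # -1 is the documented default: nothing is replaced
        return path
    pos = -1
    for _ in range(slash_number):
        pos = path.find("\\", pos + 1)
        if pos == -1:         # assumption: too few backslashes, leave the path alone
            return path
    return replacement + path[pos + 1:]


def truncate_after(reference: str, marker: str, occurrence: int) -> str:
    """Cut the reference off after the nth occurrence of `marker`."""
    if occurrence < 1 or not marker:   # -1 is the documented default: no truncation
        return reference
    end = 0
    for _ in range(occurrence):
        pos = reference.find(marker, end)
        if pos == -1:         # assumption: too few occurrences, no truncation
            return reference
        end = pos + len(marker)
    return reference[:end]


for queued in (r"c:\a\b\hello", r"c:\a\c\hello", r"c:\b\a\hello"):
    print(replace_up_to_slash(queued, 3, "http://server/"))  # http://server/hello

print(truncate_after("http://host/docs/page.html", "/", 3))  # http://host/
```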