LOAD TABLE statement

Description

Imports data into a database table from an external ASCII-format file.

Syntax

LOAD [ INTO ] TABLE [ owner ].table-name
... ( load-specification [, ...] )
... FROM { 'filename-string' | filename-variable } [, ...]
... [ CHECK CONSTRAINTS { ON | OFF } IGNORE CONSTRAINTconstrainttype[, ...] ]
... QUOTES OFF
... ESCAPES OFF
... [ FORMAT { 'ascii' | 'binary' } ]
... [ DELIMITED BY 'string' ]
... [ STRIP { ON | OFF } ]
... [ WITH CHECKPOINT { ON | OFF } ]
... [ { BLOCK FACTOR number | BLOCK SIZE number } ]
... [ BYTE ORDER { NATIVE | HIGH | LOW } ]
... [ LIMIT number-of-rows ]
... [ NOTIFY number-of-rows ]
... [ ON FILE ERROR { ROLLBACK | FINISH | CONTINUE} ]
... [ PREVIEW { ON | OFF } ]
... [ ROW DELIMITED BY 'delimiter-string' ]
... [ SKIP number-of-rows ]
... [ START ROW ID number ]
... [ UNLOAD FORMAT ]
... [ IGNORE CONSTRAINT constrainttype [, ...] ]
... [ MESSAGE LOG ‘string’ ROW LOG ‘string’ [ ONLY LOG logwhat [, ...] ]
... [ LOG DELIMITED BY ‘string’ ]

Parameters

load-specification:: { column-name [ column-spec ] | FILLER ( filler-type ) }

column-spec:: { ASCII ( input-width ) | BINARY [ WITH NULL BYTE ] | PREFIX { 1 | 2 | 4 } | 'delimiter-string' | DATE ( input-date-format ) | DATETIME ( input-datetime-format ) } [ NULL ( { BLANKS | ZEROS | 'literal', ... } ) ]

filler-type:: { input-width | PREFIX { 1 | 2 | 4 } | 'delimiter-string' }

constrainttype:: { CHECK integer | UNIQUE integer | NULL integer | FOREIGN KEY integer | DATA VALUE integer | ALL integer }

logwhat:: { CHECK | ALL | NULL | UNIQUE | DATA VALUE | FOREIGN KEY }

Examples

Example 1

Loads data from one file into the product table on a Windows system. A tab is used as the column delimiter following the description and color columns.

LOAD TABLE product
( id   ASCII(6),
FILLER(1),
name   ASCII(15),
FILLER(1),
description   '\x09',
size   ASCII(2),
FILLER(1),
color   '\x09',
quantity   PREFIX 2,
unit_price   PREFIX 2,
FILLER(2) )
FROM 'C:\\mydata\\source1.dmp'
QUOTES OFF
ESCAPES OFF
BYTE ORDER LOW
NOTIFY 1000

Example 2

Loads data from two files into the product_new table (which allows NULL values) on a UNIX system. The tab character is the default column delimiter, and the newline character is the row delimiter.

LOAD TABLE product_new
( id,
name,
description,
size,
color   '\x09'   NULL( 'null', 'none', 'na' ),
quantity   PREFIX 2,
unit_price   PREFIX 2 )
FROM '/s1/mydata/source2.dump', '/s1/mydata/source3.dump'
QUOTES OFF
ESCAPES OFF
BLOCKSIZE 100000
FORMAT ascii
DELIMITED BY '\x09'
ON FILE ERROR CONTINUE
ROW DELIMITED BY '\n'

Usage

The LOAD TABLE statement allows efficient mass insertion into a database table from a file with ASCII or binary data.

The LOAD TABLE options also allow you to control load behavior when integrity constraints are violated and to log information about the violations.

If the WITH CHECKPOINT ON clause is not specified, the file used for loading must be retained in case recovery is required. If WITH CHECKPOINT ON is specified, a checkpoint is carried out after loading, and recovery is guaranteed even if the data file is then removed from the system.

You can use LOAD TABLE on a temporary table, but the temporary table must have been declared with the ON COMMIT PRESERVE ROWS clause or the next COMMIT will remove the rows you've loaded.

You can also specify more than one file to load data. In the FROM clause, you specify each filename-string separated by commas. However, Sybase IQ cannot guarantee that all the data can be loaded because of memory constraints. If memory allocation fails, the entire load transaction is rolled back. The files are read one at a time, and they are processed in a left-to-right order as specified in the FROM clause. Any SKIP or LIMIT value only applies in the beginning of the load, not for each file.

When loading a multiplex database, use absolute (fully-qualified) paths in all filenames. Do not use relative pathnames.

Sybase IQ supports loading from both ASCII and binary data, and it supports both fixed and variable length formats. To handle all of these formats, you must supply a load-specification to tell Sybase IQ what kind of data to expect from each “column” or field in the source file. The column-spec allows you to define these formats:

ASCII with a fixed length of bytes. The input-width value is an integer value indicating the fixed width in bytes of the input field in every record.

Binary fields that use a number of PREFIX bytes (1, 2, or 4) to specify the length of the binary input.

Note that if the data is unloaded using the extraction facility with the TEMP_EXTRACT_BINARY option set ON, then you must use the BINARY WITH NULL BYTE parameter for each column when you load the binary data.

Variable-length characters delimited by a separator. You specify the terminator as hexadecimal ASCII characters. The delimiter-string can be any string of up to 4 characters, including any combination of printable characters, and/or any 8–bit hexadecimal ASCII code that represents a non-printing character. For example, you specify:

'\x09' to represent a tab as the terminator.
'\x00' for a null terminator (no visible terminator as in “C” strings).
'\x0a' for a newline character as the terminator. You can also use the special character combination of '\n' for newline.

The delimiter-string can be from 1 to 4 characters long, but you can only specify a single character in the DELIMITED BY clause.

DATE or DATETIME string as ASCII characters. You must define the input-date-format or input-datetime-format of the string using one of the corresponding formats for the date and datetime data types supported by Sybase IQ. Use DATE for date values and DATETIME for datetime and time values.

**Table 6-9: Formatting dates and times**
Option	Meaning
yyyy or YYYY yy or YY	Represents number of year. Default is current year.
mm or MM	Represents number of month. Always use leading zero or blank for number of the month where appropriate, for example '05' for May. DATE value must include a month. For example, if the DATE value you enter is '1998', you receive an error. If you enter '03', IQ applies the default year and day and converts it to '1998-03-01'.
dd or DD jjj or JJJ	Represents number of day. Default day is 01. Always use leading zeros for number of day where appropriate, for example '01' for first day. J or j indicates a Julian day (1 to 366) of the year.
hh HH	Represents hour. Hour is based on 24-hour clock. Always use leading zeros or blanks for hour where appropriate, for example '01' for 1 am. '00' is also valid value for hour of 12 am.
nn	Represents minute. Always use leading zeros for minute where appropriate, for example '08' for 8 minutes.
ss[.ssssss]	Represents seconds and fraction of a second.
aa	Represents the a.m. or p.m. designation.
pp	Represents the p.m designation only if needed. (This is an incompatibility with IQ releases prior to 12.0; previously, pp was synonymous with aa.)
hh	Sybase IQ assumes zero for minutes and seconds. For example, if the DATETIME value you enter is '03', IQ converts it to '03:00:00.0000'.
hh:nn or hh:mm	Sybase IQ assumes zero for seconds. For example, if the time value you enter is '03:25', IQ converts it to '03:25:00.0000'.

**Table 6-10: Sample DATE and DATETIME format options**
Input Data	Format Specification
12/31/98	DATE ('MM/DD/YY')
19981231	DATE ('YYYYMMDD')
123198140150	DATETIME ('MMDDYYhhnnss')
14:01:50 12-31-98	DATETIME ('hh:mm:ss MM-DD-YY')
18:27:53	DATETIME ('hh:mm:ss')
12/31/98 02:01:50AM	DATETIME ('MM/DD/YY hh:mm:ssaa')

Sybase IQ has built-in load optimizations for common date, time and datetime formats. If your data to be loaded matches one of these formats, you can significantly decrease load time by using the appropriate format. For a list of these formats, and details about optimizing performance when loading date and datetime data, see Chapter 7, “Moving Data In and Out of Databases” in the Sybase IQ System Administration Guide.

You can also specify the date/time field as an ASCII fixed width field (as described above) and use the FILLER(1) option to skip the column delimiter. For more information about specifying date and time data, see Date and time data types or Chapter 7, “Moving Data In and Out of Databases” in the Sybase IQ System Administration Guide.

The NULL portion of the column-spec indicates how to treat certain input values as NULL values when loading into the table column. These characters can include BLANKS, ZEROS, or any other list of literals you define. When specifying a NULL value or reading a NULL value from the source file, the destination column must be able to contain NULLs.

ZEROS are interpreted as follows: the cell is set to NULL if (and only if) the input data (before conversion, if ASCII) is all binary zeros (and not character zeros).

If the input data is character zero, then:

a) NULL(ZEROS) never causes the cell to be NULL

b) NULL('0') causes the cell to be NULL
If the input data is binary zero (all bits clear), then:

a) NULL(ZEROS) causes the cell to be NULL

b) NULL('0') never causes the cell to be NULL

For example, if your LOAD statement includes col1 date(‘yymmdd’) null(zeros)and the date is 000000, you will receive an error indicating that 000000 cannot be converted to a DATE(4). To get the load statement to insert a NULL value in col1 when the data is 000000, you must write the NULL clause as null(‘000000’), or modify the data to equal binary zeros and use NULL(ZEROS).

If the length of a VARCHAR cell is zero and the cell is not NULL, you get a zero-length cell. For all other data types, if the length of the cell is zero, Sybase IQ inserts a NULL. This is ANSI behavior. For non-ANSI treatment of zero-length character data, set the Non_Ansi_Null_Varchar database option.

Another important part of the load-specification is the FILLER option. It indicates you want to skip over a specified field in the source input file. For example, there may be characters at the end of rows or even entire fields in the input files that you do not want to add to the table. As with the column-spec definition, FILLER allows you to specify ASCII fixed length of bytes, variable length characters delimited by a separator, and binary fields using PREFIX bytes.

filename-string The filename-string is passed to the server as a string. The string is therefore subject to the same formatting requirements as other SQL strings. In particular:

To indicate directory paths in Windows systems, the backslash character \ must be represented by two backslashes. Therefore, the statement to load data from the file c:\temp\input.dat into the employee table is:

LOAD TABLE employee
FROM 'c:\\temp\\input.dat' ...

The pathname is relative to the database server, not to the client application. If you are running the statement on a database server on some other computer, the directory names refers to directories on the server machine, not on the client machine.

The following describes each of the clauses of the statement.

QUOTES option If you omit a column-spec definition for an input field and QUOTES is on (the default), the LOAD statement looks for a quote character. The quote character is either an apostrophe (single quote) or a quotation mark (double quote). The first such character encountered in the input file is treated as the quote character for the input file. For Sybase IQ, you must set this option as OFF.

CHECK CONSTRAINTS option This option defaults to ON. When you specify CHECK CONSTRAINTS ON, check constraints are evaluated and you are free to ignore or log them.

Setting CHECK CONSTRAINTS OFF means that IQ ignores all check constraint violations. This can be useful, for example, during database rebuilding. If a table has check constraints that call user-defined functions that are not yet created, the rebuild fails unless this option is set to OFF.

This option is mutually exclusive to the following options. If any of these options are specified in the same load, an error results:

IGNORE CONSTRAINT ALL
IGNORE CONSTRAINT CHECK
LOG ALL
LOG CHECK

ESCAPES option If you omit a column-spec definition for an input field and ESCAPES is on (the default), characters following the backslash character are recognized and interpreted as special characters by the database server. Newline characters can be included as the combination \n, other characters can be included in data as hexadecimal ASCII codes, such as \x09 for the tab character. A sequence of two backslash characters (\\ ) is interpreted as a single backslash. For Sybase IQ, you must set this option as OFF.

FORMAT option Sybase IQ supports ASCII and binary input fields. The format is usually defined by the column-spec described above. If you omit that definition for a column, by default Sybase IQ uses the format defined by this option. Input lines are assumed to have ascii (the default) or binary fields, one row per line, with values separated by the column delimiter character.

DELIMITED BY option If you omit a column delimiter in the column-spec definition, the default column delimiter character is a comma. You can specify an alternative column delimiter by providing a single ASCII character or the hexadecimal character representation. The DELIMITED BY clause is as follows:

... DELIMITED BY '\x09' ...

To use the newline character as a delimiter, you can specify either the special combination '\n' or its ASCII value '\x0a'. Note that, while you can specify up to four characters in the column-spec delimiter-string, you can only specify a single character in the DELIMITED BY clause

STRIP option With STRIP turned on (the default), trailing blanks are stripped from values before they are inserted. This is effective only for VARCHAR data; it does not apply to ASCII fix-width inserts. To turn the STRIP option off, the clause is as follows:

... STRIP OFF ...

Trailing blanks are stripped only for non-quoted strings. Quoted strings retain their trailing blanks. As an alternative, the FILLER option allows you to be more specific in the number of bytes to strip instead of just all the trailing spaces. It is more efficient for Sybase IQ to have this option off, and it adheres to the ANSI standard when dealing with trailing blanks. (char data is always padded, so this option only affects varchar data.)

WITH CHECKPOINT option The default setting is OFF. If set to ON, a checkpoint is issued after successfully completing and logging the statement.

If WITH CHECKPOINT ON is not specified, and recovery is subsequently required, the data file used to load the table is needed for the recovery to complete successfully. If WITH CHECKPOINT ON is specified, and recovery is subsequently required, it will begin after the checkpoint, and the data file need not be present.

BLOCK FACTOR option Specifies blocking factor, or number of records per block, used when a tape was created. This option is not valid for inserts from variable length input fields; use the BLOCKSIZE option instead. However, it does affect all file inserts (including from disk) with fixed length input fields, and it can dramatically affect performance. You cannot specify this option along with the BLOCK SIZE option. The default is 10,000.

BLOCK SIZE option Specifies the default size in bytes in which input should be read. This option only affects variable length input data read from files; it is not valid for fixed length input fields. It is similar to BLOCK FACTOR, but there are no restrictions on the relationship of record size to block size. You cannot specify this option along with the BLOCK FACTOR option. The default is 500,000.

BYTE ORDER option Specifies the byte ordering during reads. This option applies to all binary input fields. If none are defined, this option is ignored. Sybase IQ always reads binary data in the format native to the machine it is running on (default is NATIVE). You can also specify:

HIGH when multibyte quantities have the high order byte first (for big endian platforms like Sun, IBM AIX, and HP).
LOW when multibyte quantities have the low order byte first (for little endian platforms like Windows).

LIMIT option Specifies the maximum number of rows you want to insert into the table. The default is 0 for no limit.

NOTIFY option Specifies that you be notified with a message each time the specified number of rows is successfully inserted into the table. The default is every 100,000 rows. The value of this option overrides the value of the NOTIFY_MODULUS database option.

ON FILE ERROR option Specifies the action Sybase IQ takes when an input file cannot be opened because it does not exist or you have incorrect permissions to read the file. You can specify one of the following:

ROLLBACK aborts the entire transaction (the default).
FINISH finishes the insertions already completed and ends the load operation.
CONTINUE returns an error but only skips the file to continue the load operation. You cannot use this option with partial-width inserts.

Only one ON FILE ERROR clause is permitted.

PREVIEW option Displays the layout of input into the destination table including starting position, name, and data type of each column. Sybase IQ displays this information at the start of the load process. If you are writing to a log file, this information is also included in the log. This option is especially useful with partial-width inserts.

ROW DELIMITED BY option Specifies a string up to 4 bytes in length that indicates the end of an input record. You can use this option only if all fields within the row are any of the following:

Delimited with column terminators
Data defined by the DATE or DATETIME column-spec options
ASCII fixed length fields

You cannot use this option if any input fields contain binary data. With this option, a row terminator causes any missing fields to be set to NULL. All rows must have the same row delimiters, and it must be distinct from all column delimiters. The row and field delimiter strings cannot be an initial subset of each other. For example, you cannot specify “*” as a field delimiter and “*#” as the row delimiter, but you could specify “#” as the field delimiter with that row delimiter.

If a row is missing its delimiters, Sybase IQ returns an error and rolls back the entire load transaction. The only exception is the final record of a file where it rolls back that row and returns a warning message. On Windows, a row delimiter is usually indicated by the newline character followed by the carriage return character. You may need to specify this as the delimiter-string (see above for description) for either this option or FILLER.

SKIP option Lets you define a number of rows to skip at the beginning of the input table(s) for this load. The default is 0.

START ROW ID option Specifies the record identification number of a row in the IQ table where it should start inserting. This option is used for partial-width inserts, which are inserts into a subset of the columns in the table. By default, new rows are inserted wherever there is space in the table, and each insert starts a new row. Partial-width inserts need to start at an existing row. They also need to insert data from the source file into the destination table positionally by column, so you must specify the destination columns in the same order as their corresponding source columns. Define the format of each input column with a column-spec. The default is 0. For more information about partial-width inserts see Chapter 7, “Moving Data In and Out of Databases” of the Sybase IQ System Administration Guide.

Use the START ROW ID option for partial-width inserts only! If the columns being loaded already contain data, the insert will fail. For example:

UNLOAD FORMAT option Specifies that the file has IQ internal unload formats for each column created by an earlier version of IQ (before Version 12.0). This load option has the following restrictions:

You cannot specify any column-spec (such as ASCII or PREFIX) for a column other than BINARY. This includes the NULL specifications.
If you need to load null values for a column using the BINARY column-spec, you must specify the WITH NULL BYTE keyword or IQ will return an error.
You cannot use the DELIMITED BY or ROW DELIMITED BY options with UNLOAD FORMAT.

IGNORE CONSTRAINT option Specifies whether to ignore CHECK, UNIQUE, NULL, DATA VALUE, and/or FOREIGN KEY integrity constraint violations that occur during a load and the maximum number of violations to ignore before initiating a rollback. Specifying each constrainttype has the following result:

CHECK limit If limit specifies zero, then the number of UNIQUE constraint violations to ignore is infinite. If CHECK is not specified, the first occurrence of any CHECK constraint violation causes the LOAD statement to roll back. If limit is non-zero, then the limit +1 occurrence of a CHECK constraint violation causes the load to roll back.

UNIQUE limit If limit specifies zero, then the number of UNIQUE constraint violations to ignore is infinite. If limit is non-zero, then the limit +1 occurrence of a UNIQUE constraint violation causes the load to rollback.

NULL limit If limit specifies zero, then the number of NULL constraint violations to ignore is infinite. If limit is non-zero, then the limit +1 occurrence of a NULL constraint violation causes the load to rollback.

FOREIGN KEY limit If limit specifies zero, then the number of FOREIGN KEY constraint violations to ignore is infinite. If limit is non-zero, then the limit +1 occurrence of a FOREIGN KEY constraint violation causes the load to rollback.

DATA VALUE limit If the database option CONVERSION_ERROR = ON, then an error is reported and the statement rolls back. If limit specifies zero, then the number of DATA VALUE constraint violations (data type conversion errors) to ignore is infinite. If limit is non-zero, then the limit +1 occurrence of a DATA VALUE constraint violation causes the load to rollback.

ALL limit If the database option CONVERSION_ERROR = ON, then an error is reported and the statement rolls back. If limit specifies zero, then the cumulative total of all integrity constraint violations to ignore is infinite. If limit is non-zero, then load rolls back when the cumulative total of all ignored UNIQUE, NULL, DATA VALUE, and FOREIGN KEY integrity constraint violations exceeds the value of limit. For example, you specify the following IGNORE CONSTRAINT option:
```
IGNORE CONSTRAINT NULL 50, UNIQUE 100, ALL 200
 
```
The total number of integrity constraint violations cannot exceed 200, while the total number of NULL and UNIQUE constraint violations cannot exceed 50 and 100, respectively. Whenever any of these limits is exceeded, the LOAD TABLE statement rolls back.

A single row can have more than one integrity constraint violation. Every occurrence of an integrity constraint violation counts towards the limit of that type of violation.

Sybase strongly recommends setting the IGNORE CONSTRAINT option limit to a non-zero value, if you are logging the ignored integrity constraint violations. Logging an excessive number of violations affects the performance of the load.

If CHECK, UNIQUE, NULL, or FOREIGN KEY is not specified in the IGNORE CONSTRAINT clause, then the load rolls back on the first occurrence of each of these types of integrity constraint violation.

If DATA VALUE is not specified in the IGNORE CONSTRAINT clause, then the load rolls back on the first occurrence of this type of integrity constraint violation, unless the database option CONVERSION_ERROR = OFF. If CONVERSION_ERROR = OFF, then a warning is reported for any DATA VALUE constraint violation and the load continues.

When the load completes, an informational message regarding integrity constraint violations is logged in the .iqmsg file. This message contains the number of integrity constraint violations that occurred during the load and the number of rows that were skipped.

MESSAGE LOG option Specifies the names of files in which to log information about integrity constraint violations and the types of violations to log. Timestamps indicating the start and completion of the load are logged in both the MESSAGE LOG and the ROW LOG files. Both MESSAGE LOG and ROW LOG must be specified, or no information about integrity violations is logged.

If the ONLY LOG clause is not specified, then no information on integrity constraint violations is logged. Only the timestamps indicating the start and completion of the load are logged.
Information is logged on all integrity constraint type violations specified in the ONLY LOG clause.
If constraint violations are being logged, then every occurrence of an integrity constraint violation generates exactly one row of information in the MESSAGE LOG file.

The number of rows (errors reported) in the MESSAGE LOG file can exceed the IGNORE CONSTRAINT option limit, because the load is performed by multiple threads running in parallel. More than one thread may report that the number of constraint violations has exceeded the specified limit.
If constraint violations are being logged, then exactly one row of information is logged in the ROW LOG file for a given row, regardless of the number of integrity constraint violations that occur on that row.

Note that the number of distinct errors in the MESSAGE LOG file may not exactly match the number of rows in the ROW LOG file. The difference in the number of rows is due to the parallel processing of the load described above for the MESSAGE LOG.
The MESSAGE LOG and ROW LOG files cannot be raw partitions.
If the MESSAGE LOG or ROW LOG file already exists, then new information is appended to the file.
Specifying an invalid filename for the MESSAGE LOG or ROW LOG file generates an error.
Specifying the same filename for the MESSAGE LOG and ROW LOG files generates an error.

Various combinations of the IGNORE CONSTRAINT and MESSAGE LOG options result in different logging actions, as indicated in Table 6-11.

**Table 6-11: LOAD TABLE logging actions**
IGNORE CONSTRAINT specified?	MESSAGE LOG specified?	Action
yes	yes	All ignored integrity constraint violations are logged, including the user specified limit, before the rollback.
no	yes	The first integrity constraint violation is logged before the rollback.
yes	no	Nothing is logged.
no	no	Nothing is logged. The first integrity constraint violation causes a rollback.

Sybase strongly recommends setting the IGNORE CONSTRAINT option limit to a non-zero value, if you are logging the ignored integrity constraint violations. If a single row has more than one integrity constraint violation, a row for each violation is written to the MESSAGE LOG file. Logging an excessive number of violations affects the performance of the load.

LOG DELIMITED BY option Specifies the separator between data values in the ROW LOG file. The default separator is a comma.

For more details on the contents and format of the MESSAGE LOG and ROW LOG files, see “Bulk loading data using the LOAD TABLE statement” in Chapter 7, “Moving Data In and Out of Databases”of the Sybase IQ System Administration Guide.

Side effects

None.

Standards

SQL/92 Vendor extension.

Sybase Not applicable.

Permissions

The permissions required to execute a LOAD TABLE statement depend on the database server -gl command line option, as follows:

If the -gl option is set to ALL, you must be the owner of the table, have DBA authority, or have ALTER permission.
If the -gl option is set to DBA, you must have DBA authority.
If the -gl option is set to NONE, LOAD TABLE is not permitted.

For more information, see the -gl command line option in “Server command-line options” on page 7 in Chapter 1, “Running the Database Server” of the Sybase IQ Utility Guide.

LOAD TABLE also requires an exclusive lock on the table.