Bulk loading data using the LOAD TABLE statement

The LOAD TABLE statement is used for efficient importing of data from a text or binary file into an existing database table. It loads data into any column indexes you have defined, as well as any created automatically.

The permissions needed to execute a LOAD TABLE statement are set on the server command line, using the -gl option. Sybase recommends the -gl all setting, which allows any user to load or unload a table. This is the default setting set by start_asiq. If the -gl option is set to ALL, you must be the owner of the table, have DBA authority, or have ALTER permission, in order to use the LOAD TABLE statement. You also need INSERT permission on the table.

See the description of the ON FILE ERROR load option for what happens when input file errors occur during a load.

Using command files to load data

To load large amounts of data, most users create command files. To create a command file, follow the instructions in the chapter entitled “Getting Started with DBISQL” in the Introduction to Sybase IQ.

Transaction processing and LOAD TABLE

When you issue the LOAD TABLE statement for an IQ table, a savepoint occurs automatically before the data is loaded. If the load completes successfully, Sybase IQ releases the savepoint. If the load fails, the transaction rolls back to the savepoint. This approach gives you flexibility in committing transactions. For example, if you issue two LOAD TABLE commands, you can ensure that either both commit or neither.

When you issue LOAD TABLE for a Catalog Store table, there is no automatic savepoint. If the load succeeds, it commits automatically. If the load fails, it rolls back. You cannot roll back a successful load of a Catalog Store table.

For more information on transaction processing, see Chapter 10, “Transactions and Versioning”.

Integrity constraints and LOAD TABLE

LOAD TABLE allows you to control load behavior when integrity constraints are violated and to selectively log information about the violations. You can specify whether to ignore UNIQUE, NULL, DATA VALUE, and/or FOREIGN KEY constraint violations that occur during a load and the maximum number of violations to ignore before initiating a rollback. You can also direct the load to log information about specific types of integrity constraint violations both per violation in a message log and per row in a row log.

For information on the contents and format of the message and row logs, see “Logging integrity constraint violations”.

Summary of LOAD TABLE syntax

The basic form of the LOAD TABLE statement is:

LOAD TABLE [ owner ].table-name
[ ( load-specification, ... ) ]
FROM 'filename-string', ...
 [ FORMAT { 'ascii' | 'binary' } ]
... [ DELIMITED BY string ]
... [ STRIP { ON | OFF } ]
... [ QUOTES { ON | OFF } ]
... [ ESCAPES { ON | OFF } ]
... [ ESCAPE CHARACTER character ] 
... [ WITH CHECKPOINT ON|OFF ]
... [ load-options ]

For usage and syntax details of all of the LOAD TABLE parameters, see “LOAD TABLE statement” in Chapter 6, “SQL Statements” of the Sybase IQ Reference Manual.

Load specification

The load-specification does the following:

The syntax for load-specification is as follows:

load-specification:	
{ column-name [ column-spec ] |
FILLER ( filler-type ) }

For each column, you can specify a column-spec. If you omit this option, the format information in the load-options applies to this column. The column-spec and load-options format information tell Sybase IQ what type of data to expect, and how to convert it into a compatible data format if necessary.

Syntax for the column-spec is:

column-spec:	
{ ASCII ( input-width ) |
BINARY [ WITH NULL BYTE ] |
PREFIX { 1 | 2 | 4 } |
'delimiter-string' |
DATE ( input-date-format ) |
DATETIME ( input-datetime-format ) }
[ NULL ( { BLANKS | ZEROS | 'literal', ... } ) ]

You can specify the following types of data in the column-spec:

NoteThe column-spec is for IQ tables only. If you specify a column-spec for a Catalog Store table, you get an error.

The NULL portion of the column-spec indicates how to treat certain input values as NULL values when loading into the table column. These characters can include BLANKS, ZEROS, or any other list of literals you define. When you specify a NULL value or read a NULL value from the source file, the destination column must be able to contain NULLs.

ZEROS are interpreted as follows: the cell is set to NULL if (and only if) the input data (before conversion, if ASCII) is all binary zeros (and not character zeros).

For example, if your LOAD statement includes col1 date('yymmdd') null(zeros) and the date is 000000, you will receive an error indicating that 000000 cannot be converted to a DATE(4). To get the load statement to insert a NULL value in col1 when the data is 000000, you must write the NULL clause as null(‘000000’), or modify the data to equal binary zeros and use NULL(ZEROS).

The FILLER clause indicates you want to skip over a specified field in the source input file. For example, there may be characters at the end of rows or even entire fields in the input files that you do not want to add to the table. As with the column-spec definition, FILLER allows you to specify ASCII fixed length of bytes, variable length characters delimited by a separator, and binary fields using PREFIX bytes. The maximum length of a variable-length FILLER column is 512 bytes. FILLER clause syntax is as follows:

FILLER ( filler-type )
filler-type: 
{ input-width | PREFIX { 1 | 2 | 4 } | 'delimiter-string' }

For more information on how to use data conversion options, see “Converting data on insertion”.

Specifying files to load

You specify one or more files from which to load data. In the FROM clause, you specify each filename-string, and separate multiple strings by commas.

The files are read one at a time, and processed in a left-to-right order as specified in the FROM clause. Any SKIP or LIMIT value only applies in the beginning of the load, not for each file.

If a load cannot complete, for example due to insufficient memory, the entire load transaction is rolled back.

filename-string The filename-string is passed to the server as a string. The string is therefore subject to the same formatting requirements as other SQL strings. In particular:

Named pipes

The file specification can be a named pipe. When you load from a named pipe (or FIFO) on Windows, the program writing to the pipe must close the pipe in a special way. The pipe writer must call FlushFileBuffers( ) and then DisconnectNamedPipe( ). (If you do not, Sybase IQ reports an exception from hos_io::Read( ).) This issues a PIPE_NOT_CONNECTED error, which notifies Sybase IQ that the pipe was shut down in an orderly manner rather than an uncontrolled disconnect. See Microsoft documentation for details on these calls.

Specifying table-wide format options

You can specify several options that describe the format of input data.

FORMAT option You can specify a default format for table columns, which applies if you omit the column-spec. The same formats that can appear in the column-spec can appear here. If you also omit the FORMAT load option, the file is assumed to be binary.

DELIMITED BY option If you omit a column delimiter in the column-spec definition, the default column delimiter character is a comma. You can specify an alternative column delimiter by providing a single ASCII character or the hexadecimal representation. In particular, to specify tab-delimited values use the hexadecimal ASCII code of the tab character (9), as follows:

...DELIMITED BY '\x09' ...

To use the newline character as a delimiter, you can specify either the special combination '\n' or its ASCII value '\x0a'.

NoteAlthough the delimiter-string in the column-spec may be a string of up to four characters, the DELIMITED BY option allows only a single ASCII character or its hexadecimal representation.

STRIP option With STRIP turned on (the default), trailing blanks are stripped from values before they are inserted. This is effective only for VARCHAR data. To turn the STRIP option off, enter the clause as follows:

...STRIP OFF ...

Trailing blanks are stripped only for non-quoted strings. Quoted strings retain their trailing blanks. If you don’t require blank sensitivity, you may use the FILLER option allows you to be more specific in the number of bytes to strip instead of just all the trailing spaces.

This option does not apply to ASCII fixed-width inserts. For example, the STRIP option in the following statement is ignored:

LOAD TABLE dba.foo (col1 ascii(3), col2 ascii(3))
FROM foo_data QUOTES OFF ESCAPES OFF STRIP ON

QUOTES option The QUOTES parameter is optional and the default is ON. With QUOTES turned on, LOAD TABLE expects input strings to be enclosed in quote characters. The quote character is either an apostrophe (single quote) or a quotation mark (double quote). The first such character encountered in a string is treated as the quote character for the string. String data must be terminated with a matching quote.

With QUOTES ON, column or row delimiter characters can be included in the column value. Leading and ending quote characters are assumed not to be part of the value and are excluded from the loaded data value.

With QUOTES OFF, Sybase IQ does not strip off apostrophes (single quotes) or quotation marks (double quotes). When it encounters these characters in your input file, it treats them as part of the data. With QUOTES OFF, you cannot include column delimiter characters in column values.

For syntax and usage details, see “LOAD TABLE statement” in Chapter 6, “SQL Statements” of the Sybase IQ Reference Manual.

QUOTES option example Consider a table defined as:

CREATE TABLE t1 (c1 INT, c2 VARCHAR(20), c3 VARCHAR(20))

with the following input data:

1, apple , fruit1	 ,
2, “banana” , “fruit2”,
3, “ pear “, “ fruit3 “,

The result of loading this data is displayed by running the following query:

SELECT c1, c2, c3, LENGTH(c2), LENGTH(c3) FROM t1

The following output displays the result of the query enclosed by ‘<‘ and ‘>’ and the values of the LOAD TABLE options QUOTES and STRIP:

QUOTES option

STRIP option

c1

c2

c3

length(c2)

length(c3)

ON

ON

<1>

<apple>

<fruit1>

<5>

<6>

<2>

<banana>

<fruit2>

<6>

<6>

<3>

< pear >

< fruit3 >

<6>

<8>

ON

OFF

<1>

<apple >

<fruit1 >

<6>

<7>

<2>

<banana>

<fruit2>

<6>

<6>

<3>

< pear >

< fruit3 >

<6>

<8>

OFF

ON

<1>

< apple>

< fruit1>

<6>

<7>

<2>

< “banana”>

< “fruit2”>

<9>

<9>

<3>

< “ pear ”>

< “ fruit3 ”>

<9>

<11>

OFF

OFF

<1>

< apple >

< fruit1 >

<7>

<8>

<2>

< “banana” >

< “fruit2”>

<10>

<9>

<3>

< “ pear ”>

< “ fruit3 ”>

<9>

<11>

Notes on the results:

ESCAPES option Currently, you must specify ESCAPES OFF. The default of ESCAPES ON is provided for compatibility with Adaptive Server Anywhere; this option may be supported in a future version. With ESCAPES turned on, if you omit a column-spec definition for an input field, characters following the backslash character are recognized and interpreted as special characters by the database server. Newline characters can be included as the combination \n, and other characters can be included in data as hexadecimal ASCII codes, such as \x09 for the tab character. A sequence of two backslash characters ( \\ ) is interpreted as a single backslash.

WORD SKIP option Allows the load to continue when it encounters data longer than the limit specified when the word index was created. For details, see “LOAD TABLE statement,” Chapter 6, “SQL Statements,” in the Sybase IQ Reference Manual.

Example

The following UNIX example specifies a BLOCK FACTOR of 50,000 records along with the PREVIEW option:

LOAD TABLE lineitem
    (l_shipmode ASCII(15),
    l_quantity ASCII(8),
    FILLER(30))
FROM '/d1/MILL1/tt.t'
BLOCK FACTOR 50000 PREVIEW ON

Specifying load options

You can specify a wide range of load options. These options tell Sybase IQ how to interpret and process the input file, and what to do when errors occur.

You can specify load options in any order. Syntax for load-options is as follows:

 ...[ { BLOCK FACTOR number | BLOCK SIZE number } ]
... [ BYTE ORDER { NATIVE | HIGH | LOW } ]
... [ LIMIT number-of-rows ]
... [ NOTIFY number-of-rows ]
... [ ON FILE ERROR { ROLLBACK | FINISH | CONTINUE} ]
... [ PREVIEW { ON | OFF } ]
... [ ROW DELIMITED BY 'delimiter-string' ]
... [ SKIP number-of-rows ]
... [ START ROW ID number ]
... [ UNLOAD FORMAT ]
... [ ON PARTIAL INPUT ROW { ROLLBACK | CONTINUE } ]
... [ IGNORE CONSTRAINT constrainttype [, ...] ]
... [ MESSAGE LOGstringROW LOGstring’
    [ ONLY LOG logwhat [, ...] ] ]
... [ LOG DELIMITED BYstring’ ]

The syntax for the parameters is as follows:

constrainttype:
{ CHECK integer | UNIQUE integer
| NULL integer
| FOREIGN KEY integer
| DATA VALUE integer
| ALL integer }
logwhat:
{ CHECK | ALL | NULL | UNIQUE | DATA VALUE | FOREIGN KEY }

Each of these options is described briefly below. For details of all options of the LOAD TABLE statement, see the Sybase IQ Reference Manual.

BLOCK FACTOR option Specifies blocking factor, or number of records per block, used when a source was created. This option is not valid for insertions from variable length input fields; use the BLOCK SIZE option instead. However, it does affect all file inserts (including from disk) with fixed length input fields, and it can affect performance dramatically.

The default setting for BLOCK FACTOR is 10,000. Higher block factors generally improve the speed of I/O operations. However, consider the following when setting this option:

ESCAPE CHARACTER option Specifies an alternative escape character. The default escape character for characters stored as hexadecimal codes and symbols is a backslash (\), so that \x0A is the linefeed character, for example.

This can be changed using the ESCAPE CHARACTER clause. For example, to use the exclamation mark as the escape character, you would enter:

... ESCAPE CHARACTER '!'

Only one single-byte character can be used as an escape character.

NoteBecause you must specify ESCAPES OFF in this version of Sybase IQ, the ESCAPE CHARACTER option has no effect. It is provided for compatibility with Adaptive Server Anywhere.

WITH CHECKPOINT ON clause If this option is set to ON, a checkpoint is issued when the LOAD TABLE statement completes and is logged. In the event recovery is required, it is guaranteed even if the data file is then removed from the system.

If WITH CHECKPOINT ON is not specified, the file used for loading must be retained in case recovery is required.

BLOCK SIZE option Specifies the default size in bytes in which input should be read. This option only affects variable-length input data read from files; it is not valid for fixed-length input fields. It is similar to BLOCK FACTOR, but there are no restrictions on the relationship of record size to block size. You cannot specify this option along with the BLOCK FACTOR option.

The default setting for BLOCK SIZE is 500,000, which is high enough for input from disk files. For tape files, you should specify the same block size that was used when creating the tape. You cannot specify BLOCK SIZE along with BLOCK FACTOR or with any fixed width input fields.

Example

The following UNIX example specifies a BLOCK SIZE of 200,000 bytes:

LOAD TABLE  mm
    (l_orderkey '\x09',
    l_quantity '\x09',
    l_shipdate DATE('YYYY/MM/DD'))
FROM '/d1/MILL1/tt.t'
BLOCK SIZE 200000

BYTE ORDER option Specifies the byte ordering during reads. This option applies to all binary input fields, including those defined as PREFIX 2 or PREFIX 4. If none are defined, this option is ignored. Sybase IQ always reads prefix binary data in the format native to the machine it is running on (default is NATIVE). You can also specify:

Example

Here is a Windows example:

LOAD TABLE nn
    (l_orderkey,
    l_quantity ASCII(PREFIX 2),
    FILLER(2),
FROM 'C:\\iq\archive\\mill.txt'
BYTE ORDER LOW

LIMIT option Specifies the maximum number of rows to insert into the table. The default is 0 for no limit. The maximum is 2GB-1.

LIMIT works together with the SKIP option. SKIP indicates where to begin reading from the input file, and LIMIT specifies how many of those rows to insert. SKIP takes precedence over LIMIT. If you specify multiple input files, these options only affect the first file. The following table shows how these options work together:

Table 7-1: SKIP and LIMIT insert options

If the SKIP value is

And the LIMIT value is

Then IQ does this

0

5

Reads 5 rows and inserts 5 rows.

20

5

Reads 25 rows and inserts 5 rows.

10

5

Reads 10 rows and inserts 5 rows. If the input file has only 8 rows, then zero rows are inserted.

In the following Windows example, no rows are skipped and up to 1,000,000 rows are inserted.

LOAD TABLE lineitem
    (l_shipmode ASCII(15),
    l_quantity ASCII(8),
    FILLER(30))
FROM 'C:\\iq\archive\\mill.txt'
BLOCK FACTOR 1000
PREVIEW ON
LIMIT 1000000

NOTIFY option Specifies that you be notified with a message each time the specified number of rows is inserted successfully into the table. The default is every 100,000 rows. Very frequent notifications can slow down your insert operation. To turn off NOTIFY entirely, set NOTIFY = 0. See “Interpreting notification messages” for an explanation of messages.

ON FILE ERROR option Specifies the action Sybase IQ takes when an input file cannot be opened, either because it does not exist or because you have incorrect permissions to read the file. For all other reasons or errors, it aborts the entire insertion. You can specify one of the following:

PREVIEW option Displays the layout of input into the destination table including starting position, name, and data type of each column. Sybase IQ displays this information at the start of the load process. If you are writing to a log file, this information is also included in the log.

This option is especially useful with partial-width inserts. It can help you diagnose failed or skewed insertions due to incompatible data types, or destination column alignment that does not match source columns. Look at the expected column data type and starting position information to determine if you need to use an insert conversion option on a column and/or where and how much filler to use.

NotePREVIEW ON helps you determine if a load is correct. It does not stop the load from occurring.

ROW DELIMITED BY option Specifies a string up to 4 bytes in length that indicates the end of an input record. You can use this option only if all fields within the row are any of the following:

The row delimiter can be any string of from 1 to 4 8-bit codes, including any combination of printable characters, and/or any 8–bit hexadecimal code that represents a non-printing character. For example, you specify \x09 to represent a tab as the terminator. For a null terminator (no visible terminator as in “C” strings), you specify \x00.

To use the newline character as a row delimiter, you can specify either the special combination '\n' or its ASCII value '\x0a'.

You cannot use this option if any input fields contain binary data. With this option, a row terminator causes any missing fields to be set to NULL. All rows must have the same row delimiters, and it must be distinct from all column delimiters. The row and field delimiter strings cannot be an initial subset of each other. For example, you cannot specify “*” as a field delimiter and “*#” as the row delimiter, but you could specify “#” as the field delimiter with that row delimiter.

If a row is missing its delimiters, Sybase IQ returns an error and rolls back the entire load transaction. The only exception is the final record of a file where it rolls back that row and returns a warning message.

On Windows, a row delimiter is usually indicated by the newline character followed by the carriage return character. You may need to specify this as the delimiter-string for either this option or FILLER.

Example

The following Windows example sets the column delimiter for the l_orderkey column to tab, and the row delimiter to newline (\x0a) followed by carriage return (\x0d):

LOAD TABLE mm
    (l_orderkey '\x09',
    l_quantity ASCII(4),
    FILLER(6),
    l_shipdate DATE('YYYY/MM/DD'))
FROM 'C:\\iq\\archive\\mill.txt'
ROW DELIMITED BY '\x0a\x0d'

SKIP option Lets you define a number of rows to skip at the beginning of the input file(s) for this load. The default is 0. This option works in conjunction with the LIMIT option, and takes precedence over it.

In this UNIX example, Sybase IQ reads 9,000 rows from the input file, skips the first 5,000, and loads the next 4,000. If there are only 8,000 rows in the input file, then only 3,000 rows are loaded.

LOAD TABLE lineitem(
    l_shipmode ASCII(15),
    l_quantity ASCII(8),
FILLER(30))
FROM '/d1/MILL1/tt.t'
BLOCK FACTOR 1000
LIMIT 4000
SKIP 5000
PREVIEW ON

START ROW ID option Specifies the id number of a row in the table where insertions should begin. This option is used for partial-width insertions, which insert into a subset of the columns in the table. If you are inserting data into an existing row, you must define the format of each input column with a column-spec, and use START ROW ID to identify the row where you want to insert it. The default is 0, which causes data to be inserted in a new row wherever there is space in the table. Be sure to read “Partial-width insertions” before using this option and performing partial-width inserts.

UNLOAD FORMAT option Specifies that the data in the input file is in the format produced by the UNLOAD command in Sybase IQ 11.5.1, specifically for upgrading to Sybase IQ 12.x. This format places certain restrictions on other load options you specify:

See the Sybase IQ Installation and Configuration Guide for more information on upgrading.

ON PARTIAL INPUT ROW option Specifies the action to take when a partial input row is encountered during a load. You can specify one of the following:

IGNORE CONSTRAINT option Specifies whether to ignore UNIQUE, NULL, CHECK, DATA VALUE, and/or FOREIGN KEY integrity constraint violations that occur during a load and the maximum number of violations to ignore before initiating a rollback. Specifying each constrainttype has the following result:

If UNIQUE, NULL, CHECK, or FOREIGN KEY is not specified in the IGNORE CONSTRAINT clause, then the load rolls back on the first occurrence of each of these types of integrity constraint violation.

If DATA VALUE is not specified in the IGNORE CONSTRAINT clause, then the load rolls back on the first occurrence of this type of integrity constraint violation, unless the database option CONVERSION_ERROR = OFF. If CONVERSION_ERROR = OFF, then a warning is reported for any DATA VALUE constraint violation and the load continues.

When the load completes, an informational message regarding integrity constraint violations is logged in the .iqmsg file. This message contains the number of integrity constraint violations that occurred during the load and the number of rows that were skipped.

MESSAGE LOG option Specifies the names of files in which to log information about integrity constraint violations and the types of violations to log. Timestamps indicating the start and completion of the load are logged in both the MESSAGE LOG and the ROW LOG files. Both MESSAGE LOG and ROW LOG must be specified, or no information about integrity violations is logged.

Various combinations of the IGNORE CONSTRAINT and MESSAGE LOG options result in different logging actions, as indicated in the following table:

IGNORE CONSTRAINT specified?

MESSAGE LOG specified?

Action

yes

yes

All ignored integrity constraint violations are logged, including the user specified limit, before the rollback.

no

yes

The first integrity constraint violation is logged before the rollback.

yes

no

Nothing is logged.

no

no

Nothing is logged. The first integrity constraint violation causes a rollback.

NoteSybase strongly recommends setting the IGNORE CONSTRAINT option limit to a non-zero value, if you are logging the ignored integrity constraint violations. If a single row has more than one integrity constraint violation, a row for each violation is written to the MESSAGE LOG file. Logging an excessive number of violations affects the performance of the load.

LOG DELIMITED BY option Specifies the separator between data values in the ROW LOG file. The default separator is a comma.

For more details on the contents and format of the MESSAGE LOG and ROW LOG files, see “Logging integrity constraint violations”.

LOAD TABLE adds rows

The LOAD TABLE statement appends the contents of the file to the existing rows of the table; it does not replace the existing rows in the table, unless you specify the START ROW ID load option. See “Partial-width insertions” for examples of how you use this option to insert data into existing rows.

If you want to empty out an existing table and reload it, you can use the TRUNCATE TABLE statement to remove all the rows from a table.

Simple LOAD TABLE example

The following statement loads the data from the file dept.txt into all columns of the department table. This example assumes that no explicit data conversion is needed, and that the width of input columns matches the width of columns in the department table.

LOAD TABLE department
FROM 'dept.txt'