The collation section

After the title line, each non-comment line describes one position in the collation. The order of the lines determines both the sort order used by the database and the result of comparisons. Characters on lines nearer the beginning of the file sort before characters on later lines.

Each line in the sequence takes one of the following forms:

[sort-position] : character [ [, character ] ...]

or

[sort-position] : character [lowercase uppercase]

Argument	Description
sort-position	Optional. Specifies the position at which the characters on that line sort. Smaller numbers represent lesser values and sort closer to the beginning of the sorted set. Typically, the sort-position is omitted, and the characters sort immediately after the characters from the previous sort position.
character	The character whose sort-position is being specified.
lowercase	Optional. Specifies the lowercase equivalent of the character. If not specified, the character has no lowercase equivalent.
uppercase	Optional. Specifies the uppercase equivalent of the character. If not specified, the character has no uppercase equivalent.

Multiple characters may appear on one line, separated by commas (,). In this case, these characters are sorted and compared as if they were the same character.
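As a rough illustration of how positions follow from line order, the sketch below assigns a position to each line. It assumes that numbering starts at 0 and that an explicit sort-position simply resets the running counter; neither detail is stated in this section, and the name assign_positions is hypothetical.

```python
def assign_positions(lines):
    """Map each character to a sort position.

    `lines` is a list of (explicit_position_or_None, [characters]) pairs,
    one pair per non-comment line, in file order.
    """
    positions = {}
    current = -1  # assumption: the first implicit position is 0
    for explicit, chars in lines:
        # An omitted sort-position means "immediately after the previous one".
        current = explicit if explicit is not None else current + 1
        for ch in chars:
            # Characters on the same line share one position.
            positions[ch] = current
    return positions


sample = [(None, [" "]), (None, ["_"]), (10, ["e", "E"]), (None, ["f"])]
table = assign_positions(sample)
```

Here "e" and "E" receive the same position, so they compare as equal, and "f" follows them at the next position.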

Each character and sort position is specified in one of the following ways:

Specification	Description
\dnnn	Decimal number, using digits 0-9 (such as \d001)
\xhh	Hexadecimal number, using digits 0-9 and letters a-f or A-F (such as \xB4)
'c'	Any character in place of c (such as ',')
c	Any character other than quote ('), backslash (\), colon (:) or comma (,). These characters must use one of the previous forms.
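The four specification forms can be decoded with a few lines of code. The following is a minimal sketch, not the parser used by the database tools; the function name parse_spec is hypothetical, and it assumes each specification names a single byte value.

```python
def parse_spec(token: str) -> int:
    """Return the character code named by one specification."""
    token = token.strip()
    if token.startswith("\\d"):
        # \dnnn: decimal digits
        return int(token[2:], 10)
    if token.startswith("\\x"):
        # \xhh: hexadecimal digits, either letter case
        return int(token[2:], 16)
    if len(token) == 3 and token[0] == "'" and token[2] == "'":
        # 'c': any character between single quotes
        return ord(token[1])
    if len(token) == 1 and token not in "'\\:,":
        # bare character, excluding quote, backslash, colon, and comma
        return ord(token)
    raise ValueError(f"not a valid character specification: {token!r}")
```

For example, parse_spec("\\d001") yields 1, parse_spec("\\xB4") yields 180, and parse_spec("','") yields 44, while a bare comma is rejected.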

The following are some sample lines for a collation:

% Sort some special characters at the beginning:
: ' '
: _
: \xF2
: \xEE
: \xF0
: -
: ','
: ;
: ':'
: !
% Sort some letters in alphabetical order
: A a A
: a a A
: B b B
: b b B
% Sort some E's from code page 850,
% including some accented extended characters:
: e e E, \x82 \x82 \x90, \x8A \x8A \xD4
: E e E, \x90 \x82 \x90, \xD4 \x8A \xD4
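To show how lines like these break down, the sketch below splits one line into an optional sort-position and a list of (character, lowercase, uppercase) groups. It is an illustration only: the names TOKEN, to_code, and parse_line are hypothetical, and it assumes that items are separated by whitespace and that comment lines start with %, as in the samples above.

```python
import re

# One token is a quoted character, a \d or \x escape, a separator, or a bare character.
TOKEN = re.compile(r"'.'|\\d[0-9]+|\\x[0-9A-Fa-f]+|[:,]|\S")


def to_code(spec):
    """Convert one specification to its character code."""
    if spec.startswith("\\d"):
        return int(spec[2:], 10)
    if spec.startswith("\\x"):
        return int(spec[2:], 16)
    if len(spec) == 3 and spec[0] == "'" and spec[2] == "'":
        return ord(spec[1])
    return ord(spec)


def parse_line(line):
    """Return (sort_position_or_None, entries), or None for blank/comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("%"):
        return None
    tokens = TOKEN.findall(stripped)
    colon = tokens.index(":")  # a quoted ':' is a different token, so this is the separator
    sort_position = to_code(tokens[0]) if colon > 0 else None
    groups, current = [], []
    for tok in tokens[colon + 1:]:
        if tok == ",":
            groups.append(current)
            current = []
        else:
            current.append(tok)
    groups.append(current)
    entries = []
    for group in groups:
        codes = [to_code(t) for t in group]
        if len(codes) == 1:
            entries.append((codes[0], None, None))  # no case mappings given
        elif len(codes) == 3:
            entries.append(tuple(codes))  # character, lowercase, uppercase
        else:
            raise ValueError(f"unexpected group in line: {line!r}")
    return sort_position, entries


parsed = parse_line(": E e E, \\x90 \\x82 \\x90, \\xD4 \\x8A \\xD4")
```

All three groups on that line share one sort position, and each carries its own lowercase and uppercase mapping.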

For databases using case-insensitive sorting and comparison (that is, CASE IGNORE was specified when the database was created), the lowercase and uppercase mappings are used to find the lowercase and uppercase characters that will be sorted together.
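One way to picture the effect of the case mappings is to fold each character to its uppercase equivalent before looking up its position. The sketch below models that idea; it is an assumption about the behavior, not a description of the database engine, and the names ENTRIES, build_keys, and sort_key are hypothetical.

```python
# (character, lowercase, uppercase) in file order, taken from the sample letters.
ENTRIES = [("A", "a", "A"), ("a", "a", "A"), ("B", "b", "B"), ("b", "b", "B")]


def build_keys(entries, case_ignore):
    """Map each character to the position used for sorting and comparison."""
    position = {ch: i for i, (ch, _, _) in enumerate(entries)}
    keys = {}
    for ch, _lower, upper in entries:
        # Assumption: with CASE IGNORE, a character takes the position of its uppercase form.
        folded = upper if (case_ignore and upper is not None) else ch
        keys[ch] = position.get(folded, position[ch])
    return keys


def sort_key(text, keys):
    """Return a list of positions that can be compared or used as a sort key."""
    return [keys[c] for c in text]


respect = build_keys(ENTRIES, case_ignore=False)
ignore = build_keys(ENTRIES, case_ignore=True)
```

Under this model, "ab" and "AB" produce identical keys when case is ignored, while a case-sensitive comparison places "A" before "a".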

When a database is created with CASE IGNORE, queries may return data in either uppercase or lowercase, depending on the type of index the optimizer chose to use. You can return all data in uppercase in such a situation by using this command:

SET TEMPORARY OPTION AGGREGATION_PREFERENCE=-2

For multibyte character sets, only the first byte of each character is listed in the collation sequence. All characters with the same first byte are sorted together and are ordered among themselves by the values of the following bytes. For example, the following is part of the Shift-JIS collation file:

:	\xfb
:	\xfc
:	\xfd

In this collation, all characters with first byte \xfc come after all characters with first byte \xfb and before all characters with first byte \xfd. The two-byte character \xfc \x01 would be ordered before the two-byte character \xfc \x02.
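The multibyte rule amounts to a two-part sort key: the collation position of the first byte, then the raw values of the remaining bytes. The sketch below illustrates this for the three first bytes shown; the names FIRST_BYTE_ORDER, RANK, and mb_key are hypothetical.

```python
# First bytes in the order they appear in the collation excerpt.
FIRST_BYTE_ORDER = [0xFB, 0xFC, 0xFD]
RANK = {byte: index for index, byte in enumerate(FIRST_BYTE_ORDER)}


def mb_key(char: bytes):
    """Sort key for one multibyte character given as a bytes object."""
    # First compare by the listed position of the lead byte,
    # then by the values of the following bytes.
    return (RANK[char[0]], char[1:])


chars = [b"\xfd\x01", b"\xfc\x02", b"\xfb\x7f", b"\xfc\x01"]
ordered = sorted(chars, key=mb_key)
```

After sorting, the character with first byte \xfb comes first, the two \xfc characters follow in order of their second byte, and the \xfd character comes last.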

Any characters omitted from the collation are added to the end of the collation. The tool that processes the collation file issues a warning.
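The handling of omitted characters can be sketched as follows. This is an illustration under stated assumptions: the section does not say in what order omitted characters are appended, so the sketch uses ascending character code, and it assumes a single-byte character set of 256 codes. The name complete_collation is hypothetical.

```python
import warnings


def complete_collation(listed, alphabet_size=256):
    """Return the full ordering: listed codes first, omitted codes appended."""
    seen = set(listed)
    # Assumption: omitted characters are appended in ascending code order.
    missing = [code for code in range(alphabet_size) if code not in seen]
    if missing:
        warnings.warn(
            f"{len(missing)} character(s) omitted from the collation; "
            "appended at the end"
        )
    return list(listed) + missing
```

Calling complete_collation([0x20, 0x5F, 0x41]) keeps the three listed codes at the front, appends the other 253 codes, and emits one warning.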
