Understanding the locale character set

Both application and server locale definitions have a character set. The application uses its character set when requesting character strings from the server. If character set translation is enabled (the default), the database server compares its character set with that of the application to determine whether character set translation is needed.
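The comparison described above can be modeled in a few lines of Python. This is a minimal sketch, not the server's implementation. It assumes that labels are compared case-insensitively and that "translation" means decoding with one character set and re-encoding with the other. The helper names `translation_needed` and `translate` are hypothetical. `translate` only works for labels that are also Python codec names, such as cp1252 and utf8.

```python
def translation_needed(app_charset: str, server_charset: str,
                       translation_enabled: bool = True) -> bool:
    """Return True if strings must be converted between the two character sets."""
    if not translation_enabled:
        # Translation switched off: strings pass through unchanged.
        return False
    # Assumption: labels are compared case-insensitively, ignoring surrounding spaces.
    return app_charset.strip().lower() != server_charset.strip().lower()


def translate(data: bytes, from_charset: str, to_charset: str) -> bytes:
    """Convert bytes from one character set to another (hypothetical helper).

    Assumes both labels are also Python codec names. This holds for labels
    such as cp1252 and utf8, but not for every label in the table.
    """
    return data.decode(from_charset).encode(to_charset)


# Usage: an application using cp1252 connected to a utf8 server.
if translation_needed("cp1252", "utf8"):
    converted = translate("café".encode("cp1252"), "cp1252", "utf8")
    print(converted)  # b'caf\xc3\xa9'
```

The character é is the single byte 0xE9 in cp1252 and the two bytes 0xC3 0xA9 in UTF-8. This is why the server must translate when the two character sets differ.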

For a list of available character set labels, see “Character set labels”.

For more information about how to find locale settings, see “Determining locale information”.

The client library determines the character set as follows:

  1. If the connection string specifies a character set, it is used.

    For more information, see “CharSet connection parameter [CS]”.

  2. Open Client applications check the locales.dat file in the Sybase locales directory.

  3. Character set information from the operating system is used to determine the locale.
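The client-side precedence can be sketched in Python as follows. This is a minimal model, not the client library's implementation. The helper names are hypothetical. Connection strings are assumed to be semicolon-separated `keyword=value` pairs with case-insensitive keywords. The locales.dat lookup and the operating system query are represented by plain arguments.

```python
from typing import Dict, Optional


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split "Key=value;Key=value" into a dict with lower-cased keys."""
    params: Dict[str, str] = {}
    for part in conn_str.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip().lower()] = value.strip()
    return params


def resolve_client_charset(conn_str: str, is_open_client: bool,
                           locales_dat_charset: Optional[str],
                           os_charset: str) -> str:
    """Pick the client character set using the precedence listed above."""
    params = parse_connection_string(conn_str)
    # 1. The CharSet connection parameter (short form CS) wins if present.
    for key in ("charset", "cs"):
        if params.get(key):
            return params[key]
    # 2. Open Client applications use the entry from locales.dat. Reading
    #    the file is not shown; the caller passes the value it found.
    if is_open_client and locales_dat_charset:
        return locales_dat_charset
    # 3. Otherwise fall back to the operating system's character set.
    return os_charset


print(resolve_client_charset("UID=DBA;CS=utf8", False, None, "cp1252"))  # utf8
print(resolve_client_charset("UID=DBA", False, None, "cp1252"))          # cp1252
```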

The database server determines the character set for a connection as follows:

  1. The character set specified by the client is used if it is supported.

    For more information, see “CharSet connection parameter [CS]”.

  2. The database's character set is used if the client specifies a character set that is not supported.
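The two server-side rules can be modeled in Python as follows. This is a sketch under stated assumptions. The function name and the `SUPPORTED` set are hypothetical, and labels are matched case-insensitively. The text does not say what happens when the client specifies no character set at all. This sketch assumes the database's character set is used in that case too.

```python
from typing import AbstractSet, Optional


def resolve_connection_charset(requested: Optional[str],
                               supported: AbstractSet[str],
                               database_charset: str) -> str:
    """Choose the character set the server uses for one connection."""
    if requested is not None:
        label = requested.strip().lower()
        # 1. Use the client's character set if the server supports it.
        if label in supported:
            return label
    # 2. Unsupported (or, as an assumption here, missing): use the database's.
    return database_charset


SUPPORTED = {"utf8", "cp1252", "iso_1"}  # hypothetical, for illustration only
print(resolve_connection_charset("UTF8", SUPPORTED, "iso_1"))  # utf8
print(resolve_connection_charset("koi8", SUPPORTED, "iso_1"))  # iso_1
```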

When a new database is created, the database server determines the character set for the new database as follows:

  1. If the CREATE DATABASE statement specifies a collation, the character set associated with that collation is used.

  2. The ASCHARSET environment variable is used if it exists.

  3. Character set information from the operating system is used to determine the locale.
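The order for a new database can be sketched in Python as follows. This is an illustrative sketch, not the server's code. The function name is hypothetical. Mapping a collation name to its character set is not shown; the caller passes the character set that the collation implies. The environment is passed as a mapping, so `os.environ` can be used directly.

```python
import os
from typing import Mapping, Optional


def resolve_new_database_charset(collation_charset: Optional[str],
                                 env: Mapping[str, str],
                                 os_charset: str) -> str:
    """Pick the character set for a new database using the order listed above."""
    # 1. A collation named in CREATE DATABASE determines the character set;
    #    the caller passes the character set that collation implies.
    if collation_charset:
        return collation_charset
    # 2. Otherwise use the ASCHARSET environment variable, if set.
    from_env = env.get("ASCHARSET")
    if from_env:
        return from_env
    # 3. Otherwise fall back to the operating system's character set.
    return os_charset


print(resolve_new_database_charset(None, {"ASCHARSET": "cp1252"}, "utf8"))  # cp1252
print(resolve_new_database_charset(None, os.environ, "utf8"))
```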

If no collation is explicitly specified when an IQ database is created, the default collation, ISO_BINENG, is used.

Character set labels

The following table shows the valid character set label values, together with the equivalent IANA labels and a description:

Character set label   IANA label        Description
-------------------   ---------------   -----------
big5                  N/A               Traditional Chinese (cf. CP950)
cp437                 N/A               IBM CP437 - U.S. code set
cp850                 N/A               IBM CP850 - European code set
cp852                 N/A               PC Eastern Europe
cp855                 N/A               IBM PC Cyrillic
cp856                 N/A               Alternate Hebrew
cp857                 N/A               IBM PC Turkish
cp860                 N/A               PC Portuguese
cp861                 N/A               PC Icelandic
cp862                 N/A               PC Hebrew
cp863                 N/A               IBM PC Canadian French code page
cp864                 N/A               PC Arabic
cp865                 N/A               PC Nordic
cp866                 N/A               PC Russian
cp869                 N/A               IBM PC Greek
cp874                 N/A               Microsoft Thai SB code page
cp932                 windows-31j       Microsoft CP932 = Win31J-DBCS
cp936                 N/A               Simplified Chinese
cp949                 N/A               Korean
cp950                 N/A               PC (MS) Traditional Chinese
cp1250                N/A               MS Windows Eastern European
cp1251                N/A               MS Windows Cyrillic
cp1252                N/A               MS Windows US (ANSI)
cp1253                N/A               MS Windows Greek
cp1254                N/A               MS Windows Turkish
cp1255                N/A               MS Windows Hebrew
cp1256                N/A               MS Windows Arabic
cp1257                N/A               MS Windows Baltic
cp1258                N/A               MS Windows Vietnamese
deckanji              N/A               DEC UNIX JIS encoding
euccns                N/A               EUC CNS encoding: Traditional Chinese with extensions
eucgb                 N/A               EUC GB encoding = Simplified Chinese
eucjis                euc-jp            Sun EUC JIS encoding
eucksc                N/A               EUC KSC Korean encoding (cf. CP949)
greek8                N/A               HP Greek-8
iso_1                 iso_8859-1:1987   ISO 8859-1 Latin-1
iso15                 N/A               ISO 8859-15 Latin-1 with Euro, etc.
iso88592              iso_8859-2:1987   ISO 8859-2 Latin-2 Eastern Europe
iso88595              iso_8859-5:1988   ISO 8859-5 Latin/Cyrillic
iso88596              iso_8859-6:1987   ISO 8859-6 Latin/Arabic
iso88597              iso_8859-7:1987   ISO 8859-7 Latin/Greek
iso88598              iso_8859-8:1988   ISO 8859-8 Latin/Hebrew
iso88599              iso_8859-9:1989   ISO 8859-9 Latin-5 Turkish
koi8                  N/A               KOI-8 Cyrillic
mac                   macintosh         Standard Mac coding
mac_cyr               N/A               Macintosh Cyrillic
mac_ee                N/A               Macintosh Eastern European
macgrk2               N/A               Macintosh Greek
macturk               N/A               Macintosh Turkish
roman8                hp-roman8         HP Roman-8
sjis                  shift_jis         Shift JIS (no extensions)
tis620                N/A               TIS-620 Thai standard
turkish8              N/A               HP Turkish-8
utf8                  utf-8             UTF-8 treated as a character set
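Only some labels have an IANA equivalent. A small lookup built from the rows above could look like this in Python. The dictionary and function names are illustrative and are not part of any product API. Labels are assumed to be matched case-insensitively.

```python
from typing import Optional

# Labels from the table that have an IANA equivalent; all others map to None.
IANA_LABELS = {
    "cp932": "windows-31j",
    "eucjis": "euc-jp",
    "iso_1": "iso_8859-1:1987",
    "iso88592": "iso_8859-2:1987",
    "iso88595": "iso_8859-5:1988",
    "iso88596": "iso_8859-6:1987",
    "iso88597": "iso_8859-7:1987",
    "iso88598": "iso_8859-8:1988",
    "iso88599": "iso_8859-9:1989",
    "mac": "macintosh",
    "roman8": "hp-roman8",
    "sjis": "shift_jis",
    "utf8": "utf-8",
}


def to_iana(label: str) -> Optional[str]:
    """Return the IANA label for a character set label, or None if there is none."""
    return IANA_LABELS.get(label.strip().lower())


print(to_iana("UTF8"))    # utf-8
print(to_iana("cp1252"))  # None
```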