Character datatypes  text datatype

Chapter 6: Using and Creating Datatypes

unichar datatype

The unichar and univarchar datatypes support the UTF-16 encoding of Unicode in the Adaptive Server. These datatypes are independent of the char and varchar datatypes, but mirror their behavior.

For example, the built-in functions that operate on char and varchar also operate on unichar and univarchar. However, unichar and univarchar store only UTF-16 characters and have no connection to the default character set ID or the default sort order ID as char and varchar do.

Each unichar/univarchar character requires two bytes of storage. The declaration of a unichar/univarchar column is the number of 16-bit Unicode values. The following example creates a table with one unichar column for 10 Unicode values requiring 20 bytes of storage:

create table unitbl (unicol unichar(10))

The length of a unichar/univarchar column is limited by the size of a data page, just as the length of char/varchar columns.

Unicode surrogate pairs use the storage of two 16-bit Unicode values (in other words, four bytes). Be aware of this when declaring columns intended to store Unicode surrogate pairs. By default, Adaptive Server correctly handles surrogates, and does not split the pair. Truncation of Unicode data is handled in a manner similar to that of char and varchar data.

You can use unichar expressions anywhere char expressions are used, including comparison operators, joins, subqueries, and so forth. However, mixed-mode expressions of both unichar and char are performed as unichar. The number of Unicode values that can participate in such operations is limited to the maximum size of a unichar string.

The normalization process modifies Unicode data so there is only a single representation in the database for a given sequence of abstract characters. Often, characters followed by combined diacritics are replaced by pre-combined forms. This allows significant performance optimizations. By default, the server assumes all Unicode data should be normalized.


Relational expressions

All relational expressions involving at least one expression of unichar or univarchar, are based on the default Unicode sort order. If one expression is unichar and the other is varchar (nvarchar, char, or nchar), the latter is implicitly converted to unichar.

The following expressions are most often used in where clauses, where it may be combined with logical operators.

When comparing Unicode character data, “less than” means closer to the beginning of the default Unicode sort order, and “greater than” means closer to the end. “Equality” means the Unicode default sort order makes no distinction between two values (although they need not be identical). For example, the precomposed character ê must be considered equal to the combining sequence consisting of the letter e followed by U+0302. If the Unicode normalization feature is turned on (the default), the Unicode data is automatically normalized and the server never sees unnormalized data.

Table 6-4: Relational expressions

expr1 op_compare [any | all] (subquery)

The use of any or all with comparison operators and expr2 being a subquery, implicitly invokes min or max. For instance, “expr1> any expr2” means, in effect, “expr1> min(expr2)”.

expr1 [not] in (expression list)

expr1 [not] in (subquery)

The in operator checks for equality with each of the elements in expr2, which can be a list of constants, or the results of a subquery.

expr1 [not] between expr2 and expr3

The between operator specifies a range. It is, in effect, shorthand for “expr1 = expr2 and expr1<= expr3”.

expr1 [not] like "match_string" [escape"esc_char”]

The like operator specifies a pattern to be matched. The semantics for pattern matching with Unicode data are the same for regular character data. If expr1 is a unichar column name, then “match_string” may be either a unichar string or a varchar string. In the latter case, an implicit conversion takes place between varchar and unichar


Join operators

Join operators appear in the same manner as comparison operators. In fact, any comparison operator can be used in a join. Expressions involving at least one expression of type unichar are based on the default Unicode sort order. If one expression is of type unichar and the other type varchar (nvarchar, char, or nchar), the latter is implicitly converted to unichar.


Union operators

The union operator operates with unichar data much like it does with varchar data. Corresponding columns from individual queries must be implicitly convertible to unichar, or explicit conversion must be used.


Clauses and modifiers

When unichar and univarchar columns are used in group by and order by clauses, equality is judged according to the default Unicode sort order. This is also true when using the distinct modifier.





Copyright © 2005. Sybase Inc. All rights reserved. text datatype

View this book as PDF