Unicode support in PowerBuilder

PowerBuilder uses UTF-16LE encoding internally. The source code in PBLs is encoded in UTF-16LE, any text entered in an application is automatically converted to Unicode, and the string and character PowerScript datatypes hold Unicode data only. Any ANSI or DBCS characters assigned to these datatypes are converted internally to Unicode encoding.

Support for Unicode databases

Most PowerBuilder database interfaces support both ANSI and Unicode databases.

A Unicode database is a database whose character set is set to a Unicode format, such as UTF-8 or UTF-16. All data in the database is in Unicode format, and any data saved to the database must be converted to Unicode data implicitly or explicitly.

A database that uses ANSI (or DBCS) as its character set can use special datatypes to store Unicode data. These datatypes are NChar, NVarChar, and NVarChar2. Columns with one of these datatypes can store Unicode data, but data saved to such a column must be converted to Unicode explicitly.

For more specific information about each interface, see Connecting to Your Database.

String functions

PowerBuilder string functions, such as Fill, Len, Mid, and Pos, take characters instead of bytes as parameters or return values and return the same results in all environments. These functions have a “wide” version (such as FillW) that is obsolete and will be removed in a future version of PowerBuilder because it produces the same results as the standard version of the function. Some of these functions also have an ANSI version (such as FillA). This version is provided for backwards compatibility for users in DBCS environments who used the standard version of the string function in previous versions of PowerBuilder to return bytes instead of characters.

You can use the GetEnvironment function to determine the character set used in the environment:

environment env
getenvironment(env)

choose case env.charset
case charsetdbcs!
	// DBCS processing
	...
case charsetunicode!
	// Unicode processing
	...
case charsetansi!
	// ANSI processing
	...
case else
	// Other processing
	...
end choose

Encoding enumeration

Several functions, including Blob, BlobEdit, FileEncoding, FileOpen, SaveAs, and String, have an optional encoding parameter. These functions let you work with blobs and files with ANSI, UTF-8, UTF-16LE, and UTF-16BE encoding. If you do not specify this parameter, the default encoding used for SaveAs and FileOpen is ANSI. For other functions, the default is UTF-16LE.

The following examples illustrate how to open different kinds of files using FileOpen:

// Read an ANSI File
Integer li_FileNum
String s_rec
li_FileNum = FileOpen("Employee.txt")
// or:
// li_FileNum = FileOpen("Emplyee.txt", &
//    LineMode!, Read!)
FileRead(li_FileNum, s_rec)

// Read a Unicode File
Integer li_FileNum
String s_rec
li_FileNum = FileOpen("EmployeeU.txt", LineMode!, &
   Read!, EncodingUTF16LE!)
FileRead(li_FileNum, s_rec)

// Read a Binary File
Integer li_FileNum
blob bal_rec
li_FileNum = FileOpen("Employee.imp", Stream Mode!, &
   Read!)
FileRead(li_FileNum, bal_rec)

Initialization files

The SetProfileString function can write to initialization files with ANSI or UTF16-LE encoding on Windows systems, and ANSI or UTF16-BE encoding on UNIX systems. The ProfileInt and ProfileString PowerScript functions and DataWindow expression functions can read files with these encoding schemes.

Exporting and importing source

The Export Library Entry dialog box lets you select the type of encoding for an exported file. The choices are ANSI/DBCS, which lets you import the file into PowerBuilder 9 or earlier, HEXASCII, UTF8, or Unicode LE.

The HEXASCII export format is used for source-controlled files. Unicode strings are represented by hexadecimal/ASCII strings in the exported file, which has the letters HA at the beginning of the header to identify it as a file that might contain such strings. You cannot import HEXASCII files into PowerBuilder 9 or earlier.

If you import an exported file from PowerBuilder 9 or earlier, the source code in the file is converted to Unicode before the object is added to the PBL.

External functions

When you call an external function that returns an ANSI string or has an ANSI string argument, you must use an ALIAS clause in the external function declaration and add ;ansi to the function name. For example:

FUNCTION int MessageBox(int handle, string content, string title, int showtype)
LIBRARY "user32.dll" ALIAS FOR "MessageBoxA;ansi"

The following declaration is for the “wide” version of the function, which uses Unicode strings:

FUNCTION int MessageBox(int handle, string content, string title, int showtype)
LIBRARY "user32.dll" ALIAS FOR "MessageBoxW"

If you are migrating an application from PowerBuilder 9 or earlier, PowerBuilder replaces function declarations that use ANSI strings with the correct syntax automatically.

Setting fonts for multiple language support

The default font in the System Options and Design Options dialog boxes is Tahoma.

Setting the font in the System Options dialog box to Tahoma ensures that multiple languages display correctly in the Layout and Properties views in the Window, User Object, and Menu painters and in the wizards.

If the font on the Editor Font page in the Design Options dialog box is not set to Tahoma, multiple languages cannot be displayed in Script views, the File and Source editors, the ISQL view in the DataBase painter, and the Debug window.

You can select a different font for printing on the Printer Font tab page of the Design Options dialog box for Script views, the File and Source editors, and the ISQL view in the DataBase painter. If the printer font is set to Tahoma and the Tahoma font is not installed on the printer, PowerBuilder downloads the entire font set to the printer when it encounters a multilanguage character. If you need to print multilanguage characters, specify a printer font that is installed on your printer.

To support multiple languages in DataWindow objects, set the font in every column and text control to Tahoma.

The default font for print functions is the system font. Use the PrintDefineFont and PrintSetFont functions to specify a font that is available on users’ printers and supports multiple languages.

PBNI

The PowerBuilder Native Interface is Unicode based. PBNI extensions must be compiled using the _UNICODE preprocessor directive in your C++ development environment.

Your extension’s code must use TCHAR, LPTSTR, or LPCTSTR instead of char, char*, and const char* to ensure that it works correctly in a Unicode environment. Alternatively, you can use the MultiByteToWideChar function to map character strings to Unicode strings. For more information about enabling Unicode in your application, see the documentation for your C++ development environment.

Unicode enabling for Web services

In a PowerScript target, the PBNI extension classes instantiated by Web service client applications use Unicode for all internal processing. However, calls to component methods are converted to ANSI for processing by EasySoap, and data returned from these calls is converted to Unicode.

In a JSP target, the authoring tool (HTML editor) is Unicode-enabled so you can input text in multiple languages on a single page. When you type in the editor, the text is saved in UTF-16LE encoding. However, JSP files with other encoding schemes can still be imported in the editor. Text with these encodings is automatically converted to UTF-16LE.

XML string encoding

The XML parser cannot parse a string that uses an eight-bit character code such as windows-1253. For example, a string with the following declaration cannot be parsed:

string ls_xml
ls_xml += '<?xml version="1.0" encoding="windows-1253"?>'

You must use a Unicode encoding value such as UTF16-LE.