Portability and the Transfer Encoding

The shim has been designed to work with UTF-8 or other byte-oriented transfer encodings, and database setup uses UTF-8 as the default table encoding. Use of the shim with systems whose transfer encoding has a code unit wider than one byte is not supported; in particular, this rules out connecting to an api server running in a UTF-16 environment.
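To see why a wider code unit breaks a byte-oriented component, compare how the same ASCII text is laid out in UTF-8 and in UTF-16. The sketch below uses only the JDK's `StandardCharsets`; the class name `UnitSizeDemo`, its methods, and the sample ticker text are hypothetical. It assumes, for illustration only, a downstream scanner that treats a zero byte as a field terminator.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public final class UnitSizeDemo {
    // Encode a string with the given charset (no byte-order mark for UTF-16LE).
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    // Count zero bytes; a scanner that (by assumption) ends a field at a
    // zero byte would see a spurious field boundary at each one.
    static int zeroBytes(byte[] encoded) {
        int n = 0;
        for (byte b : encoded) {
            if (b == 0) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String symbol = "AAPL";  // hypothetical ticker text
        for (Charset cs : new Charset[] {StandardCharsets.UTF_8, StandardCharsets.UTF_16LE}) {
            byte[] b = encode(symbol, cs);
            System.out.printf("%-8s %d bytes, %d zero bytes%n", cs.name(), b.length, zeroBytes(b));
        }
        // UTF-8    4 bytes, 0 zero bytes
        // UTF-16LE 8 bytes, 4 zero bytes
    }
}
```

The same four ASCII characters double in size under UTF-16 and acquire interleaved zero bytes, which is why a component that scans one byte at a time cannot simply be pointed at a UTF-16 peer.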

The database table load files in the distribution use only the ASCII subset of UTF-8, and in our environment the IB tws api server likewise confines itself to ASCII. Within those limits, the IB tws has also been hosted on platforms with other byte-oriented transfer encodings; see, e.g., Table 6.1.
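One way to confirm the ASCII-only property for a given load file is to scan its raw bytes for any value above 0x7F. The following is a minimal sketch; the class name `AsciiCheck` and its method are hypothetical, and file names are taken from the command line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public final class AsciiCheck {
    // Return the offset of the first byte outside 7-bit ASCII, or -1 if none.
    static int firstNonAscii(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0xFF) > 0x7F) return i;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            int off = firstNonAscii(Files.readAllBytes(Paths.get(name)));
            System.out.println(off < 0
                    ? name + ": ASCII only"
                    : name + ": non-ASCII byte at offset " + off);
        }
    }
}
```

Because the check works on bytes rather than decoded characters, its answer does not depend on the platform's default encoding.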

Table 6.1: Examples of Java locale and default encoding parameters

os                  locale   language   file encoding   default    unit bits
Linux               en_US    en         UTF-8           UTF-8      8
Windows 2000 (1)    en_US    en         Cp1252          Cp1252     8
Mac OS X            en_US    en         MacRoman        MacRoman   8

(1) We use Windows for one legacy application only, and so the locally available Windows box is quite antiquated.
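To gather comparable values on another host, a small probe of the running JVM may help. The source does not say which calls produced Table 6.1, so the mapping below is an assumption: `os.name`, `Locale.getDefault()`, the `file.encoding` system property, and `Charset.defaultCharset()`. The unit-size column is not probed. The class name `EncodingProbe` is hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public final class EncodingProbe {
    // Values roughly matching the columns of Table 6.1 (assumed mapping).
    static String[] probe() {
        Locale loc = Locale.getDefault();
        return new String[] {
            System.getProperty("os.name"),
            loc.toString(),                       // e.g. en_US
            loc.getLanguage(),                    // e.g. en
            System.getProperty("file.encoding"),  // e.g. UTF-8, Cp1252
            Charset.defaultCharset().name()       // canonical name, e.g. windows-1252
        };
    }

    public static void main(String[] args) {
        String[] labels = {"os", "locale", "language", "file encoding", "default"};
        String[] values = probe();
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-14s %s%n", labels[i], values[i]);
        }
    }
}
```

Two caveats apply. `Charset.defaultCharset().name()` reports canonical names, so `Cp1252` appears as `windows-1252` and `MacRoman` as `x-MacRoman`. And since Java 18 (JEP 400) the default charset is UTF-8 on every platform unless `file.encoding` is set to `COMPAT`; the platform's own encoding is then available in the `native.encoding` property.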

The shim uses a table-driven scanner to tokenize input. Its character classification tables operate on bytes and treat every byte outside the 7-bit ASCII range as alphabetic. Beyond this, the shim merely passes bytes along, so the interpretation that downstream applications or the IB tws give to those bytes is outside its control.
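A byte-indexed classification table of this kind can be sketched as follows. This is an illustration of the technique, not the shim's own tables: the class name `ByteClassTable`, the four class codes, and the `scanWord` helper are all hypothetical. The example shows why treating high bytes as alphabetic suits UTF-8: a multi-byte sequence stays inside one word without ever being decoded.

```java
import java.nio.charset.StandardCharsets;

public final class ByteClassTable {
    // Hypothetical class codes; the shim's actual class set is not described here.
    static final int OTHER = 0, SPACE = 1, DIGIT = 2, ALPHA = 3;

    // One entry per byte value 0..255.
    private static final int[] CLASS = new int[256];

    static {
        for (int b = 0; b < 256; b++) {
            if (b >= 0x80) {
                CLASS[b] = ALPHA;  // outside 7-bit ASCII: treated as alphabetic
            } else if (b >= '0' && b <= '9') {
                CLASS[b] = DIGIT;
            } else if ((b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')) {
                CLASS[b] = ALPHA;
            } else if (b == ' ' || b == '\t' || b == '\n' || b == '\r') {
                CLASS[b] = SPACE;
            } else {
                CLASS[b] = OTHER;
            }
        }
    }

    // Classify one byte; mask to 0..255 because Java bytes are signed.
    static int classOf(byte b) {
        return CLASS[b & 0xFF];
    }

    // Length of a word starting at 'start': one ALPHA byte, then ALPHA or DIGIT bytes.
    static int scanWord(byte[] in, int start) {
        int i = start;
        if (i < in.length && classOf(in[i]) == ALPHA) {
            i++;
            while (i < in.length && (classOf(in[i]) == ALPHA || classOf(in[i]) == DIGIT)) {
                i++;
            }
        }
        return i - start;
    }

    public static void main(String[] args) {
        byte[] in = "caf\u00e9 42".getBytes(StandardCharsets.UTF_8);
        // 5: 'c', 'a', 'f', plus the two UTF-8 bytes of U+00E9
        System.out.println(scanWord(in, 0));
    }
}
```

A lookup table costs 256 entries and turns each classification into a single array access, which is the usual motivation for table-driven scanners.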

Although the shim has been observed to connect to, make requests of, and parse the messages of IB tws api servers on platforms whose Java transfer encoding is byte-oriented but not UTF-8, e.g., Cp1252 or MacRoman, such operation is solely the user's responsibility. These encodings share ASCII as a common subset with UTF-8, and their apparent interoperability is an artifact of the practice of restricting symbols and other character text data to that subset.
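The common-subset argument can be checked directly with the JDK's charsets: ASCII text encodes to identical bytes under UTF-8, Cp1252, and MacRoman, while a single accented letter already diverges. In this sketch the class and method names are hypothetical; `x-MacRoman` is provided by the optional `jdk.charsets` module, so its availability is tested first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public final class SharedSubset {
    // True if 's' encodes to identical bytes under both charsets.
    static boolean sameBytes(String s, Charset a, Charset b) {
        return Arrays.equals(s.getBytes(a), s.getBytes(b));
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println(sameBytes("AAPL", utf8, cp1252));       // true
        System.out.println(sameBytes("caf\u00e9", utf8, cp1252));  // false: C3 A9 vs E9
        if (Charset.isSupported("x-MacRoman")) {
            Charset mac = Charset.forName("x-MacRoman");
            System.out.println(sameBytes("AAPL", utf8, mac));       // true
            System.out.println(sameBytes("caf\u00e9", utf8, mac));  // false: C3 A9 vs 8E
        }
    }
}
```

As long as every symbol and text field stays within ASCII, the first case is the only one that arises, which is why mixed-encoding setups appear to work.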

Bill Pippin 2010-01-14