Request for improvement of space padding when transferring data in UTF16LE

Please refer to AAA.jpg for the data on the host side.

If you specify text format (UTF16LE) when transferring and receiving this data, the number of blank characters in the received text changes. (See AAA_UTF16LE.txt)

This phenomenon does not occur in text format (ShiftJIS). (See AAA_ShiftJIS.txt)

I was told that this is working as per the specifications, but if the number of blank characters changes, it cannot be used as fixed-length text, so I would like this to be improved.

AAA_UTF16LE.txt

AAA_ShiftJIS.txt

AAA.jpg

Idea priority

Medium

Guest

Jul 12, 2024

Thank you for submitting your Idea to enhance IBM i Access Client Solutions (ACS). Using a Unicode encoding as the target encoding for mixed-data EBCDIC makes it difficult to maintain the number of bytes defined for the data, and to keep the data visually aligned when it is viewed in a text editor with a fixed-width font.

The conversion from EBCDIC bytes results in one Unicode code point for each character represented by the EBCDIC, but the number of bytes needed for the Unicode data varies depending on the specific data. For example, assume we have a column defined with a length of 12 bytes with rows that have the following data:
Row1: 5 DBCS characters that represent 5 Unicode characters. In UTF-16 this is 10 bytes. Padding to 12 bytes requires 1 more Unicode character.
Row2: 4 DBCS characters and 2 SBCS characters. This is 6 Unicode characters that need 12 bytes. No padding required.
Row3: 12 SBCS characters need 12 Unicode characters. For UTF-16, this is 24 bytes. This means there are more bytes required than the field length.
DBCS=Double Byte Character Set
SBCS=Single Byte Character Set

This example shows that any row where the EBCDIC represents more than 6 characters will overflow the field length with the bytes needed for the Unicode characters. In general, it is not possible to maintain the field length when using UTF-16 as a target encoding. With UTF-32 and UTF-8, more scenarios will result in overflow, since every UTF-32 character is 4 bytes and UTF-8 is a variable-length encoding in which many characters take 2 or more bytes.
Any EBCDIC DBCS character will result in a Unicode character that is visually double width, and every EBCDIC SBCS character will result in a Unicode character that is single width. The visual alignment therefore depends on the number of double-width and single-width characters in the data.

For these reasons, using a Unicode encoding for an ASCII text file will not produce results that maintain the field length or visual alignment.
ACS always uses the field length as a byte count for the data in the ASCII text file. Access for Windows behaved differently when targeting a Unicode encoding for an ASCII text file: it multiplied the field length by the maximum size of a character in the specific encoding. For UTF-16 and UTF-32, the result is effectively a character count. For UTF-8, however, it used the maximum number of bytes that could be used for each character.
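
The difference between the two sizing rules can be sketched as below. This is a simplified model of the behavior described here, not code from either product. The function names are hypothetical, and the per-encoding maximum character sizes are assumptions, since the values Access for Windows actually used are not stated.

```python
# Assumed maximum bytes per character for each encoding. These values are
# illustrative only; the sizes Access for Windows used are not stated here.
MAX_BYTES_PER_CHAR = {"utf-16-le": 2, "utf-32-le": 4, "utf-8": 4}


def acs_budget(field_length, encoding):
    """ACS as described: the field length is the byte count for any encoding."""
    return field_length


def windows_style_budget(field_length, encoding):
    """Access for Windows as described: field length times max character size."""
    return field_length * MAX_BYTES_PER_CHAR[encoding]


for enc in MAX_BYTES_PER_CHAR:
    print(f"{enc}: ACS {acs_budget(12, enc)} bytes, "
          f"Access for Windows style {windows_style_budget(12, enc)} bytes")

# A hypothetical row of 12 SBCS characters needs 24 bytes in UTF-16LE.
needed = len("ABCDEFGHIJKL".encode("utf-16-le"))
print(needed <= acs_budget(12, "utf-16-le"))            # overflows the ACS budget
print(needed <= windows_style_budget(12, "utf-16-le"))  # fits the larger budget
```

Under these assumptions a 12-byte field becomes a 24-byte budget for UTF-16, which holds exactly 12 two-byte characters, so the field length acts as a character count rather than a byte count.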

Since ShiftJIS already provides the alignment you need, we have no plans to make changes for UTF-16 as requested.

IBM Power Systems Development
