After much consideration and discussion, we are going to decline this request.
The BOM is added to the file by an editor when the file is specifically saved with one. The editor is an application, not an operating system. This request is asking the IBM i operating system to analyze data content in a file and to make a guess as to the CCSID of that data. The IBM i operating system doesn't have any knowledge of where that data originated, how it was created, or even what type of data it might be. The data in a stream file can be created in many ways and could be anything. Only the user really knows what it is and whether it was brought onto the system via some method like FTP or was created on the system. Likewise, only the user knows the correct encoding of that data, and the user should set the CCSID tag on the file accordingly.
We will continue to evaluate this request.
The CAAC has reviewed this requirement and recommends that IBM view this as a high priority requirement that is important to be addressed. Homogeneous data is becoming more and more the way of the world -- a solution for every operating system will become inevitable. The BOM is bit-wise always the same, no matter what CCSID is attached to the file, so if it is present it can be used to help with the algorithm. The three options described below by IBM seem to be good solutions.
Background: The COMMON Americas Advisory Council (CAAC) members have a broad range of experience in working with small and medium-sized IBM i customers. CAAC has a key role in working with IBM i development to help assess the value and impact of individual RFEs on the broader IBM i community, and has therefore reviewed your RFE.
For more information about CAAC, see www.common.org/caac
For more details about CAAC's role with RFEs, see http://www.ibmsystemsmag.com/Blogs/i-Can/May-2017/COMMON-Americas-Advisory-Council-%28CAAC%29-and-RFEs/
Nancy Uthke-Schmucki - CAAC Program Manager
Three RFEs (143259, 135926, and 143226) make similar requests. All relate to the fact that the CCSID attribute of a file does not reflect its actual contents, which carry a BOM that is expected to identify the encoding.
These files in general are created on a platform such as a PC that understands only ASCII and/or ASCII-like CCSIDs such as UTF-16 or UTF-8. It is much easier for that platform and its applications to make determinations about the encoding based on the content of the file without the need for a CCSID attribute. The IBM i does not have that same environment: the content of any file could be EBCDIC, ASCII, or ASCII-like, so the CCSID attribute is extremely important when an application reads data from or writes data to the file in text mode. The data could be any string of bits and bytes, and we rely on the user or application to inform us of the encoding of that data. What is a BOM in 1208 is something entirely different in 1200, not to mention that 1208 is not the only CCSID that has a BOM defined.
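The point about the same bytes meaning different things under different CCSIDs can be illustrated with a small Python sketch. It assumes that Python's "utf-8" and "utf-16-be" codecs are reasonable stand-ins for CCSID 1208 and CCSID 1200; that mapping is an assumption made for illustration and is not part of the IBM response.

```python
# The three UTF-8 BOM bytes followed by one more byte (0x41, the letter "A" in UTF-8).
data = bytes.fromhex("EFBBBF41")

# Read as UTF-8 (stand-in for CCSID 1208): a BOM code point, then "A".
as_utf8 = data.decode("utf-8")

# Read as UTF-16 big endian (stand-in for CCSID 1200): two unrelated code points.
as_utf16 = data.decode("utf-16-be")

print([hex(ord(c)) for c in as_utf8])   # ['0xfeff', '0x41']
print([hex(ord(c)) for c in as_utf16])  # ['0xefbb', '0xbf41']
```

Neither decode fails, so the bytes alone do not say which reading was intended.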
It is important to note that when the CCSID of the file is set correctly there are no problems. As has been noted in at least one of the RFEs, the TYPE command in FTP, the Change Attribute (CHGATR) command, the Qp0lSetAttr()--Set Attributes API, or the setccsid Qshell utility are options that can be used to set the CCSID for a file.
There are different suggestions in these requests.
- Create a new directory attribute to direct new files created and linked to be assigned the CCSID based on the BOM in the data or inherited from the parent.
- Determine the CCSID when the file is opened based on the data.
- Have an option on the CHGATR command to set the CCSID based on the *CONTENT.
Since all of these RFEs have basically the same goal, 143259 and 135926 are being marked as duplicates and 143226 will be set to Under Consideration. Any further commentary should be put under 143226.
The file system cannot be made to guess at the content, nor can we use a CCSID because the data is 'probably' UTF-8, etc. Only the users know the content of the files. The file system will need to be extremely careful not to change the current behavior for a solution to work. This means that any solution would most certainly require the users to take some steps to get their desired results.
The file system team will consider this request for future development.
Due to processing by IBM, this request was reassigned to have the following updated attributes:
Brand - Servers and Systems Software
Product family - Power Systems
Product - IBM i
Component - IFS (Integrated File System) and Servers
Operating system - IBM i
Source - None
For record keeping, the previous attributes were:
Brand - Servers and Systems Software
Product family - Power Systems
Product - IBM i
Component - Languages - CL (Control Language)
Operating system - IBM i
Source - None
The RFE posted by Niels could also be a solution. I have now voted for this too.
Perhaps even a combination of his RFE and this RFE would be a powerful solution.
Several years ago I developed a program that analyses a stream file and returns a CCSID.
First it tests for byte order marks (BOM).
If it starts with x'EFBBBF' then it is CCSID 1208 (UTF-8)
If it starts with x'FEFF' then it is CCSID 1200 (UTF-16 big endian)
If a BOM is not found then it reads the first 16MB of the file and scans the content for UTF-8 sequences.
1st Byte   2nd Byte   3rd Byte   4th Byte
0xxxxxxx                                    <---- ASCII character
110xxxxx   10xxxxxx                         <---- UTF-8
1110xxxx   10xxxxxx   10xxxxxx              <---- UTF-8
11110xxx   10xxxxxx   10xxxxxx   10xxxxxx   <---- UTF-8
If any of these sequences occurs, other than the plain ASCII character, then we have a UTF-8 file.
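The steps above could be sketched roughly as follows in Python. This is not the program mentioned in this comment, only one possible reading of its description. The function name guess_ccsid and the constant SCAN_LIMIT are hypothetical, and the sketch adds one assumption of its own: it gives up (returns None) as soon as it meets a high byte that does not fit any of the patterns, which the description does not specify.

```python
import codecs

SCAN_LIMIT = 16 * 1024 * 1024  # scan at most the first 16 MB


def guess_ccsid(data: bytes):
    """Guess a CCSID from raw bytes: 1208, 1200, or None if undetermined."""
    if data.startswith(codecs.BOM_UTF8):      # x'EFBBBF'
        return 1208
    if data.startswith(codecs.BOM_UTF16_BE):  # x'FEFF'
        return 1200

    sample = data[:SCAN_LIMIT]
    seen_multibyte = False
    i = 0
    while i < len(sample):
        lead = sample[i]
        if lead < 0x80:            # 0xxxxxxx: plain ASCII, proves nothing
            i += 1
            continue
        if lead >> 5 == 0b110:     # 110xxxxx: one continuation byte follows
            extra = 1
        elif lead >> 4 == 0b1110:  # 1110xxxx: two continuation bytes follow
            extra = 2
        elif lead >> 3 == 0b11110: # 11110xxx: three continuation bytes follow
            extra = 3
        else:
            return None            # assumption: a stray high byte rules out UTF-8
        tail = sample[i + 1:i + 1 + extra]
        if len(tail) < extra:
            break                  # sequence cut off by the scan limit or end of data
        if any((b & 0xC0) != 0x80 for b in tail):
            return None            # continuation bytes must look like 10xxxxxx
        seen_multibyte = True
        i += 1 + extra
    return 1208 if seen_multibyte else None
```

For example, guess_ccsid(open(path, "rb").read(SCAN_LIMIT)) would return 1208 for a BOM-less UTF-8 file that contains at least one non-ASCII character, and None for a pure ASCII file, since ASCII bytes alone do not distinguish UTF-8 from other ASCII-based encodings. The result is still a guess based on bit patterns, not proof of how the data was encoded.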
But I would like to have it integrated in the operating system.
Windows can handle it so why shouldn't our favorite platform do the same. :-)
I am afraid this will not be easy. Many texts contain the native language as well as quotes or even big extracts in other languages, so automation would be very error-prone.
I have posted an RFE with a general solution to your issue here:
http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=143259
This one is in the same group:
https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=135926
So maybe a more generic solution will be better
This one goes hand in hand with my request: change the CCSID to 1208 (or Unicode, for that matter) if the file has a BOM and the folder that contains the file allows it.
You can never guess the CCSID of a file someone is sending you... "probably" isn't something that works in IT.
The sender should use the TYPE C nnnn command to tell you which CCSID his data is encoded in (if it differs from the CHGFTPA setting).