IBM Power Ideas Portal


This portal is to open public enhancement requests against IBM Power Systems products, including IBM i. To view all of your ideas submitted to IBM, create and manage groups of Ideas, or create an idea explicitly set to be either visible by all (public) or visible only to you and IBM (private), use the IBM Unified Ideas Portal (https://ideas.ibm.com).


Shape the future of IBM!

We invite you to shape the future of IBM, including product roadmaps, by submitting ideas that matter to you the most. Here's how it works:

Search existing ideas

Start by searching and reviewing ideas and requests to enhance a product or service. Take a look at ideas others have posted, and add a comment, vote, or subscribe to updates on them if they matter to you. If you can't find what you are looking for, post your own idea.

Post your ideas
  1. Post an idea.

  2. Get feedback from the IBM team and other customers to refine your idea.

  3. Follow the idea through the IBM Ideas process.


Specific links you will want to bookmark for future use

Welcome to the IBM Ideas Portal (https://www.ibm.com/ideas) - Use this site to find out additional information and details about the IBM Ideas process and statuses.

IBM Unified Ideas Portal (https://ideas.ibm.com) - Use this site to view all of your ideas, create new ideas for any IBM product, or search for ideas across all of IBM.

ideasibm@us.ibm.com - Use this email to suggest enhancements to the Ideas process or request help from IBM for submitting your Ideas.

Status Not under consideration
Workspace IBM i
Created by Guest
Created on Jun 17, 2020

Determine CCSID depending on content of the file

One of the big challenges when receiving text files from external sources (via FTP, etc.) is determining which CCSID they are encoded in. By default, CCSID 1252 or 819 is used, but often it should be 1208 (UTF-8) instead, especially for XML files. If the file starts with a byte order mark (BOM) and the CCSID is the default 1252, then XML-SAX in RPG will fail with invalid-character errors.

I would like a way to set the CCSID based on the content of the file, so it could be 1208 (UTF-8), 1200 (UTF-16), 819 (ASCII / ISO-8859-1), etc.

An easy way to implement this could be to enhance CHGATR to allow the value *CONTENT for the CCSID attribute:

CHGATR OBJ('/mydir/myfile.xml') ATR(*CCSID) VALUE(*CONTENT)
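To make the proposal concrete, here is a minimal sketch of what a *CONTENT value might do, written in Python rather than on IBM i. It assumes the only signal used is a leading BOM (EF BB BF for UTF-8 / CCSID 1208, FE FF for UTF-16 big-endian / CCSID 1200), and that a file without a BOM simply keeps its current CCSID tag. The names `ccsid_from_bom` and `ccsid_for_content` are hypothetical; no such API exists in IBM i.

```python
from typing import Optional

# Hypothetical mapping from a leading byte order mark to an IBM i CCSID.
# Only the two BOMs discussed in this idea are recognised.
BOM_TO_CCSID = [
    (b"\xef\xbb\xbf", 1208),  # UTF-8 BOM
    (b"\xfe\xff", 1200),      # UTF-16 big-endian BOM
]


def ccsid_from_bom(data: bytes) -> Optional[int]:
    """Return the CCSID implied by a BOM at the start of data, or None."""
    for bom, ccsid in BOM_TO_CCSID:
        if data.startswith(bom):
            return ccsid
    return None


def ccsid_for_content(data: bytes, current_ccsid: int) -> int:
    """Hypothetical *CONTENT behaviour: use the BOM if there is one,
    otherwise keep the CCSID the file is already tagged with."""
    detected = ccsid_from_bom(data)
    return detected if detected is not None else current_ccsid


print(ccsid_for_content(b"\xef\xbb\xbf<?xml version='1.0'?>", 1252))  # 1208
print(ccsid_for_content(b"<?xml version='1.0'?>", 1252))              # 1252
```

Falling back to the existing tag, rather than guessing, is a deliberate choice in this sketch: it only changes files that carry an explicit marker.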


Use Case:

Enhance CHGATR so the CCSID can be determined from the file's content.
If the file contains byte order marks or multi-byte UTF character sequences, then the file is probably in a UTF format.


Idea priority Medium
  • Guest
    Nov 3, 2021

    After much consideration and discussion, we are going to decline this request.

    The BOM is added to the file by an editor when the file is specifically saved with one. The editor is an application, not an operating system. This request is asking the IBM i operating system to analyze the data content of a file and to guess the CCSID of that data. The IBM i operating system has no knowledge of where that data originated, how it was created, or even what type of data it might be. The data in a stream file can be created in many ways and could be anything. Only the user really knows what it is, and whether it was brought onto the system via some method like FTP or created on the system. Likewise, only the user knows the correct encoding of that data, and should set the CCSID tag on the file appropriately.

  • Guest
    Sep 30, 2020

    We will continue to evaluate this request.

  • Guest
    Aug 18, 2020

    The CAAC has reviewed this requirement and recommends that IBM view this as a high priority requirement that is important to be addressed. Homogeneous data is becoming more and more the way of the world -- a solution for every operating system will become inevitable. The BOM is bit-wise always the same, no matter what CCSID or otherwise is attached to it so, if present, can be used to help with the algorithm. The three options described below by IBM seem to be good solutions.

    Background: The COMMON Americas Advisory Council (CAAC) members have a broad range of experience in working with small and medium-sized IBM i customers. CAAC has a key role in working with IBM i development to help assess the value and impact of individual RFEs on the broader IBM i community, and has therefore reviewed your RFE.

    For more information about CAAC, see www.common.org/caac

    For more details about CAAC's role with RFEs, see http://www.ibmsystemsmag.com/Blogs/i-Can/May-2017/COMMON-Americas-Advisory-Council-%28CAAC%29-and-RFEs/

    Nancy Uthke-Schmucki - CAAC Program Manager

  • Guest
    Jul 30, 2020

    There are three RFEs with similar requests (143259, 135926, and 143226), all related to the fact that the CCSID attribute of a file does not reflect its actual contents, which include a BOM that is expected to identify the encoding.

    These files are generally created on a platform such as a PC that understands only ASCII and/or ASCII-like CCSIDs such as UTF-16 or UTF-8. It is much easier for that platform and its applications to determine the encoding from the content of the file without needing a CCSID attribute. The IBM i does not have that same environment: the content of any file could be EBCDIC, ASCII, or ASCII-like, so the CCSID attribute is extremely important when an application reads or writes data out of or into the file in text mode. The data could be any string of bits and bytes, and we rely on the user/application to tell us the encoding of that data. What is a BOM in 1208 is something entirely different in 1200, not to mention that 1208 is not the only CCSID that has a BOM defined.
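    The point that a BOM only means something once the encoding is already known can be illustrated with Python's codecs, used here as stand-ins for the IBM i CCSIDs (an assumption of this sketch: "utf-8" for 1208, "utf-16-be" for 1200, "cp1252" for 1252, "latin-1" for 819). The same leading bytes decode to a BOM under one encoding and to ordinary characters under another.

```python
# The same leading bytes, read under different encodings.
utf8_bom = b"\xef\xbb\xbf"
utf16be_bom = b"\xfe\xff"

# UTF-8 (CCSID 1208): the three bytes are one BOM code point, U+FEFF.
print(repr(utf8_bom.decode("utf-8")))      # '\ufeff'

# Windows-1252 (CCSID 1252): the same bytes are three ordinary characters.
print(repr(utf8_bom.decode("cp1252")))     # 'ï»¿'

# ISO-8859-1 (CCSID 819): the UTF-16 BOM is two ordinary characters.
print(repr(utf16be_bom.decode("latin-1"))) # 'þÿ'

# UTF-16 big-endian (CCSID 1200): the UTF-8 BOM plus 'A' becomes two
# unrelated 16-bit code points (U+EFBB and U+BF41), not a BOM at all.
print(repr((utf8_bom + b"A").decode("utf-16-be")))
```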

    It is important to note that when the CCSID of the file is set correctly there are no problems. As noted in at least one of the RFEs, the TYPE command in FTP, the Change Attribute (CHGATR) command, the Qp0lSetAttr() (Set Attributes) API, or the setccsid Qshell utility can be used to set the CCSID for a file.

    There are different suggestions in these requests.
    - Create a new directory attribute that causes new files created in or linked into the directory to be assigned a CCSID based on the BOM in the data, or inherited from the parent.
    - Determine the CCSID when the file is opened based on the data.
    - Have an option on the CHGATR command to set the CCSID based on the *CONTENT.

    Since all of these RFEs have basically the same goal, 143259 and 135926 are being marked as duplicates, and 143226 is being set to Under Consideration. Any further commentary should be put under 143226.

    The file system cannot be made to guess at the content, nor can we assign a CCSID because the data is 'probably' UTF-8, etc. Only the users know the content of their files. The file system will need to be extremely careful not to change current behavior for a solution to work. This means that any solution would almost certainly require users to take some steps to get the results they want.

    The file system team will consider this request for future development.

  • Guest
    Jul 29, 2020

    Due to processing by IBM, this request was reassigned to have the following updated attributes:
    Brand - Servers and Systems Software
    Product family - Power Systems
    Product - IBM i
    Component - IFS (Integrated File System) and Servers
    Operating system - IBM i
    Source - None

    For record keeping, the previous attributes were:
    Brand - Servers and Systems Software
    Product family - Power Systems
    Product - IBM i
    Component - Languages - CL (Control Language)
    Operating system - IBM i
    Source - None

  • Guest
    Jun 19, 2020

    The RFE posted by Niels could also be a solution. I have now voted for this too.
    Perhaps even a combination of his RFE and this RFE would be a powerful solution.

  • Guest
    Jun 19, 2020

    Several years ago I developed a program that analyzes the stream file and returns a CCSID.

    First it tests for byte order marks (BOM).
    If it starts with x'EFBBBF' then it is CCSID 1208 (UTF-8)
    If it starts with x'FEFF' then it is CCSID 1200 (UTF-16 big endian)

    If no BOM is found, it reads the first 16 MB of the file and scans the content for UTF-8 sequences:

      1st byte   2nd byte   3rd byte   4th byte
      0xxxxxxx                                    <---- ASCII character
      110xxxxx   10xxxxxx                         <---- UTF-8
      1110xxxx   10xxxxxx   10xxxxxx              <---- UTF-8
      11110xxx   10xxxxxx   10xxxxxx   10xxxxxx   <---- UTF-8

    If any of these sequences occurs, other than the single-byte ASCII character, then we have a UTF-8 file.

    But I would like to have it integrated in the operating system.
    Windows can handle it, so why shouldn't our favorite platform do the same? :-)

  • Guest
    Jun 18, 2020

    I am afraid this will not be easy. Many texts contain the native language as well as quotes or even large extracts in other languages, so automation will be very error prone.

  • Guest
    Jun 18, 2020

    I have posted an RFE with a general solution to your issue here:

    http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=143259

  • Guest
    Jun 18, 2020

    This one is in the same group:

    https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=135926

    So maybe a more generic solution would be better.

  • Guest
    Jun 18, 2020

    This one goes hand in hand with my request: change the CCSID to 1208 (or Unicode, for that matter) if the file has a BOM, provided the folder that contains the file allows that.

  • Guest
    Jun 18, 2020

    You can never guess the CCSID of a file someone is sending you... "probably" isn't something that works in IT.
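    Why "probably" is risky can be shown with two different texts that produce identical bytes. In this Python sketch (assuming the "cp1252" codec stands in for CCSID 1252 and "utf-8" for CCSID 1208), the Windows-1252 text "Ã©" and the UTF-8 text "é" are both stored as C3 A9. A content-based check such as the UTF-8 sequence scan described in an earlier comment would classify both as UTF-8, and would be wrong for the first one.

```python
# Two different texts, two different encodings, identical bytes.
cp1252_text = "Ã©"  # 'Ã' followed by '©', stored as Windows-1252 (CCSID 1252)
utf8_text = "é"     # a single 'é', stored as UTF-8 (CCSID 1208)

cp1252_bytes = cp1252_text.encode("cp1252")
utf8_bytes = utf8_text.encode("utf-8")

print(cp1252_bytes.hex(), utf8_bytes.hex())  # c3a9 c3a9
print(cp1252_bytes == utf8_bytes)            # True
```

    Nothing in the bytes themselves records which text was intended; only the sender knows.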

    The sender should use the TYPE C nnnn command to tell you which CCSID the data is encoded in (if it differs from the CHGFTPA setting).


Let the IFS set the CCSID depending on BOM codes when writing files

Merged
Today the CCSID of files in the IFS has no automatic connection to the content, which means that you have to change the CCSID manually with the CHGATR or setccsid command. This is not practical if files are made by FTP or NETSERVER. If you upload a file with FTP...
over 4 years ago in IBM i / IFS (Integrated File System) and Servers 3 Not under consideration

Allow UTF-8 with bom to override CCSID attribute for IFS files

Merged
Today you can include SQL in PL/SQL Stored procedures, UDTF and compound statements like this; begin include SQL '/prj/sql/NHODATA/VIEWS/KRTSPLV1.sql';end; However if the included SQL file is in UTF-8 with BOM codes you will get this error: SQL St...
over 5 years ago in IBM i / IFS (Integrated File System) and Servers 6 Not under consideration