When using SQL on a table with 1 or more field procedure programs attached to 1 or more columns, a select to produce output will demonstrate DETERMINISTIC behavior in that output values will be cached and the field procedure program will only fire/execute for each distinct output value, for the column(s) the fp program is attached to. When an SQL insert is issued, the field procedure program will fire/execute on every single input value to be inserted into the table, for the column(s) the fp program is attached to. No DETERMINISTIC behavior or "cache'ing" appears to be present.
Use Case: We use a common field procedure program to maintain at-rest/in-transient field level encryption/decryption.
My table has 45M rows. My row has 100 columns. 25 columns have a common field procedure program attached.
A "select * from encryptedTable" will return output very quickly, as the field procedure pgm is only executing for non-distinct output values amongst the 25 columns. It appears to be DETERMINISTIC and caches the output values so the field procedure program does not need to be executed on each row.
When I issue a "insert into encryptedTable select * from nonEncryptedTable", the field procedure program will execute on every single row/column despite whether it is non-distinct or not. This statement can be very time consuming to carry out and resource intensive.
Here is why decryption in this scenario (and as a general concept) takes much less time than encryption.
Consider the source (encrypted) file (45M rows) with 25 encrypted columns which have the same clear text value in every column, with one exception. Row 1, column 1 value is distinct.
The field procedure program only has to execute 25 times. That is 1 time for each value on the first row read (+ 1 extra time for the row/column that contains the oddball 'x' value in row 1 b0001).
So apply that to a real set of data, and the field procedure program may not be required to execute for every row in the table during decryption. I believe this optimizing feature is part of the DB2 database engine. This is DETERMINISTIC behavior within DB2, that is commonly used in SQL stored procedures. DETERMINISTIC = will always return the same result from successive calls with identical input arguments.
Whereas in encrypting (via the above mentioned insert statement) the data for the 25 columns requires the field procedure to execute for every single row so that a formulated encoded value can be stored for the encrypted column values. But couldn't the columns "clear text" and "cipher values" be cached just as the "clear text" is cached on a select statement?
Based on our own testing, for field procedure based encryption, Db2 for i treats the output of the field procedure as deterministic and uses this to improve performance for decryption. However, the same performance benefit is not found when data is encrypted.
IBM does not intend to provide a solution to this request at this time, so it is being closed.
While we recognize that this enhancement would provide a performance boost, it is possible for a field procedure to be used for some purpose other than encryption/decryption. Optimization to eliminate some calls to field procedures could break applications that are using this support in some other way. For that reason, we cannot change the rules for when a field procedure is called.
In short, this enhancement could provide a drastic speed-up of Db2 encryption.
Attachment (Use case): image of the values within the table with 25 rows with field procedure programs