CHGASPACT: new option (*BCD), i.e. "best can do". Stay in SUSPEND state even if not everything could be suspended.

Hello IBM,

I recently discussed the CHGASPACT options with IBM Support (case TS014216210).

We normally use

chgaspact aspdev(*sysbas) option(*suspend) ssptimo(60) ssptimoacn(*END)

If this is not successful, *CONT is not an option for us.

We need an additional option such as *BCD ("best can do").

The idea is:
After SSPTIMO expires without a complete suspend, remain in the SUSPEND state (best can do) with memory flushed, and hand control back to the application.
The application could then start the FlashCopy Consistency Group on the FSxxx system and afterwards run CHGASPACT OPTION(*RESUME).
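
To make the proposed flow concrete, here is a minimal Python sketch of an orchestration script that issues the suspend, takes the FlashCopy, and always issues the resume. `run_cl` and `take_flashcopy` are hypothetical callbacks (for example, a remote command runner and a storage-side script); they are not IBM APIs. The default SSPTIMOACN is *CONT, which IBM's reply below identifies as the existing equivalent of the proposed *BCD.

```python
from typing import Callable, List

# Hypothetical runner: takes a CL command string and executes it
# (for example through a remote command interface). Not an IBM API.
CommandRunner = Callable[[str], None]


def suspend_flashcopy_resume(
    run_cl: CommandRunner,
    take_flashcopy: Callable[[], None],
    aspdev: str = "*SYSBAS",
    ssptimo: int = 60,
    ssptimoacn: str = "*CONT",
) -> List[str]:
    """Suspend ASP activity, take a FlashCopy, then always resume.

    Returns the CL commands that were issued, in order.
    """
    if ssptimoacn not in ("*CONT", "*END"):
        raise ValueError("SSPTIMOACN must be *CONT or *END")
    issued: List[str] = []
    suspend = (
        f"CHGASPACT ASPDEV({aspdev}) OPTION(*SUSPEND) "
        f"SSPTIMO({ssptimo}) SSPTIMOACN({ssptimoacn})"
    )
    resume = f"CHGASPACT ASPDEV({aspdev}) OPTION(*RESUME)"
    # If the suspend itself fails, the runner is assumed to raise,
    # so no FlashCopy is taken and no resume is sent.
    run_cl(suspend)
    issued.append(suspend)
    try:
        # Hypothetical callback that starts the FlashCopy consistency group.
        take_flashcopy()
    finally:
        # Resume even if the FlashCopy step fails, so database writes are
        # not held until the 10-minute safety valve releases them.
        run_cl(resume)
        issued.append(resume)
    return issued
```

Using `try`/`finally` keeps the suspended window as short as possible regardless of how the storage step ends.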

According to IBM Support, this option is already part of the "Full System Flashcopy toolkit".

Best regards from Germany

Jürgen

Idea priority

High

Guest

Nov 21, 2023

After closely analyzing and discussing your request, we believe there has been a misunderstanding regarding the suspend timeout actions, as well as the "safety valve" timer. The help text and documentation are somewhat confusing, and we feel the timeout error messages are unclear as well. Your proposed SSPTIMOACN(*BCD) "Best we Can Do" option is actually EXACTLY what the current *CONT option provides. As such, in the next release we will update the documentation and error messages to better describe the usage of these options.

The suspend timeout actions give the user control over what should happen if we cannot reach a suspended state within the given timeout period.

CHGASPACT OPTION(*SUSPEND) performs the following steps. Note that suspending ASP activity ONLY halts database activity within the ASP; other, non-database reads and writes can still occur. As such, ALL *SUSPEND activity should be considered a "Best We Can Do" state. The only truly quiesced state would be with the IASP fully varied off (or, when suspending *SYSBAS, with the system powered down).

1. Force Write. This scans every page in memory, searching for in-flight data that has not yet been written to the ASP and flushes it to disk. This is a "freebie" operation and the timeout parameter is not applied, since nothing has actually been suspended yet and all disk activity is still ongoing.

2. Suspend Transactions. This halts the initiation of all new commitment control transactions and waits for up to the suspend timeout value for current transactions to reach commit boundaries. This is the step that is most likely to exceed the suspend timeout value.

3. Suspend non-transaction-based database operations. This halts the initiation of new DB operations outside of commitment control. Non-transaction-based DB operations are quick and we wait for up to 10 seconds for the existing operations to complete.

4. Force Write. We do a second scan of memory and flush any outstanding in-flight data to disk, to capture all data changed while steps 2 and 3 were running. Essentially, this picks up all the updates that occurred while we were suspending activity.
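
The four steps can be summarized as a small Python model that records which timer bounds each wait. This is a simplified illustration of the description above, not IBM's implementation. `txn_drain` and `op_drain` are assumed inputs standing for how long in-flight work would need to finish, treating step 4 as untimed is an assumption (the reply states no limit for it), and the model deliberately ignores SSPTIMOACN.

```python
from typing import List, Optional, Tuple

NON_TXN_WAIT_SECONDS = 10  # fixed wait for non-transaction DB operations (step 3)


def suspend_steps(
    ssptimo: int, txn_drain: float, op_drain: float
) -> List[Tuple[str, Optional[int], bool]]:
    """Return (step, wait limit in seconds or None, finished in time) per step.

    txn_drain / op_drain are assumed inputs: how long in-flight transactions
    and non-transaction operations would need to complete.
    """
    return [
        # Step 1: untimed flush; nothing is suspended yet.
        ("force write", None, True),
        # Step 2: wait up to SSPTIMO for transactions to reach commit boundaries.
        ("suspend transactions", ssptimo, txn_drain <= ssptimo),
        # Step 3: wait up to a fixed 10 seconds for non-transaction operations.
        ("suspend operations", NON_TXN_WAIT_SECONDS, op_drain <= NON_TXN_WAIT_SECONDS),
        # Step 4: catch-up flush of changes made during steps 2 and 3
        # (modeled as untimed; an assumption).
        ("force write", None, True),
    ]
```

For example, `suspend_steps(60, txn_drain=90, op_drain=2)` marks step 2 as not finished in time, which is where the timeout action would come into play.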

At this point, the ASP is suspended. No database activity should be ongoing within the ASP. However, this is NOT a perfect "all data has been flushed and no activity at all is occurring" state: non-database reads and writes can still occur within the ASP. This means the ASP is really in a "Best We Can Do" state of suspension.

Now, what happens if we cannot reach this "Best We Can Do" state where database activity is halted within the suspend timeout period? The Suspend timeout action parameter takes effect. There are currently two options: *END and *CONT.

SSPTIMOACN(*END) -- this option will wait for the suspend timeout value for the ASP to fully suspend. If we cannot suspend all transactions or operations, we halt the suspend process and automatically issue a *RESUME -- effectively "undoing" the Suspend and leaving the ASP in a normal, read/write state.

SSPTIMOACN(*CONT) -- this option is already equivalent to your proposed *BCD "Best we can do" option. If we cannot suspend all transactions/operations within the given timeout period, we accept that every suspend is a "best we can do" suspend and continue processing it. We move on to the next step (suspending operations or forcing writes to disk) and leave the ASP in an "as close to suspended as we could get it" state. At this point control returns to the user, who can then take a FlashCopy of the ASP, perform a save, and so on.
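
The difference between the two timeout actions reduces to a small decision function. This is a hypothetical model of the behavior described above (the returned state names are made up for illustration), not IBM code:

```python
def suspend_outcome(all_suspended_in_time: bool, ssptimoacn: str) -> str:
    """Return the state the ASP is left in after CHGASPACT OPTION(*SUSPEND)."""
    if ssptimoacn not in ("*END", "*CONT"):
        raise ValueError("SSPTIMOACN must be *END or *CONT")
    if all_suspended_in_time:
        # Database activity halted within SSPTIMO; control returns to the caller.
        return "suspended"
    if ssptimoacn == "*END":
        # Suspend abandoned and an automatic *RESUME is issued.
        return "resumed"
    # *CONT: the remaining steps still run and the ASP is left
    # as close to suspended as possible ("best we can do").
    return "partially suspended"
```

Only the timeout case differs: *END undoes the suspend, while *CONT keeps it and hands control back.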

Once the activity is completed, the user should issue a CHGASPACT(*RESUME) in order to allow database activity (transactions and operations) to write to disk again, thus freeing up all jobs that were attempting to access the ASP and allowing them to continue running.

We have a 10-minute "safety valve" timer that will automatically resume database activity if a manual *RESUME is not issued. This timer should not be confused with the SSPTIMO(xx) parameter. The suspend timeout parameter defines how long to wait for existing transactions and operations to complete (how long to wait for the suspend to occur). The safety valve timer is used to automatically allow database writes again AFTER the suspend has completed.

This is especially important if the job performing the CHGASPACT *SUSPEND happened to attempt a database write while in the suspended state. If that were to occur, that job would also "hang" waiting for the write to complete and therefore would be unable to perform the subsequent *RESUME in order to allow activity to continue again. Essentially it's a way of releasing the system if all sessions happen to hang due to the *SUSPEND.
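
The interaction between a manual *RESUME and the 10-minute safety valve can be modeled with a toy class. Times are plain seconds passed in by the caller, the class and method names are hypothetical, and measuring the 10 minutes from suspend completion is an assumption based on the wording above.

```python
from typing import Optional

SAFETY_VALVE_SECONDS = 10 * 60  # automatic resume after 10 minutes


class SuspendedAsp:
    """Toy model of the post-suspend safety valve (hypothetical, not IBM code)."""

    def __init__(self, suspended_at: float) -> None:
        # The safety valve starts once the suspend has completed; it is
        # unrelated to SSPTIMO, which only bounded the wait to get here.
        self.suspended_at = suspended_at
        self.resumed_at: Optional[float] = None

    def resume(self, now: float) -> None:
        # Manual CHGASPACT OPTION(*RESUME); a second call has no effect.
        if self.resumed_at is None:
            self.resumed_at = now

    def writes_allowed(self, now: float) -> bool:
        if self.resumed_at is not None and now >= self.resumed_at:
            return True
        # No manual resume yet: the safety valve releases writes after 10 minutes.
        return now - self.suspended_at >= SAFETY_VALVE_SECONDS
```

If the suspending job itself hangs on a database write and never calls `resume`, `writes_allowed` still becomes true once 600 seconds have passed.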

IBM Power Systems development

Admin

Carmelita Ruvalcaba

Oct 24, 2023

The CAAC has reviewed this IBM Idea and recommends that IBM view this as a “nice to have” low priority feature.
This benefits shops running PowerHA and may also benefit FlashCopy shops.
Background: The COMMON Americas Advisory Council (CAAC) members have a broad range of experience in working with small and medium-sized IBM i customers. CAAC has a key role in working with IBM i development to help assess the value and impact of individual IBM Ideas on the broader IBM i community and has therefore reviewed your Idea.

For more information about CAAC, see www.common.org/caac
Carmelita Ruvalcaba - CAAC Program Manager

Admin

Sabine Jordan

Oct 18, 2023

CEAC has discussed this idea and believes the desired behaviour can already be achieved, because the system should post a message when the timeout expires without the suspended state being reached.
Background: The COMMON Europe Advisory Council (CEAC) members have a broad range of experience in working with small and medium-sized IBM i customers. CEAC has a crucial role in working with IBM i development to help assess the value and impact of individual RFEs on the broader IBM i community and has therefore reviewed your RFE.

To find out how CEAC helps to shape the future of IBM i, see CEAC @ ibm.biz/BdYSYj and the article "The Five Hottest IBM i RFEs Of The Quarter" at ibm.biz/BdYSZT

Sabine Jordan + Sara Andres – CEAC Program Manager, IBM
