Clappmond RAS enhancements


Recently I worked on a number of calls where the PHA application monitor timed out. Usually we cannot provide a proper RCA in these cases, which frustrates our customers.
I propose adding an option to the application monitor / clappmond to collect additional debug data when the application monitor times out.

1. Run pdump.sh (https://www-01.ibm.com/support/docview.wss?uid=aixtools650ae3be) on clappmond and all of its child processes - this is useful when the actual monitor script or one of its child processes hangs.

2. Collect perfpmr data for 2-3 minutes.

In both cases PHA should collect the debug data before killing the monitor process and performing any further action (e.g. server restart or takeover).
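
To illustrate the "clappmond and all of its child processes" part of step 1, here is a minimal sketch in Python. It is for illustration only and is not how PHA does it today. It assumes a list of (pid, ppid) pairs has already been taken from a process listing; the function name `descendants` and the sample PIDs are hypothetical. The returned list would be the set of processes to dump before anything is killed.

```python
from collections import defaultdict, deque

def descendants(pairs, root_pid):
    """Return root_pid followed by all of its descendants, breadth-first."""
    # Map each parent PID to the PIDs of its direct children.
    children = defaultdict(list)
    for pid, ppid in pairs:
        children[ppid].append(pid)

    order, queue, seen = [], deque([root_pid]), {root_pid}
    while queue:
        pid = queue.popleft()
        order.append(pid)
        for child in children[pid]:
            if child not in seen:  # guards against self-parented entries
                seen.add(child)
                queue.append(child)
    return order

# Hypothetical snapshot: 100 stands in for clappmond, 200 for the monitor
# script, 300 for a child of that script, 400 for an unrelated process.
sample = [(100, 1), (200, 100), (300, 200), (400, 1)]
print(descendants(sample, 100))
```

Taking the snapshot once and walking it in memory avoids missing a grandchild that is only reachable through an intermediate shell.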

Additional notes:
- This could be added to the existing cl_ffdc event script, though we may have to change how and when the daemon invokes it in order to get relevant data before any recovery.
- Using the existing event should also be enough to trigger an upstream event for the SMUI, so we could add notifications in the SMUI itself.
- Running anything for 2-3 minutes before doing recovery is not something we would want as default behavior, especially in a production environment, so we would have to come up with some approach for enabling this only as needed.
- It looks like both pdump.sh and perfpmr have to be downloaded separately. It would be nicer if they were shipped with base AIX, but I suppose that once a customer gets to the point where the tools are needed, downloading them should not be a concern. We would also have to integrate the collection of data from these tools with cl_ffdc.
- Add some kind of simple locking (e.g. a lock file) to avoid running multiple perfpmr collections at the same time.
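
The lock file mentioned in the last note could look roughly like the following sketch, written in Python purely for illustration. The helper names `try_acquire_lock` and `release_lock` and the lock path are hypothetical; a real implementation would live in the event scripts and use a fixed, agreed location.

```python
import os
import tempfile

def try_acquire_lock(path):
    """Atomically create the lock file. Return True if we created it,
    False if it already exists (another collection is running)."""
    try:
        # O_CREAT | O_EXCL fails if the file exists, so only one caller wins.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner for later inspection
    return True

def release_lock(path):
    """Remove the lock file; ignore the case where it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass

# Hypothetical location, a temporary directory is used here for the demo.
lock_path = os.path.join(tempfile.mkdtemp(), "perfpmr.lock")
first = try_acquire_lock(lock_path)   # this caller would start perfpmr
second = try_acquire_lock(lock_path)  # a concurrent caller would skip it
release_lock(lock_path)
print(first, second)
```

A lock left behind by a collector that died would block later runs, so some stale-lock handling (for example, checking whether the recorded PID is still alive) would probably be needed as well.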

Idea priority

High
