Description of problem:
From bug 1600640 we worked out that fence_zvmip can take too long to output the metadata, which causes pacemaker to turn off nodes instead of restarting them because it doesn't think the fence agent supports 'reboot'.
Version-Release number of selected component (if applicable):
How reproducible:
QE has only hit it once.
Steps to Reproduce:
1. Run QE's revolver tests on s390x
Actual results:
Jul 11 12:12:42  qe-c01-m01.s390.bos.redhat.com stonith-ng: info: init_cib_cache_cb: Updating device list from the cib: init
Jul 11 12:12:48  qe-c01-m01.s390.bos.redhat.com stonith-ng: warning: stonith__rhcs_metadata: Could not execute metadata action for fence_zvmip: Software caused connection abort | rc=-103
A fenced node is subsequently shut off and not restarted.
Expected results:
The 'metadata' operation runs within the time limit and fencing works as normal.
Additional info:
See bug 1600640 for further discussion.
(In reply to Oyvind Albrigtsen from comment #2)
> kwenninger told me it seems to be hardcoded in "fenced_commands.c".
The pacemaker side of this has been covered in bug 1600640. This bug is to ascertain why fence_zvmip is taking so much time to print out some static data, and fix that issue.
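To help narrow this down, here is a rough sketch for timing the metadata action on an affected node. It assumes the agent is on the PATH and accepts the usual `-o metadata` invocation. The 5-second budget is only a placeholder for comparison, not necessarily the value pacemaker uses, and `time_command` is a helper name made up for this sketch.

```python
import subprocess
import sys
import time


def time_command(argv, budget_s=5.0):
    """Run argv once and return (elapsed_seconds, return_code, within_budget).

    budget_s is an assumed placeholder, not pacemaker's actual limit.
    """
    start = time.monotonic()
    proc = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,  # only the timing matters here, not the XML
        stderr=subprocess.DEVNULL,
    )
    elapsed = time.monotonic() - start
    return elapsed, proc.returncode, elapsed <= budget_s


if __name__ == "__main__":
    # Default to the agent from this bug; any other command can be passed instead.
    argv = sys.argv[1:] or ["fence_zvmip", "-o", "metadata"]
    for attempt in range(1, 11):
        elapsed, rc, ok = time_command(argv)
        status = "ok" if ok else "OVER BUDGET"
        print(f"run {attempt}: {elapsed:.3f}s rc={rc} {status}")
```

Running it in a loop while the revolver tests are generating load might show whether the slowness is in the agent itself or just a symptom of the node being busy.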
Oh. I didn't see anything that should make it slow, so I figured the node was running under too high a load and that a change to the timeout value or similar might be the answer.
I'll look into it.
Hi Oyvind, has there been any progress on this bz?