Bug 1601820 - fence_zvmip 'metadata' function times out (s390x)
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: fence-agents
Version: 7.6
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Oyvind Albrigtsen
QA Contact: cluster-qe@redhat.com
Depends On:
Reported: 2018-07-17 09:58 UTC by Andrew Price
Modified: 2020-01-15 02:31 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed:
Target Upstream Version:


Description Andrew Price 2018-07-17 09:58:51 UTC
Description of problem:

From bug 1600640 we worked out that fence_zvmip can take too long to output the metadata, which causes pacemaker to turn off nodes instead of restarting them because it doesn't think the fence agent supports 'reboot'.
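To make the failure mode concrete, here is a minimal sketch of how a missing metadata document could lead to 'off' being chosen over 'reboot'. It is not pacemaker's actual code: the function names and the trimmed sample XML are hypothetical, and the fallback rule is a simplification of the behaviour described above.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical metadata. Real fence_zvmip output carries many more
# parameters and attributes; only the <actions> list matters for this sketch.
SAMPLE_METADATA = """<resource-agent name="fence_zvmip">
  <parameters/>
  <actions>
    <action name="on"/>
    <action name="off"/>
    <action name="reboot"/>
    <action name="metadata"/>
  </actions>
</resource-agent>
"""


def advertised_actions(metadata_xml):
    """Return the set of action names listed in a fence agent's metadata.

    An empty or missing document (for example, because the 'metadata'
    call timed out) yields an empty set.
    """
    if not metadata_xml:
        return set()
    root = ET.fromstring(metadata_xml)
    return {action.get("name") for action in root.iter("action")}


def choose_fence_action(metadata_xml):
    """Illustrative rule only: use 'reboot' if advertised, else fall back to 'off'."""
    if "reboot" in advertised_actions(metadata_xml):
        return "reboot"
    return "off"
```

With the sample document, choose_fence_action() picks 'reboot'. With no metadata at all, which is what a timed-out 'metadata' call amounts to, it picks 'off', matching the symptom in this report.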

Version-Release number of selected component (if applicable):


How reproducible:

QE has only hit it once.

Steps to Reproduce:
1. Run QE's revolver tests on s390x

Actual results:

Jul 11 12:12:42 [1514] qe-c01-m01.s390.bos.redhat.com stonith-ng:     info: init_cib_cache_cb:  Updating device list from the cib: init
Jul 11 12:12:48 [1514] qe-c01-m01.s390.bos.redhat.com stonith-ng:  warning: stonith__rhcs_metadata:     Could not execute metadata action for fence_zvmip: Software caused connection abort | rc=-103

A fenced node is subsequently shut off and not restarted.

Expected results:

The 'metadata' operation runs within the time limit and fencing works as normal.

Additional info:

See bug 1600640 for further discussion.

Comment 4 Andrew Price 2018-07-17 11:15:53 UTC
(In reply to Oyvind Albrigtsen from comment #2)
> kwenninger told me it seems to be hardcoded in "fenced_commands.c".

The pacemaker side of this has been covered in bug 1600640. This bug is to ascertain why fence_zvmip is taking so much time to print out some static data, and fix that issue.
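One way to gather data on this is to time the metadata call repeatedly on an affected host. The following is a rough sketch: the helper names are made up for illustration, and the time limit is a parameter because the actual value pacemaker uses is not stated in this bug.

```python
import subprocess
import time


def time_command(argv, limit_s):
    """Run argv once and return (elapsed_seconds, finished_within_limit).

    Output is discarded; only the wall-clock duration is of interest.
    """
    start = time.monotonic()
    try:
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=limit_s,
            check=False,
        )
        finished = True
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising.
        finished = False
    return time.monotonic() - start, finished


def sample_timings(argv, runs, limit_s):
    """Time argv several times and return the list of (elapsed, finished) pairs."""
    return [time_command(argv, limit_s) for _ in range(runs)]
```

For example, sample_timings(["fence_zvmip", "-o", "metadata"], runs=20, limit_s=5.0) would show how often the call exceeds an assumed 5 second limit. Running it while the host is under load may help separate a slow agent from a slow machine.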

Comment 5 Oyvind Albrigtsen 2018-07-17 11:28:44 UTC
Oh. I didn't see anything that should make it slow, so I figured the host was running under too high a load, and that a change of the timeout value or similar might be needed.

I'll look into it.

Comment 6 Andrew Price 2018-09-12 14:44:14 UTC
Hi Oyvind, has there been any progress on this bz?
