1601820 – fence_zvmip 'metadata' function times out (s390x)

Bug 1601820 - fence_zvmip 'metadata' function times out (s390x)

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	fence-agents
Sub Component:
Version:	7.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Oyvind Albrigtsen
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-07-17 09:58 UTC by Andrew Price
Modified:	2020-06-04 10:38 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-06-04 10:38:15 UTC
Target Upstream Version:
Embargoed:

Description Andrew Price 2018-07-17 09:58:51 UTC

Description of problem:

From bug 1600640 we worked out that fence_zvmip can take too long to output the metadata, which causes pacemaker to turn off nodes instead of restarting them because it doesn't think the fence agent supports 'reboot'.

Version-Release number of selected component (if applicable):

fence-agents-zvm-4.2.1-1.el7.s390x

How reproducible:

QE has only hit it once.

Steps to Reproduce:
1. Run QE's revolver tests on s390x

Actual results:

Jul 11 12:12:42 [1514] qe-c01-m01.s390.bos.redhat.com stonith-ng:     info: init_cib_cache_cb:  Updating device list from the cib: init
...
Jul 11 12:12:48 [1514] qe-c01-m01.s390.bos.redhat.com stonith-ng:  warning: stonith__rhcs_metadata:     Could not execute metadata action for fence_zvmip: Software caused connection abort | rc=-103

A fenced node is subsequently shut off and not restarted.

Expected results:

The 'metadata' operation runs within the time limit and fencing works as normal.

Additional info:

See bug 1600640 for further discussion.

Comment 4 Andrew Price 2018-07-17 11:15:53 UTC

(In reply to Oyvind Albrigtsen from comment #2)
> kwenninger told me it seems to be hardcoded in "fenced_commands.c".

The pacemaker side of this has been covered in bug 1600640. This bug is to ascertain why fence_zvmip is taking so much time to print out some static data, and fix that issue.
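One way to start narrowing this down is to measure how long the metadata call actually takes on an affected host. The following is only a rough sketch: the helper names `time_command` and `summarize` are made up for illustration, and it assumes the agent is on the PATH and accepts `-o metadata`.

```python
import subprocess
import time


def time_command(cmd, timeout=30.0):
    """Run cmd once and return (elapsed_seconds, return_code, stdout_bytes)."""
    start = time.monotonic()
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )
    return time.monotonic() - start, proc.returncode, proc.stdout


def summarize(cmd, runs=5):
    """Run cmd several times and return (fastest, slowest) elapsed seconds."""
    samples = [time_command(cmd)[0] for _ in range(runs)]
    return min(samples), max(samples)
```

For example, `summarize(["fence_zvmip", "-o", "metadata"])` would give the spread over five runs. A large gap between the fastest and slowest run would point at host load rather than at the agent itself.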

Comment 5 Oyvind Albrigtsen 2018-07-17 11:28:44 UTC

Oh. I didn't see anything that should make it slow, so I figured the host was running under too high a load, and that a change of the timeout value or similar might be the fix.

I'll look into it.

Comment 6 Andrew Price 2018-09-12 14:44:14 UTC

Hi Oyvind, has there been any progress on this bz?

Comment 8 Ken Gaillot 2020-06-03 21:29:04 UTC

The first thing I'd check is whether the agent does something like initializing some variables or running a validate function on every call. The meta-data action can be handled before all of that.
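As a rough illustration of that idea (this is not the actual fence_zvmip code; the agent name, the XML, and `expensive_setup` are all hypothetical stand-ins), an agent can check for the metadata action first and return before doing any slow work:

```python
import sys
import time

# Hypothetical static metadata; a real agent generates its own.
METADATA_XML = """<?xml version="1.0" ?>
<resource-agent name="fence_example" shortdesc="Hypothetical fence agent">
  <parameters/>
  <actions>
    <action name="on"/>
    <action name="off"/>
    <action name="reboot"/>
    <action name="metadata"/>
  </actions>
</resource-agent>
"""


def requested_action(argv):
    """Return the action named by -o/--action, or None if absent."""
    for i, arg in enumerate(argv):
        if arg in ("-o", "--action") and i + 1 < len(argv):
            return argv[i + 1]
        if arg.startswith("--action="):
            return arg.split("=", 1)[1]
    return None


def expensive_setup():
    """Stand-in for slow work such as validating options or contacting a host."""
    time.sleep(2)
    return {"connected": True}


def main(argv):
    # Answer metadata requests before any setup so setup cannot delay them.
    if requested_action(argv) in ("metadata", "meta-data"):
        sys.stdout.write(METADATA_XML)
        return 0
    state = expensive_setup()
    # ... the real fencing action would run here using `state` ...
    return 0 if state["connected"] else 1
```

With this ordering, `main(["-o", "metadata"])` prints the XML without paying the two-second setup cost, while any other action still goes through `expensive_setup()`.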

Comment 11 Oyvind Albrigtsen 2020-06-04 10:38:15 UTC

The reporter hasn't seen the issue in a long time.

Closing. Reopen the bz if this becomes an issue again.
