Bug 2125587
Summary: | During a rolling upgrade, monitor operations are not being communicated between nodes as expected. [rhel-8.6.0.z] | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | RHEL Program Management Team <pgm-rhel-tools> |
Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 8.6 | CC: | cfeist, cluster-maint, jobaker, mjuricek, nwahl, sbradley |
Target Milestone: | rc | Keywords: | Regression, Triaged, ZStream |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | pacemaker-2.1.2-4.el8_6.5 | Doc Type: | Bug Fix |
Doc Text: |
Cause: The controller ran OCF resource agent metadata actions synchronously, so it was blocked while they ran, and crm_node queries now send requests to the controller.
Consequence: If an agent's metadata action calls crm_node, the controller is completely blocked for 30 seconds, until the action times out. This can cause other actions to fail and the node to be fenced.
Fix: The controller now performs metadata actions asynchronously.
Result: Agent metadata actions can call crm_node without problems.
|
Story Points: | --- |
Clone Of: | 2121852 | Environment: | |
Last Closed: | 2022-12-06 09:54:28 UTC | Type: | Bug |
Cloudforms Team: | --- | Target Upstream Version: | 2.1.5 |
Bug Depends On: | 2121852 | ||
Comment 1
Ken Gaillot
2022-09-20 19:51:25 UTC
Steps to reproduce are in https://bugzilla.redhat.com/show_bug.cgi?id=2121852#c12. Make sure to modify the Dummy resource agent on the same node that it will execute on (so either use a one-node cluster or, to be safe, modify it on all nodes).

Start the cluster and add the Dummy resource. The logs will show that it takes a while before the action times out and the error message is logged, but the resource will still be created.

Stop the cluster and start it again with the new resource. You'll see that it takes a long time to start up and that "crm_node -l" is in the process list.

Update to the new packages. Start the cluster again and you'll see it start up normally.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8808
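
For readers who want to see why a metadata action that calls crm_node stalls a synchronous controller, here is a toy model in Python. It is only an illustration of the Cause and Fix described in the Doc Text, not Pacemaker code: the `Controller` class, its method names, the node names, and the shortened 0.3-second timeout (standing in for the real 30 seconds) are all assumptions made for this sketch.

```python
import queue
import threading
import time

# Scaled-down stand-in for the 30-second metadata timeout (assumption).
METADATA_TIMEOUT = 0.3


class Controller:
    """Toy single-threaded controller; not Pacemaker's real implementation."""

    def __init__(self):
        # Each pending request is a reply queue the caller is waiting on.
        self.requests = queue.Queue()

    def serve_pending(self):
        """Answer every queued node-list request (main-loop work)."""
        while True:
            try:
                reply = self.requests.get_nowait()
            except queue.Empty:
                return
            reply.put(["node1", "node2"])  # hypothetical node names

    def agent_metadata(self, result):
        """Hypothetical agent metadata action that queries the controller,
        the way a call to crm_node would."""
        reply = queue.Queue()
        self.requests.put(reply)
        try:
            result["nodes"] = reply.get(timeout=METADATA_TIMEOUT)
        except queue.Empty:
            result["nodes"] = None  # the query timed out

    def run_metadata_sync(self):
        """Old behavior: wait for the action, serving nothing meanwhile."""
        result = {}
        worker = threading.Thread(target=self.agent_metadata, args=(result,))
        worker.start()
        worker.join()          # controller is blocked here
        self.serve_pending()   # only now drains the stale request
        return result["nodes"]

    def run_metadata_async(self):
        """Fixed behavior: keep serving requests while the action runs."""
        result = {}
        worker = threading.Thread(target=self.agent_metadata, args=(result,))
        worker.start()
        while worker.is_alive():
            self.serve_pending()
            time.sleep(0.01)
        return result["nodes"]


if __name__ == "__main__":
    print("sync: ", Controller().run_metadata_sync())
    print("async:", Controller().run_metadata_async())
```

In the synchronous variant the agent's query can never be answered, because the only thing that could answer it is waiting for the agent to finish, so the action runs into its timeout. In the asynchronous variant the controller keeps processing requests while the action runs, and the query returns promptly.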