Bug 1646346
| Summary: | Rebooting controller nodes hangs for 20 minutes when running 'pcs cluster stop' [rhel-7.5.z] | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Oneata Mircea Teodor <toneata> |
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 7.5 | CC: | abeekhof, aherr, cfeist, cluster-maint, ctowsley, dbecker, dciabrin, jeckersb, jpokorny, kgaillot, mburns, mcornea, mkrcmari, morazi, yprokule |
| Target Milestone: | rc | Keywords: | Triaged, ZStream |
| Target Release: | 7.5 | | |
| Hardware: | Unspecified | OS: | Unspecified |
| Whiteboard: | | | |
| Fixed In Version: | pacemaker-1.1.18-11.el7_5.4 | Doc Type: | Bug Fix |
| Story Points: | --- | Environment: | |
| Clone Of: | 1644076 | Type: | --- |
| Last Closed: | 2018-12-18 14:46:19 UTC | Mount Type: | --- |
| Regression: | --- | CRM: | |
| Documentation: | --- | Category: | --- |
| Verified Versions: | | oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | | Cloudforms Team: | --- |
| Target Upstream Version: | | Embargoed: | |
| Bug Depends On: | 1644076 | Bug Blocks: | |

Doc Text:

- Cause: Pacemaker now verifies that a node has resource information for a resource before synthesizing an internal result for an action on that resource.
- Consequence: This uncovered a pre-existing bug in which clone notify actions on a Pacemaker Remote node were routed through the wrong cluster node while the Pacemaker Remote connection was moving, causing the cluster to loop indefinitely trying to move the connection.
- Fix: Notify actions are now routed through the proper cluster node when a Pacemaker Remote connection is moving.
- Result: Notify actions complete successfully, and the Pacemaker Remote connection is able to move successfully.
Description
Oneata Mircea Teodor, 2018-11-05 12:34:53 UTC
This has been fixed in the upstream 1.1 branch by commit 2ba3fffc.

QA: To reproduce, configure a cluster with at least three cluster nodes, and a guest node and/or bundle. Run "pcs cluster stop" on one of the cluster nodes that is running the guest and/or bundle. (I expect the problem would also appear if you ban the guest node and/or bundle from one of the cluster nodes running it.) Before the fix, the operation will not complete, and the node will eventually force-exit after a timeout. After the fix, shutdown proceeds normally.

Verified:

```
$ rpm -qa | grep pacemaker-
pacemaker-cluster-libs-1.1.18-11.el7_5.4.x86_64
pacemaker-cli-1.1.18-11.el7_5.4.x86_64
pacemaker-libs-1.1.18-11.el7_5.4.x86_64
pacemaker-1.1.18-11.el7_5.4.x86_64
ansible-pacemaker-1.0.4-0.20180220234310.0e4d7c0.el7ost.noarch
puppet-pacemaker-0.7.2-0.20180423212253.el7ost.noarch
pacemaker-remote-1.1.18-11.el7_5.4.x86_64

[heat-admin@controller-0 ~]$ time sudo pcs cluster stop
Stopping Cluster (pacemaker)...
Stopping Cluster (corosync)...

real    0m42.528s
user    0m0.275s
sys     0m0.128s
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3844
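The reproduce steps above can be sketched as a shell session. This is illustrative only: it assumes a live Pacemaker cluster with at least three cluster nodes and a bundle or guest node, and the node and resource names used here (`controller-1`, `galera-bundle`) are hypothetical placeholders, not taken from this report.

```
# Assumes a running Pacemaker cluster with >= 3 cluster nodes and a
# bundle or guest node. Names below are hypothetical placeholders.

# 1. Identify which cluster node is currently hosting the bundle/guest node.
pcs status

# 2. Trigger the bug: stop cluster services on a node running the bundle.
#    Before the fix this hangs until the shutdown timeout force-exits the
#    node; with pacemaker-1.1.18-11.el7_5.4 it completes normally.
time sudo pcs cluster stop

# 2b. Alternative trigger (expected, per the report): ban the bundle from a
#     node hosting it, which also forces the Pacemaker Remote connection
#     to move to another cluster node.
pcs resource ban galera-bundle controller-1
```

Either path exercises the fixed code, since both force the Pacemaker Remote connection to relocate and therefore route clone notify actions through a different cluster node.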