Bug 1162727
Summary: member weirdness when adding/removing nodes
Product: Red Hat Enterprise Linux 7
Component: pacemaker
Version: 7.1
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Reporter: Radek Steiger <rsteiger>
Assignee: Andrew Beekhof <abeekhof>
QA Contact: cluster-qe <cluster-qe>
CC: cfeist, cluster-maint, cluster-qe, fdinitto, jherrman, jkortus, lmiksik, mjuricek, tlavigne
Target Milestone: rc
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pacemaker-1.1.13-3.el7
Doc Type: Bug Fix
Doc Text: Previously, removing and adding cluster nodes in some cases caused conflicting node IDs, which in turn led to certain Pacemaker components terminating unexpectedly. This update overhauls peer cache management to ensure that recycling node IDs is handled correctly, and the described problem thus no longer occurs.
Last Closed: 2015-11-19 12:12:02 UTC
Type: Bug
Bug Blocks: 1193499, 1244101 (view as bug list)
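The Doc Text above describes node IDs being "recycled": a node is removed from the cluster, and a different node later joins with the same corosync node ID, colliding with a stale entry in the peer cache. The following is a minimal illustrative model of that failure mode and of the fix (purging the cache entry when the node is removed); the class and method names are hypothetical, not Pacemaker's actual internals.

```python
# Illustrative model only (not Pacemaker source): a peer cache keyed by
# corosync node ID. A stale entry left behind after node removal
# conflicts with a new node that reuses the same ID.

class PeerCache:
    def __init__(self):
        self.by_id = {}

    def update(self, node_id, uname, state):
        entry = self.by_id.get(node_id)
        if entry and entry["uname"] != uname:
            # Pre-fix behaviour: two identities claim one node ID.
            raise RuntimeError(
                f"node ID {node_id} already cached as {entry['uname']!r}"
            )
        self.by_id[node_id] = {"uname": uname, "state": state}

    def reap(self, node_id):
        # Post-fix behaviour: purge the entry when the node leaves,
        # so a recycled ID starts from a clean slate.
        return self.by_id.pop(node_id, None)

cache = PeerCache()
cache.update(103, "pcmk-3", "member")
cache.reap(103)                        # node removed from the cluster
cache.update(103, "pcmk-4", "member")  # recycled ID no longer conflicts
```

Without the `reap()` call, the final `update()` would raise, which is the rough analogue of the component crashes described in the Doc Text.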
Description
Radek Steiger
2014-11-11 15:02:03 UTC
Patch for this is: https://github.com/beekhof/pacemaker/commit/ddccf97

Forgot to update bugzilla with build information. Guilty.

The patch https://github.com/ClusterLabs/pacemaker/commit/bf15d36 assumed that 'node->state' was hooked up for the cib. This seems not to be the case, so I'm testing a patch that would address that part so that the original patch will work. Apologies.

This looks better:

Mar 27 16:26:10 [27091] pcmk-1 cib: info: crm_update_peer_proc: pcmk_cpg_membership: Node pcmk-3[103] - corosync-cpg is now offline
Mar 27 16:26:10 [27091] pcmk-1 cib: info: cib_peer_update_callback: 2 103 member
Mar 27 16:26:10 [27091] pcmk-1 cib: notice: crm_update_peer_state: cib_peer_update_callback: Node pcmk-3[103] - state is now lost (was member)
Mar 27 16:26:10 [27091] pcmk-1 cib: info: cib_peer_update_callback: 1 103 lost
Mar 27 16:26:10 [27091] pcmk-1 cib: notice: crm_reap_dead_member: Removing pcmk-3/103 from the membership list
Mar 27 16:26:10 [27091] pcmk-1 cib: notice: reap_crm_member: Purged 1 peers with id=103 and/or uname=(null) from the membership cache

The node dropping out from CPG now automatically removes the node from the cache, so there is nothing for the new node to conflict with.

Also needed:
+ 0b98ef1: Fix: stonith-ng: Correctly track node state (HEAD, 1.1)
+ 72b3a9a: Fix: stonith-ng: No reply is needed for CRM_OP_RM_NODE_CACHE

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2383.html
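The log sequence above shows the fixed flow: when corosync-cpg reports a peer offline, the cib marks the peer lost and then reaps it from the membership cache. A hedged sketch of that flow, with hypothetical function names that only mirror the log messages (this is not the Pacemaker API):

```python
# Illustrative model of the fixed CPG-membership handling: a peer that
# goes offline is marked lost and immediately purged from the cache,
# matching the crm_update_peer_state / reap_crm_member log lines above.

def cpg_membership_event(cache, node_id, uname, online):
    """Handle a (simulated) CPG membership change; returns log lines."""
    log = []
    if not online:
        log.append(f"crm_update_peer_state: Node {uname}[{node_id}] - "
                   f"state is now lost (was member)")
        removed = cache.pop(node_id, None)  # purge the stale entry
        if removed is not None:
            log.append(f"reap_crm_member: Purged 1 peers with id={node_id}")
    return log

cache = {103: {"uname": "pcmk-3", "state": "member"}}
events = cpg_membership_event(cache, 103, "pcmk-3", online=False)
# the cache entry is gone, so a new node reusing ID 103 cannot conflict
```

The point of purging inside the membership callback, rather than waiting for an explicit removal request, is that the cache is already clean by the time any replacement node with a recycled ID joins.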