Bug 1193499 - member weirdness when adding/removing nodes
Summary: member weirdness when adding/removing nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: pacemaker
Version: 6.7
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Target Milestone: rc
Target Release: 6.8
Assignee: Andrew Beekhof
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1162727
Blocks:
 
Reported: 2015-02-17 13:25 UTC by Radek Steiger
Modified: 2016-05-10 23:51 UTC
CC List: 8 users

Fixed In Version: pacemaker-1.1.14-1.1.el6
Doc Type: Bug Fix
Doc Text:
Cause: Removed nodes were not consistently purged from all Pacemaker components' peer caches.
Consequence: Removing and adding nodes can result in a node ID being recycled, which should be OK but caused daemon crashes due to conflicting information from the former node not being purged from the peer cache.
Fix: Peer cache management has been overhauled so that the libcluster library handles node reaping itself rather than relying on the individual components to do it correctly.
Result: Recycling node IDs should not cause any problems.
Clone Of: 1162727
Environment:
Last Closed: 2016-05-10 23:51:13 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2016:0856 | 0 | normal | SHIPPED_LIVE | pacemaker bug fix and enhancement update | 2016-05-10 22:44:25 UTC

Description Radek Steiger 2015-02-17 13:25:44 UTC
+++ This bug was initially created as a clone of Bug #1162727 +++


> Description of problem:

Adding and removing nodes may cause an ID collision in RHEL6 pacemaker, just like in its RHEL7 counterpart. When a newly added node is assigned an ID that was previously used by a different node, a collision occurs somewhere in the pacemaker caches. I can see messages like these in the logs:

Feb 17 13:40:40 virt-091 stonith-ng[6415]:  warning: crm_find_peer: Node 'virt-093' and 'virt-096' share the same cluster nodeid: 3
Feb 17 13:40:42 virt-091 attrd[6417]:  warning: crm_find_peer: Node 'virt-093' and 'virt-096' share the same cluster nodeid: 3
Feb 17 13:40:42 virt-091 cib[6414]:  warning: crm_find_peer: Node 'virt-093' and 'virt-096' share the same cluster nodeid: 3
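
For anyone triaging a similar cluster, warnings of this shape can be collected from a log file with a few lines of Python. This is only a sketch: the regular expression is inferred from the three lines quoted above, and the function name find_nodeid_conflicts is made up for illustration. It groups the reporting daemons by the conflicting node ID and node names.

```python
import re

# Pattern inferred from the log lines quoted in this report (an assumption,
# not an official Pacemaker log format specification).
PATTERN = re.compile(
    r"(?P<daemon>[\w-]+)\[\d+\]:\s+warning: crm_find_peer: "
    r"Node '(?P<a>[^']+)' and '(?P<b>[^']+)' "
    r"share the same cluster nodeid: (?P<id>\d+)"
)


def find_nodeid_conflicts(lines):
    """Map (nodeid, name_a, name_b) to the list of daemons that reported it."""
    conflicts = {}
    for line in lines:
        match = PATTERN.search(line)
        if match:
            key = (int(match.group("id")), match.group("a"), match.group("b"))
            conflicts.setdefault(key, []).append(match.group("daemon"))
    return conflicts
```

Feeding it the lines of a syslog file (for example via open(path)) returns one entry per distinct conflict, which makes it easy to see whether every daemon is affected or only some.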


Chronology of reproducer steps:

1) Tue Feb 17 13:39:34 CET 2015
pcs cluster node remove virt-096 && cman_tool version -r -S

2) Tue Feb 17 13:39:57 CET 2015
pcs cluster node remove virt-093 && cman_tool version -r -S

3) Tue Feb 17 13:40:28 CET 2015
pcs cluster node add virt-096 --start


The initial node ID distribution:

1    virt-091
2    virt-092
3    virt-093
4    virt-094
5    virt-095
6    virt-096

The final ID distribution after all additions/removals:

1    virt-091
2    virt-092
3    virt-096
4    virt-094
5    virt-095
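
The ID recycling above can be modelled in a few lines of Python. This is a toy sketch, not Pacemaker or pcs code: it assumes that a new node receives the lowest free node ID (which matches the distributions shown here) and that each daemon keeps a peer cache keyed by node ID. The names lowest_free_id, simulate and purge_on_remove are hypothetical.

```python
# Toy model of the reproducer. The lowest-free-ID policy and the cache
# layout are assumptions made for illustration only.

def lowest_free_id(members):
    """Return the smallest positive ID not currently in use."""
    node_id = 1
    while node_id in members:
        node_id += 1
    return node_id


def simulate(purge_on_remove):
    # Initial membership from the report: IDs 1..6 map to virt-091..virt-096.
    members = {i: f"virt-09{i}" for i in range(1, 7)}
    # One daemon's peer cache, initially in sync with the membership.
    peer_cache = dict(members)
    warnings = []

    def remove(name):
        node_id = next(i for i, n in members.items() if n == name)
        del members[node_id]
        if purge_on_remove:
            # The fixed behaviour: the removed node is reaped from the cache.
            del peer_cache[node_id]

    def add(name):
        node_id = lowest_free_id(members)
        members[node_id] = name
        stale = peer_cache.get(node_id)
        if stale is not None and stale != name:
            warnings.append(
                f"Node '{stale}' and '{name}' "
                f"share the same cluster nodeid: {node_id}"
            )
        peer_cache[node_id] = name

    # The three reproducer steps from the chronology.
    remove("virt-096")
    remove("virt-093")
    add("virt-096")
    return members, warnings
```

Calling simulate(False) leaves a stale entry for virt-093 under ID 3 and yields a warning in the same form as the log messages, while simulate(True) yields none, which is the outcome the fix aims for.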

Comment 1 Radek Steiger 2015-02-17 13:26:30 UTC
> Version-Release number of selected component (if applicable):

pacemaker-1.1.12-4.el6.x86_64
pcs-0.9.138-1.el6.x86_64
corosync-1.4.7-1.el6.x86_64
cman-3.0.12.1-68.el6.x86_64

Comment 4 Andrew Beekhof 2015-03-31 01:29:41 UTC
Bug #1162727 has the list of patches required here.

Comment 5 Andrew Beekhof 2015-04-10 01:22:07 UTC
Patches:

0eb41da: Fix: attrd: Remove offline nodes from node cache for "peer-remove" requests 
ba8d3cd: Fix: membership: Prevent use-after-free in reap_crm_member() 
68d5738: Fix: cluster: Remove unknown offline nodes with conflicting unames from node cache 
c97575b: Fix: crmd: Remove state of unknown nodes with conflicting unames from CIB 
50ffa21: Fix: crmd: Remove unknown nodes with conflicting unames from CIB 
ddccf97: Fix: Membership: Detect and resolve nodes that change their ID 
371e79c: Fix: attrd: Clean out the node cache when requested by the admin 
b658b2b: Fix: attrd: Simplify how node deletions happen 
bf15d36: Fix: cib: Avoid nodeid conflicts we don't care about 
30a1ba9: Fix: fencing: Allow nodes to be purged from the member cache 
c8b413f: Fix: crm_node: Correctly remove nodes from the CIB by nodeid 
0b98ef1: Fix: stonith-ng: Correctly track node state 
72b3a9a: Fix: stonith-ng: No reply is needed for CRM_OP_RM_NODE_CACHE 
e48a7a0: Fix: cib: Correctly track node state 
f51c05d: Fix: cluster: Invoke crm_remove_conflicting_peer() only when the new node's uname is being assigned in the node cache 

and the lib/cluster portion of:

8727a4f: Feature: Allow fail-counts to be removed en-mass when the new attrd is in operation

Comment 8 Ken Gaillot 2015-06-02 19:10:10 UTC
A fix for upstream is pending testing but will not make it in time for 6.7.

Comment 10 Ken Gaillot 2015-07-29 20:40:14 UTC
Fixed upstream as of commit 49fd91f.

Comment 18 errata-xmlrpc 2016-05-10 23:51:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0856.html

