Red Hat Bugzilla – Bug 1329472
Cannot recreate remote node resource
Last modified: 2018-03-23 07:27:32 EDT
Description of problem:
Cannot recreate remote node resource when there is an existing node entry (left over from a previous incarnation).

[root@overcloud-controller-0 heat-admin]# pcs resource disable overcloud-novacompute-2
[root@overcloud-controller-0 heat-admin]# pcs resource delete overcloud-novacompute-2
Attempting to stop: overcloud-novacompute-2...Stopped
[root@overcloud-controller-0 heat-admin]# pcs resource create overcloud-novacompute-2 remote reconnect_interval=240
Error: unable to create resource/fence device 'overcloud-novacompute-2', 'overcloud-novacompute-2' already exists on this system

[root@overcloud-controller-0 heat-admin]# cibadmin -Ql | grep -C 10 overcloud-novacompute-2
      <node id="overcloud-novacompute-0" type="remote" uname="overcloud-novacompute-0">
        <instance_attributes id="nodes-overcloud-novacompute-0">
          <nvpair id="nodes-overcloud-novacompute-0-osprole" name="osprole" value="compute"/>
        </instance_attributes>
      </node>
      <node id="overcloud-novacompute-1" type="remote" uname="overcloud-novacompute-1">
        <instance_attributes id="nodes-overcloud-novacompute-1">
          <nvpair id="nodes-overcloud-novacompute-1-osprole" name="osprole" value="compute"/>
        </instance_attributes>
      </node>
      <node type="remote" id="overcloud-novacompute-2" uname="overcloud-novacompute-2">
        <instance_attributes id="nodes-overcloud-novacompute-2">
          <nvpair id="nodes-overcloud-novacompute-2-osprole" name="osprole" value="compute"/>
        </instance_attributes>
      </node>
    </nodes>
    <resources>
      <primitive class="ocf" id="ip-192.0.2.6" provider="heartbeat" type="IPaddr2">
        <instance_attributes id="ip-192.0.2.6-instance_attributes">
          <nvpair id="ip-192.0.2.6-instance_attributes-ip" name="ip" value="192.0.2.6"/>
          <nvpair id="ip-192.0.2.6-instance_attributes-cidr_netmask" name="cidr_netmask" value="32"/>
        </instance_attributes>
        <operations>

Version-Release number of selected component (if applicable):
pcs-0.9.143-15.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. see above

Actual results:
Error: unable to create resource/fence device 'overcloud-novacompute-2', 'overcloud-novacompute-2' already exists on this system

Expected results:
resource is created

Additional info:
I think pcs' uniqueness checks are being slightly overzealous here. The operation should be allowed to proceed.
Weirder... it seems to be happening at the cib level:

[root@overcloud-controller-0 heat-admin]# cibadmin --create -o resources --xml-text " <primitive class="ocf" id="overcloud-novacompute-2.localdomain" provider="pacemaker" type="remote">
>   <instance_attributes id="overcloud-novacompute-2-instance_attributes">
>     <nvpair id="overcloud-novacompute-2-instance_attributes-reconnect_interval" name="reconnect_interval" value="240"/>
>   </instance_attributes>
>   <operations>
>     <op id="overcloud-novacompute-2-start-interval-0s" interval="0s" name="start" timeout="60"/>
>     <op id="overcloud-novacompute-2-stop-interval-0s" interval="0s" name="stop" timeout="60"/>
>     <op id="overcloud-novacompute-2-monitor-interval-20" interval="20" name="monitor"/>
>   </operations>
> </primitive>
> "
Call cib_create failed (-76): Name not unique on network
<failed>
  <failed_update object_type="primitive" operation="cib_create" reason="Name not unique on network">
    <primitive/>
  </failed_update>
</failed>

[root@overcloud-controller-0 heat-admin]# cibadmin -Ql | grep -C 0 overcloud-novacompute-2
      <node type="remote" id="overcloud-novacompute-2" uname="overcloud-novacompute-2">
        <instance_attributes id="nodes-overcloud-novacompute-2">
          <nvpair id="nodes-overcloud-novacompute-2-osprole" name="osprole" value="compute"/>
--
    <node_state remote_node="true" id="overcloud-novacompute-2" uname="overcloud-novacompute-2" crm-debug-origin="do_update_resource" node_fenced="0">
      <transient_attributes id="overcloud-novacompute-2">
        <instance_attributes id="status-overcloud-novacompute-2"/>
--
      <lrm id="overcloud-novacompute-2">

Re-assigning.
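For illustration only: the collision appears to come from the leftover status-section entries shown above rather than from anything left in the resources section. A sketch of how one might inspect just that leftover state (these cibadmin invocations are assumptions for illustration, not taken from the report):

[root@overcloud-controller-0 heat-admin]# cibadmin --query -o status | grep overcloud-novacompute-2
# or target the stale element directly via XPath
[root@overcloud-controller-0 heat-admin]# cibadmin --query --xpath "//node_state[@uname='overcloud-novacompute-2']"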
This may be related to the fact that pcs does not run "crm_node -R" when removing remote nodes: https://github.com/feist/pcs/issues/78

We also need to fix id uniqueness checks in pcs: currently we search for an id in the whole cib, including the status section, which is wrong: bz1303136

Maybe pcs should not search for an existing id in the nodes section either? Let me know, thanks.
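A minimal sketch of what that extra cleanup step could look like when a remote node is removed (node name taken from the report above; whether --force is needed may vary by crm_node version):

# after the remote node resource has been stopped and deleted,
# drop the node from Pacemaker's caches and status section
crm_node --force --remove overcloud-novacompute-2
# short form: crm_node -R overcloud-novacompute-2 --force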
Agreed, pcs should run "crm_node -R" when removing a node, and that should fix this issue.

As for what pcs should look at for name collisions, there are three places Pacemaker Remote nodes can show up (a rough check covering all three is sketched after this list):

1. The nodes section: not reliable, because nodes will have an entry here only if they have ever had a permanent node attribute set.

2. The status section: mostly reliable. Nodes will have an entry here as long as they have ever been started.

3. The resources section: mostly reliable. You can check against the ID of any ocf:pacemaker:remote primitives configured, and the value of the remote-node attribute of any resource configured (i.e. guest nodes, usually VirtualDomain resources, but in theory any resource). The only time this is not reliable is the situation described in this bz, i.e. the node has been removed from the configuration but an old status entry is still present.

Bottom line: you could get away with just #2 or #3, but to be completely safe, check all three.

Pacemaker is correct in rejecting the addition in this case, because the old state info would cause problems if the same ID were reused. You could argue that pacemaker should automatically clear the state info when the node is removed from the configuration, so we should evaluate that possibility at some point.
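To make the list above concrete, a rough sketch of checking all three sections for a candidate name; the XPath expressions are illustrative assumptions, not the actual pcs implementation:

NAME=overcloud-novacompute-2
# 1. nodes section (only populated once a permanent node attribute has been set)
cibadmin --query --xpath "//nodes/node[@uname='$NAME']"
# 2. status section (populated once the node has ever been started)
cibadmin --query --xpath "//status/node_state[@uname='$NAME']"
# 3. resources section: ocf:pacemaker:remote primitives, plus guest nodes
#    declared via the remote-node meta attribute of any resource
cibadmin --query --xpath "//resources//primitive[@id='$NAME'][@provider='pacemaker'][@type='remote']"
cibadmin --query --xpath "//resources//nvpair[@name='remote-node'][@value='$NAME']"

The name should be considered free to reuse only if none of the queries finds a match.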
Created attachment 1183870 [details]
proposed fix

Test 1: pacemaker remote resource
[root@rh72-node1:~]# pcs resource create rh72-node3 remote
# force pacemaker to create a record in nodes section
# the fix needs to work both with and without this command
[root@rh72-node1:~]# pcs node attribute rh72-node3 foo=bar
[root@rh72-node1:~]# pcs resource delete rh72-node3
Attempting to stop: rh72-node3...Stopped
# before fix this failed
[root@rh72-node1:~]# pcs resource create rh72-node3 dummy
[root@rh72-node1:~]# echo $?
0

Test 2: remote-node attribute, remote-node remove
[root@rh72-node1:~]# pcs resource create anode dummy
[root@rh72-node1:~]# pcs cluster remote-node add rh72-node3 anode
# force pacemaker to create a record in nodes section
# the fix needs to work both with and without this command
[root@rh72-node1:~]# pcs node attribute rh72-node3 foo=bar
[root@rh72-node1:~]# pcs cluster remote-node remove rh72-node3
# before fix this failed
[root@rh72-node1:~]# pcs resource create rh72-node3 dummy
[root@rh72-node1:~]# echo $?
0

Test 3: remote-node attribute, resource update
[root@rh72-node1:~]# pcs resource create anode dummy
[root@rh72-node1:~]# pcs cluster remote-node add rh72-node3 anode
# force pacemaker to create a record in nodes section
# the fix needs to work both with and without this command
[root@rh72-node1:~]# pcs node attribute rh72-node3 foo=bar
[root@rh72-node1:~]# pcs resource update anode meta remote-node=
# before fix this failed
[root@rh72-node1:~]# pcs resource create rh72-node3 dummy
[root@rh72-node1:~]# echo $?
0

Test 4: remote-node attribute, resource meta
[root@rh72-node1:~]# pcs resource create anode dummy
[root@rh72-node1:~]# pcs cluster remote-node add rh72-node3 anode
# force pacemaker to create a record in nodes section
# the fix needs to work both with and without this command
[root@rh72-node1:~]# pcs node attribute rh72-node3 foo=bar
[root@rh72-node1:~]# pcs resource meta anode remote-node=
# before fix this failed
[root@rh72-node1:~]# pcs resource create rh72-node3 dummy
[root@rh72-node1:~]# echo $?
0

Test 5: remote-node attribute, resource delete
[root@rh72-node1:~]# pcs resource create anode dummy
[root@rh72-node1:~]# pcs cluster remote-node add rh72-node3 anode
# force pacemaker to create a record in nodes section
# the fix needs to work both with and without this command
[root@rh72-node1:~]# pcs node attribute rh72-node3 foo=bar
[root@rh72-node1:~]# pcs resource delete anode
Attempting to stop: anode...Stopped
# before fix this failed
[root@rh72-node1:~]# pcs resource create rh72-node3 dummy
[root@rh72-node1:~]# echo $?
0
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
# is there to force pacemaker to create a record in nodes section
# the fix needs to work both with and without this command

1)
Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-4.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 remote
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource delete vm-rhel72-2
Deleting Resource - vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
Error: unable to create resource/fence device 'vm-rhel72-2', 'vm-rhel72-2' already exists on this system

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-5.el7.x86_64
a)
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 remote
[vm-rhel72-1 ~] $ pcs resource delete vm-rhel72-2
Deleting Resource - vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0
b)
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 remote
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource delete vm-rhel72-2
Deleting Resource - vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0

2)
Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-4.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs cluster remote-node remove vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
Error: unable to create resource/fence device 'vm-rhel72-2', 'vm-rhel72-2' already exists on this system

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-5.el7.x86_64
a)
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs cluster remote-node remove vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0
b)
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs cluster remote-node remove vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0

3)
Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-4.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource update anode meta remote-node=
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
Error: unable to create resource/fence device 'vm-rhel72-2', 'vm-rhel72-2' already exists on this system

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-5.el7.x86_64
a)
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs resource update anode meta remote-node=
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0
b)
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource update anode meta remote-node=
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0

4)
Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-4.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource meta anode remote-node=
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
Error: unable to create resource/fence device 'vm-rhel72-2', 'vm-rhel72-2' already exists on this system

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-5.el7.x86_64
a)
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs resource meta anode remote-node=
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0
b)
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource meta anode remote-node=
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0

5)
Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-4.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource delete anode
Deleting Resource - anode
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
Error: unable to create resource/fence device 'vm-rhel72-2', 'vm-rhel72-2' already exists on this system

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-5.el7.x86_64
a)
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs resource delete anode
Deleting Resource - anode
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0
b)
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource delete anode
Deleting Resource - anode
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2596.html