Bug 1329472 - Cannot recreate remote node resource
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcs
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Tomas Jelinek
QA Contact: cluster-qe@redhat.com
Depends On: 1303136
Blocks:
Reported: 2016-04-21 23:07 EDT by Andrew Beekhof
Modified: 2018-03-23 07:27 EDT (History)
CC: 8 users

See Also:
Fixed In Version: pcs-0.9.152-5.el7
Doc Type: Bug Fix
Doc Text:
Cause: A user removes a remote node from a cluster.
Consequence: pcs does not tell Pacemaker that the node is permanently gone and should be removed from Pacemaker's internal structures. pcs then refuses to create a resource or a remote node with the same name, saying it already exists.
Fix: Tell Pacemaker that the node was removed from the cluster.
Result: It is possible to recreate the resource / remote node.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-03 16:58:40 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
proposed fix (10.52 KB, patch)
2016-07-25 12:23 EDT, Tomas Jelinek


External Trackers
Tracker ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata RHSA-2016:2596 | normal | SHIPPED_LIVE | Moderate: pcs security, bug fix, and enhancement update | 2016-11-03 08:11:34 EDT

Description Andrew Beekhof 2016-04-21 23:07:34 EDT
Description of problem:

Cannot recreate a remote node resource when there is an existing node entry left over from a previous incarnation.

[root@overcloud-controller-0 heat-admin]# pcs resource disable overcloud-novacompute-2
[root@overcloud-controller-0 heat-admin]# pcs resource delete overcloud-novacompute-2
Attempting to stop: overcloud-novacompute-2...Stopped
[root@overcloud-controller-0 heat-admin]# pcs resource create overcloud-novacompute-2 remote reconnect_interval=240
Error: unable to create resource/fence device 'overcloud-novacompute-2', 'overcloud-novacompute-2' already exists on this system
[root@overcloud-controller-0 heat-admin]# cibadmin -Ql | grep -C 10 overcloud-novacompute-2
      <node id="overcloud-novacompute-0" type="remote" uname="overcloud-novacompute-0">
        <instance_attributes id="nodes-overcloud-novacompute-0">
          <nvpair id="nodes-overcloud-novacompute-0-osprole" name="osprole" value="compute"/>
        </instance_attributes>
      </node>
      <node id="overcloud-novacompute-1" type="remote" uname="overcloud-novacompute-1">
        <instance_attributes id="nodes-overcloud-novacompute-1">
          <nvpair id="nodes-overcloud-novacompute-1-osprole" name="osprole" value="compute"/>
        </instance_attributes>
      </node>
      <node type="remote" id="overcloud-novacompute-2" uname="overcloud-novacompute-2">
        <instance_attributes id="nodes-overcloud-novacompute-2">
          <nvpair id="nodes-overcloud-novacompute-2-osprole" name="osprole" value="compute"/>
        </instance_attributes>
      </node>
    </nodes>
    <resources>
      <primitive class="ocf" id="ip-192.0.2.6" provider="heartbeat" type="IPaddr2">
        <instance_attributes id="ip-192.0.2.6-instance_attributes">
          <nvpair id="ip-192.0.2.6-instance_attributes-ip" name="ip" value="192.0.2.6"/>
          <nvpair id="ip-192.0.2.6-instance_attributes-cidr_netmask" name="cidr_netmask" value="32"/>
        </instance_attributes>
        <operations>


Version-Release number of selected component (if applicable):

pcs-0.9.143-15.el7.x86_64

How reproducible:

100%

Steps to Reproduce:
1. See the transcript above.

Actual results:

Error: unable to create resource/fence device 'overcloud-novacompute-2', 'overcloud-novacompute-2' already exists on this system


Expected results:

resource is created

Additional info:

I think pcs's uniqueness checks are being slightly overzealous here; the operation should be allowed to proceed.
Comment 1 Andrew Beekhof 2016-04-21 23:12:55 EDT
Weirder... it seems to be happening at the cib level:

[root@overcloud-controller-0 heat-admin]# cibadmin --create -o resources --xml-text "     <primitive class="ocf" id="overcloud-novacompute-2.localdomain" provider="pacemaker" type="remote">
>         <instance_attributes id="overcloud-novacompute-2-instance_attributes">
>           <nvpair id="overcloud-novacompute-2-instance_attributes-reconnect_interval" name="reconnect_interval" value="240"/>
>         </instance_attributes>
>         <operations>
>           <op id="overcloud-novacompute-2-start-interval-0s" interval="0s" name="start" timeout="60"/>
>           <op id="overcloud-novacompute-2-stop-interval-0s" interval="0s" name="stop" timeout="60"/>
>           <op id="overcloud-novacompute-2-monitor-interval-20" interval="20" name="monitor"/>
>         </operations>
>       </primitive>
> "
Call cib_create failed (-76): Name not unique on network
<failed>
  <failed_update object_type="primitive" operation="cib_create" reason="Name not unique on network">
    <primitive/>
  </failed_update>
</failed>
[root@overcloud-controller-0 heat-admin]# cibadmin -Ql | grep -C 0 overcloud-novacompute-2
      <node type="remote" id="overcloud-novacompute-2" uname="overcloud-novacompute-2">
        <instance_attributes id="nodes-overcloud-novacompute-2">
          <nvpair id="nodes-overcloud-novacompute-2-osprole" name="osprole" value="compute"/>
--
    <node_state remote_node="true" id="overcloud-novacompute-2" uname="overcloud-novacompute-2" crm-debug-origin="do_update_resource" node_fenced="0">
      <transient_attributes id="overcloud-novacompute-2">
        <instance_attributes id="status-overcloud-novacompute-2"/>
--
      <lrm id="overcloud-novacompute-2">

Re-assigning
Comment 3 Tomas Jelinek 2016-04-22 03:05:31 EDT
This may be related to the fact that pcs does not run crm_node -R when removing remote nodes: https://github.com/feist/pcs/issues/78

We also need to fix the ID uniqueness checks in pcs: currently we search for an ID in the whole CIB, including the status section, which is wrong: bz1303136
Should pcs also stop searching for existing IDs in the nodes section? Let me know, thanks.
Comment 4 Ken Gaillot 2016-04-22 11:44:30 EDT
Agreed, pcs should run "crm_node -R" when removing a node, and that should fix this issue.

As far as what pcs should be looking at for name collisions, there are three places Pacemaker Remote nodes can show up:

1. The nodes section: not reliable, because they will have an entry here only if they have ever had a permanent node attribute set.

2. The status section: mostly reliable. They will have an entry here if they have ever been started.

3. The resources section: mostly reliable. You can check against the ID of any ocf:pacemaker:remote primitives configured, and the value of the remote-node attribute for any resource configured (i.e. guest nodes, usually for VirtualDomain resources, but could be any resource in theory). The only time this is not reliable is the situation described in this bz, i.e. they have been removed from the configuration but an old status entry is still present.

Bottom line, you could get away with just #2 or #3, but to be completely safe, check all three.
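The three checks above can be sketched as a small parser over `cibadmin -Ql` output. This is an illustrative helper, not part of pcs; the element and attribute names (`node`, `node_state`, `primitive`, `remote-node`) match the CIB fragments quoted earlier in this bug.

```python
import xml.etree.ElementTree as ET

def remote_name_in_use(cib_xml, name):
    """Return the set of CIB sections in which `name` appears as a
    Pacemaker Remote node (illustrative helper, not part of pcs)."""
    cib = ET.fromstring(cib_xml)
    hits = set()

    # 1. nodes section: only populated for remote nodes that have ever
    #    had a permanent node attribute set.
    for node in cib.findall(".//nodes/node"):
        if name in (node.get("uname"), node.get("id")):
            hits.add("nodes")

    # 2. status section: an entry persists once the node has been started.
    for state in cib.findall(".//status/node_state"):
        if state.get("uname") == name:
            hits.add("status")

    # 3. resources section: ocf:pacemaker:remote primitive IDs, plus
    #    remote-node meta attributes on any resource (guest nodes).
    resources = cib.find(".//resources")
    if resources is not None:
        for prim in resources.iter("primitive"):
            if (prim.get("provider") == "pacemaker"
                    and prim.get("type") == "remote"
                    and prim.get("id") == name):
                hits.add("resources")
        for nv in resources.iter("nvpair"):
            if nv.get("name") == "remote-node" and nv.get("value") == name:
                hits.add("resources")

    return hits
```

Checking all three sections catches the situation in this bz: the stale status entry survives even after the nodes and resources entries are gone.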

Pacemaker is correct in rejecting the addition in this case, because the old state info would cause problems if the same ID were reused. You could argue that pacemaker should automatically clear the state info when the node is removed from the configuration, so we should evaluate that possibility at some point.
Comment 6 Tomas Jelinek 2016-07-25 12:23 EDT
Created attachment 1183870 [details]
proposed fix

Test 1: pacemaker remote resource
[root@rh72-node1:~]# pcs resource create rh72-node3 remote
# force pacemaker to create a record in nodes section
# the fix needs to work both with and without this command
[root@rh72-node1:~]# pcs node attribute rh72-node3 foo=bar
[root@rh72-node1:~]# pcs resource delete rh72-node3
Attempting to stop: rh72-node3...Stopped
# before fix this failed
[root@rh72-node1:~]# pcs resource create rh72-node3 dummy
[root@rh72-node1:~]# echo $?
0

Test 2: remote-node attribute, remote-node remove
[root@rh72-node1:~]# pcs resource create anode dummy
[root@rh72-node1:~]# pcs cluster remote-node add rh72-node3 anode
# force pacemaker to create a record in nodes section
# the fix needs to work both with and without this command
[root@rh72-node1:~]# pcs node attribute rh72-node3 foo=bar
[root@rh72-node1:~]# pcs cluster remote-node remove rh72-node3
# before fix this failed
[root@rh72-node1:~]# pcs resource create rh72-node3 dummy
[root@rh72-node1:~]# echo $?
0

Test 3: remote-node attribute, resource update
[root@rh72-node1:~]# pcs resource create anode dummy
[root@rh72-node1:~]# pcs cluster remote-node add rh72-node3 anode
# force pacemaker to create a record in nodes section
# the fix needs to work both with and without this command
[root@rh72-node1:~]# pcs node attribute rh72-node3 foo=bar
[root@rh72-node1:~]# pcs resource update anode meta remote-node=
# before fix this failed
[root@rh72-node1:~]# pcs resource create rh72-node3 dummy
[root@rh72-node1:~]# echo $?
0

Test 4: remote-node attribute, resource meta
[root@rh72-node1:~]# pcs resource create anode dummy
[root@rh72-node1:~]# pcs cluster remote-node add rh72-node3 anode
# force pacemaker to create a record in nodes section
# the fix needs to work both with and without this command
[root@rh72-node1:~]# pcs node attribute rh72-node3 foo=bar
[root@rh72-node1:~]# pcs resource meta anode remote-node=
# before fix this failed
[root@rh72-node1:~]# pcs resource create rh72-node3 dummy
[root@rh72-node1:~]# echo $?
0

Test 5: remote-node attribute, resource delete
[root@rh72-node1:~]# pcs resource create anode dummy
[root@rh72-node1:~]# pcs cluster remote-node add rh72-node3 anode
# force pacemaker to create a record in nodes section
# the fix needs to work both with and without this command
[root@rh72-node1:~]# pcs node attribute rh72-node3 foo=bar
[root@rh72-node1:~]# pcs resource delete anode
Attempting to stop: anode...Stopped
# before fix this failed
[root@rh72-node1:~]# pcs resource create rh72-node3 dummy
[root@rh72-node1:~]# echo $?
0
Comment 7 Ivan Devat 2016-07-28 14:01:16 EDT
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
# is there to force pacemaker to create a record in nodes section
# the fix needs to work both with and without this command

1)
Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-4.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 remote
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource delete vm-rhel72-2
Deleting Resource - vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
Error: unable to create resource/fence device 'vm-rhel72-2', 'vm-rhel72-2' already exists on this system

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-5.el7.x86_64

a)
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 remote
[vm-rhel72-1 ~] $ pcs resource delete vm-rhel72-2
Deleting Resource - vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0

b)
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 remote
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource delete vm-rhel72-2
Deleting Resource - vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0

2)
Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-4.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs cluster remote-node remove vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
Error: unable to create resource/fence device 'vm-rhel72-2', 'vm-rhel72-2' already exists on this system

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-5.el7.x86_64

a)
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs cluster remote-node remove vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0

b)
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs cluster remote-node remove vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0


3)
Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-4.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource update anode meta remote-node=
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
Error: unable to create resource/fence device 'vm-rhel72-2', 'vm-rhel72-2' already exists on this system

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-5.el7.x86_64

a)
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs resource update anode meta remote-node=
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0

b)
[vm-rhel72-1 ~] $  pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource update anode meta remote-node=
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0


4)
Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-4.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource meta anode remote-node=
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
Error: unable to create resource/fence device 'vm-rhel72-2', 'vm-rhel72-2' already exists on this system

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-5.el7.x86_64

a)
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs resource meta anode remote-node=
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0

b)
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource meta anode remote-node=
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0


5)
Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-4.el7.x86_64
[vm-rhel72-1 ~] $  pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource delete anode
Deleting Resource - anode
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
Error: unable to create resource/fence device 'vm-rhel72-2', 'vm-rhel72-2' already exists on this system

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-5.el7.x86_64

a)
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs resource delete anode
Deleting Resource - anode
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0

b)
[vm-rhel72-1 ~] $ pcs resource create anode dummy
[vm-rhel72-1 ~] $ pcs cluster remote-node add vm-rhel72-2 anode
[vm-rhel72-1 ~] $ pcs node standby vm-rhel72-2 && pcs node unstandby vm-rhel72-2
[vm-rhel72-1 ~] $ pcs resource delete anode
Deleting Resource - anode
[vm-rhel72-1 ~] $ pcs resource create vm-rhel72-2 dummy
[vm-rhel72-1 ~] $ echo $?
0
Comment 14 errata-xmlrpc 2016-11-03 16:58:40 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2596.html
