Red Hat Bugzilla – Bug 1303136
Cannot create a new resource with the same name as one that previously failed and was deleted, until a cleanup is run
Last modified: 2018-03-23 07:22:53 EDT
Description of problem:
If you remove a resource which has failed actions, you cannot create a new resource with the same name until you run a cleanup.

Version-Release number of selected component (if applicable):
pacemaker-1.1.13-10.el7.x86_64
pcs-0.9.143-15.el7.x86_64

How reproducible:
Always, at least in my tests.

Steps to Reproduce:
1. Delete a resource which has failed actions:

# sudo pcs resource delete nova-compute-checkevacuate
Removing Constraint - location-nova-compute-checkevacuate-clone
Removing Constraint - order-openstack-nova-conductor-clone-nova-compute-checkevacuate-clone-mandatory
Removing Constraint - order-nova-compute-checkevacuate-clone-nova-compute-clone-mandatory
Deleting Resource - nova-compute-checkevacuate

The removal is successful.

2. Try to create a resource with the same name:

# source ./overcloudrc; sudo pcs resource create nova-compute-checkevacuate ocf:openstack:nova-compute-wait auth_url=$OS_AUTH_URL username=$OS_USERNAME password=$OS_PASSWORD tenant_name=$OS_TENANT_NAME domain=localdomain no_shared_storage=1 op start timeout=300 --clone interleave=true --disabled --force
Error: unable to create resource/fence device 'nova-compute-checkevacuate', 'nova-compute-checkevacuate' already exists on this system

3. Try to delete the (nonexistent) resource:

# sudo pcs resource delete nova-compute-checkevacuate
Error: Resource 'nova-compute-checkevacuate' does not exist.

Actual results:
Error: unable to create resource/fence device 'nova-compute-checkevacuate', 'nova-compute-checkevacuate' already exists on this system

Expected results:
The resource is successfully created.

Additional info:
As a workaround, everything gets back to normal if you clean up the cluster state; see the sketch below.
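A minimal sketch of that cleanup workaround, based on the commands discussed later in this report ("pcs resource cleanup" runs "crm_resource -C" for you; the resource name is the one from the steps above):

# sudo pcs resource cleanup nova-compute-checkevacuate

or, to clean up the state of all resources at once:

# sudo pcs resource cleanup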
See also this GitHub issue: https://github.com/feist/pcs/issues/78
It seems I am hitting the same problem in my cluster. Until a fixed version is released, shall I use "crm_resource -C" to avoid the problem?

[root@nas-210 ~]# /usr/sbin/cibadmin -l -Q | grep nas_samba
        <lrm_resource id="nas_samba" type="smb" class="systemd">
          <lrm_rsc_op id="nas_samba_last_0" operation_key="nas_samba_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="72:220:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:7;72:220:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node1" call-id="121" rc-code="7" op-status="0" interval="0" last-run="1466676416" last-rc-change="1466676416" exec-time="119" queue-time="1" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
[root@nas-210 ~]# pcs resource create nas_samba systemd:smb op monitor start-delay=10s interval=15s timeout=20s --group nas_group
Error: unable to create resource/fence device 'nas_samba', 'nas_samba' already exists on this system
[root@nas-210 ~]# crm_node -R nas_samba
The supplied command is considered dangerous. To prevent accidental destruction of the cluster, the --force flag is required in order to proceed.
[root@nas-210 ~]# crm_node --force -R nas_samba
[root@nas-210 ~]# /usr/sbin/cibadmin -l -Q | grep nas_samba
        <lrm_resource id="nas_samba" type="smb" class="systemd">
          <lrm_rsc_op id="nas_samba_last_0" operation_key="nas_samba_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="72:220:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:7;72:220:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node1" call-id="121" rc-code="7" op-status="0" interval="0" last-run="1466676416" last-rc-change="1466676416" exec-time="119" queue-time="1" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
[root@nas-210 ~]# pcs resource create nas_samba systemd:smb op monitor start-delay=10s interval=15s timeout=20s --group nas_group
Error: unable to create resource/fence device 'nas_samba', 'nas_samba' already exists on this system
[root@nas-210 ~]# crm_node --force -R nas_samba
[root@nas-210 ~]# /usr/sbin/cibadmin -l -Q | grep nas_samba
        <lrm_resource id="nas_samba" type="smb" class="systemd">
          <lrm_rsc_op id="nas_samba_last_0" operation_key="nas_samba_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="72:220:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:7;72:220:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node1" call-id="121" rc-code="7" op-status="0" interval="0" last-run="1466676416" last-rc-change="1466676416" exec-time="119" queue-time="1" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
[root@nas-210 ~]# pcs config | grep nas_samba
[root@nas-210 ~]# pcs resource delete nas_samba
Error: Resource 'nas_samba' does not exist.
[root@nas-210 ~]# crm_resource -C nas_samba
Waiting for 1 replies from the CRMd.
 OK
[root@nas-210 ~]# /usr/sbin/cibadmin -l -Q | grep nas_samba
[root@nas-210 ~]# pcs resource create nas_samba systemd:smb op monitor start-delay=10s interval=15s timeout=20s --group nas_group
[root@nas-210 ~]# pcs config | grep nas_samba
 Resource: nas_samba (class=systemd type=smb)
  Operations: monitor interval=15s start-delay=10s timeout=20s (nas_samba-monitor-interval-15s)
(In reply to Zhaoming Zhang from comment #3)
> It seems I am hitting the same problem in my cluster. Until a fixed version
> is released, shall I use "crm_resource -C" to avoid the problem?

Yes, that should do the trick. Alternatively you can use the "pcs resource cleanup" command, which runs "crm_resource -C" for you.

The "crm_node --force -R" command is for the case when a node (not a resource) has been removed and cannot be added back because there are still traces of it in pacemaker.
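For clarity, the two equivalent cleanups for the transcript above would look like this (a sketch; "-r" is crm_resource's explicit --resource option, and exact behavior can vary between versions):

# pcs resource cleanup nas_samba
# crm_resource -C -r nas_samba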
(In reply to Tomas Jelinek from comment #4)
> Yes, that should do the trick. Alternatively you can use the "pcs resource
> cleanup" command, which runs "crm_resource -C" for you.
>
> The "crm_node --force -R" command is for the case when a node (not a
> resource) has been removed and cannot be added back because there are still
> traces of it in pacemaker.

Thanks a lot!

I hit the problem in a case like this:
1. In a two-node cluster, e.g. node0 and node1, I run "pcs cluster standby node1" and then run "poweroff" on node1.
2. Then I delete a resource and try to add a resource with the same name back. Sure enough, I hit the problem again.
3. I use "/usr/sbin/cibadmin -l -Q" to check for leftover traces and find them. I use "crm_resource -C" to do the trick, but it does not work! I try "crm_resource -C" several times; it still does not work.
4. Then I power node1 back on. After node1 rejoined the cluster, the traces disappeared! They were just gone, without any further command!

Would you please tell me why the command did not work in step 3 and why the traces disappeared in step 4? Any info would be helpful, thanks!

[root@nas-220 ~]# pcs resource delete nas_nfs
Error: Resource 'nas_nfs' does not exist.
[root@nas-220 ~]# /usr/sbin/cibadmin -l -Q | grep nas_nfs
        <lrm_resource id="nas_nfs" type="nfsserver" class="ocf" provider="heartbeat">
          <lrm_rsc_op id="nas_nfs_last_failure_0" operation_key="nas_nfs_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" transition-key="24:745:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:0;24:745:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node0" call-id="1210" rc-code="0" op-status="0" interval="0" last-run="1467102672" last-rc-change="1467102672" exec-time="370" queue-time="0" op-digest="8236642d60a6a43b6357038bd2cf15c7"/>
          <lrm_rsc_op id="nas_nfs_last_0" operation_key="nas_nfs_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="128:832:0:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:0;128:832:0:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node0" call-id="1315" rc-code="0" op-status="0" interval="0" last-run="1467167131" last-rc-change="1467167131" exec-time="430" queue-time="0" op-digest="8236642d60a6a43b6357038bd2cf15c7"/>
          <lrm_rsc_op id="nas_nfs_monitor_15000" operation_key="nas_nfs_monitor_15000" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" transition-key="96:746:0:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:0;96:746:0:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node0" call-id="1227" rc-code="0" op-status="0" interval="15000" last-rc-change="1467102682" exec-time="386" queue-time="10000" op-digest="cf9065dcbe3d8e10c2e27af5e9996ae4"/>
[root@nas-220 ~]# crm_resource -C nas_nfs
Waiting for 1 replies from the CRMd.
 OK
[root@nas-220 ~]# /usr/sbin/cibadmin -l -Q | grep nas_nfs
        <lrm_resource id="nas_nfs" type="nfsserver" class="ocf" provider="heartbeat">
          <lrm_rsc_op id="nas_nfs_last_failure_0" operation_key="nas_nfs_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" transition-key="24:745:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:0;24:745:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node0" call-id="1210" rc-code="0" op-status="0" interval="0" last-run="1467102672" last-rc-change="1467102672" exec-time="370" queue-time="0" op-digest="8236642d60a6a43b6357038bd2cf15c7"/>
          <lrm_rsc_op id="nas_nfs_last_0" operation_key="nas_nfs_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="128:832:0:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:0;128:832:0:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node0" call-id="1315" rc-code="0" op-status="0" interval="0" last-run="1467167131" last-rc-change="1467167131" exec-time="430" queue-time="0" op-digest="8236642d60a6a43b6357038bd2cf15c7"/>
          <lrm_rsc_op id="nas_nfs_monitor_15000" operation_key="nas_nfs_monitor_15000" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" transition-key="96:746:0:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:0;96:746:0:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node0" call-id="1227" rc-code="0" op-status="0" interval="15000" last-rc-change="1467102682" exec-time="386" queue-time="10000" op-digest="cf9065dcbe3d8e10c2e27af5e9996ae4"/>

After node1 came back online:

[root@nas-220 ~]# /usr/sbin/cibadmin -l -Q | grep nas_nfs
[root@nas-220 ~]#
Created attachment 1183868 [details]
proposed fix

Setup:
[root@rh72-node1:~]# pcs resource create d1 dummy
[root@rh72-node1:~]# crm_resource -F -r d1 -H rh72-node1
Waiting for 1 replies from the CRMd.
OK
[root@rh72-node1:~]# crm_resource -F -r d1 -H rh72-node2
Waiting for 1 replies from the CRMd.
OK
[root@rh72-node1:~]# pcs cluster standby rh72-node2
[root@rh72-node1:~]# pcs resource delete d1
Attempting to stop: d1...Stopped

Before fix:
[root@rh72-node1:~]# pcs resource create d1 dummy
Error: unable to create resource/fence device 'd1', 'd1' already exists on this system

After fix:
[root@rh72-node1:~]# pcs resource create d1 dummy
[root@rh72-node1:~]# pcs resource show d1
 Resource: d1 (class=ocf provider=heartbeat type=Dummy)
  Operations: start interval=0s timeout=20 (d1-start-interval-0s)
              stop interval=0s timeout=20 (d1-stop-interval-0s)
              monitor interval=10 timeout=20 (d1-monitor-interval-10)
(In reply to Zhaoming Zhang from comment #5)
> I hit the problem in a case like this:
> 1. In a two-node cluster, e.g. node0 and node1, I run "pcs cluster standby
>    node1" and then run "poweroff" on node1.
> 2. Then I delete a resource and try to add a resource with the same name
>    back. Sure enough, I hit the problem again.
> 3. I use "/usr/sbin/cibadmin -l -Q" to check for leftover traces and find
>    them. I use "crm_resource -C" to do the trick, but it does not work,
>    even after several tries.
> 4. Then I power node1 back on. After node1 rejoined the cluster, the traces
>    disappeared, without any further command!
>
> Would you please tell me why the command did not work in step 3 and why the
> traces disappeared in step 4?

Thank you for this additional report, it was very helpful.

Apparently pacemaker does not update the status of offline and standby nodes. When you brought the node back online, its status got updated, and that is why the resource traces disappeared automatically. With the patch from comment 6, pcs no longer cares about these traces and allows you to recreate the resource.
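In other words, the stale traces live in the status section for the standby node, and letting that node rejoin is what clears them. A minimal sketch of the recovery for the scenario from comment #5 (node and resource names taken from that comment; whether "pcs cluster start" is needed depends on whether cluster services start at boot):

# pcs cluster start node1
# pcs cluster unstandby node1
# /usr/sbin/cibadmin -l -Q | grep nas_nfs

Once node1 reports in, the last command should return nothing.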
Setup:
[vm-rhel72-1 ~] $ pcs resource create d1 dummy
[vm-rhel72-1 ~] $ crm_resource -F -r d1 -H vm-rhel72-1
Waiting for 1 replies from the CRMd.
OK
[vm-rhel72-1 ~] $ crm_resource -F -r d1 -H vm-rhel72-3
Waiting for 1 replies from the CRMd.
OK
[vm-rhel72-1 ~] $ pcs cluster standby vm-rhel72-3
[vm-rhel72-1 ~] $ pcs resource delete d1
Attempting to stop: d1...Stopped

Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-4.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource create d1 dummy
Error: unable to create resource/fence device 'd1', 'd1' already exists on this system

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-5.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource create d1 dummy
[vm-rhel72-1 ~] $ pcs resource
 d1    (ocf::heartbeat:Dummy):    Started vm-rhel72-1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2596.html