Bug 1303136
| Field | Value |
| --- | --- |
| Summary | Cannot create a new resource with the same name as one that failed and was deleted before, until cleanup |
| Product | Red Hat Enterprise Linux 7 |
| Component | pcs |
| Version | 7.2 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | low |
| Reporter | Raoul Scarazzini <rscarazz> |
| Assignee | Tomas Jelinek <tojeline> |
| QA Contact | cluster-qe <cluster-qe> |
| CC | abeekhof, cfeist, cluster-maint, fdinitto, idevat, michele, omular, rmarigny, rsteiger, skinjo, tojeline |
| Target Milestone | rc |
| Fixed In Version | pcs-0.9.152-5.el7 |
| Doc Type | Bug Fix |
| Type | Bug |
| Bug Blocks | 1329472 |
| Last Closed | 2016-11-03 20:56:42 UTC |

Doc Text:

Cause: A user deletes a resource from a cluster.
Consequence: Sometimes (depending on the cluster status and configuration) traces of the resource remain in the cluster, and pcs then refuses to create a resource with the same name.
Fix: Properly check whether the specified resource id really exists in the cluster.
Result: It is possible to recreate the resource.
Description
Raoul Scarazzini
2016-01-29 16:17:59 UTC
See also this github issue: https://github.com/feist/pcs/issues/78

Comment 3
Zhaoming Zhang

It seems I am hitting the same problem in my cluster. Until the new version is released, shall I use "crm_resource -C" to avoid the problem?

[root@nas-210 ~]# /usr/sbin/cibadmin -l -Q | grep nas_samba
<lrm_resource id="nas_samba" type="smb" class="systemd">
<lrm_rsc_op id="nas_samba_last_0" operation_key="nas_samba_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="72:220:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:7;72:220:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node1" call-id="121" rc-code="7" op-status="0" interval="0" last-run="1466676416" last-rc-change="1466676416" exec-time="119" queue-time="1" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
[root@nas-210 ~]# pcs resource create nas_samba systemd:smb op monitor start-delay=10s interval=15s timeout=20s --group nas_group
Error: unable to create resource/fence device 'nas_samba', 'nas_samba' already exists on this system
[root@nas-210 ~]# crm_node -R nas_samba
The supplied command is considered dangerous. To prevent accidental destruction of the cluster, the --force flag is required in order to proceed.
[root@nas-210 ~]# crm_node --force -R nas_samba
[root@nas-210 ~]# /usr/sbin/cibadmin -l -Q | grep nas_samba
<lrm_resource id="nas_samba" type="smb" class="systemd">
<lrm_rsc_op id="nas_samba_last_0" operation_key="nas_samba_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="72:220:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:7;72:220:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node1" call-id="121" rc-code="7" op-status="0" interval="0" last-run="1466676416" last-rc-change="1466676416" exec-time="119" queue-time="1" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
[root@nas-210 ~]# pcs resource create nas_samba systemd:smb op monitor start-delay=10s interval=15s timeout=20s --group nas_group
Error: unable to create resource/fence device 'nas_samba', 'nas_samba' already exists on this system
[root@nas-210 ~]# crm_node --force -R nas_samba
[root@nas-210 ~]# /usr/sbin/cibadmin -l -Q | grep nas_samba
<lrm_resource id="nas_samba" type="smb" class="systemd">
<lrm_rsc_op id="nas_samba_last_0" operation_key="nas_samba_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="72:220:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:7;72:220:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node1" call-id="121" rc-code="7" op-status="0" interval="0" last-run="1466676416" last-rc-change="1466676416" exec-time="119" queue-time="1" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
[root@nas-210 ~]# pcs config | grep nas_samba
[root@nas-210 ~]# pcs resource delete nas_samba
Error: Resource 'nas_samba' does not exist.
[root@nas-210 ~]# crm_resource -C nas_samba
Waiting for 1 replies from the CRMd.
OK
[root@nas-210 ~]# /usr/sbin/cibadmin -l -Q | grep nas_samba
[root@nas-210 ~]# pcs resource create nas_samba systemd:smb op monitor start-delay=10s interval=15s timeout=20s --group nas_group
[root@nas-210 ~]# pcs config | grep nas_samba
 Resource: nas_samba (class=systemd type=smb)
  Operations: monitor interval=15s start-delay=10s timeout=20s (nas_samba-monitor-interval-15s)
[root@nas-210 ~]#

Comment 4
Tomas Jelinek

(In reply to Zhaoming Zhang from comment #3)

Yes, that should do the trick. Alternatively you can use the "pcs resource cleanup" command, which runs "crm_resource -C" for you.

The "crm_node --force -R" command is for the case when a node (not a resource) has been removed and cannot be added back because there are still traces of it in pacemaker.
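For quick reference, a hedged example of the cleanup commands discussed above (a sketch only; the resource name nas_samba comes from the transcript, and exact behavior may differ across pcs and pacemaker versions):

```
# clear stale state for one resource via pcs (this wraps crm_resource -C)
pcs resource cleanup nas_samba

# or invoke pacemaker's cleanup directly
crm_resource --cleanup --resource nas_samba
```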
Comment 5
Zhaoming Zhang

(In reply to Tomas Jelinek from comment #4)

Thanks a lot! I hit the problem in a case like this:
1. In a two-node cluster (node0 and node1), I run "pcs cluster standby node1" and then run "poweroff" on node1.
2. Then I delete a resource and try to add a resource with the same name back. Coincidentally, I hit the problem again.
3. I use "/usr/sbin/cibadmin -l -Q" to check for the traces and find them. I use "crm_resource -C" to do the trick, but it does not work. I try "crm_resource -C" several times, and it still does not work.
4. Then I power node1 back on. After node1 rejoins the cluster, the traces disappear on their own, without any further command.

Would you please tell me why the command does not work in step 3 and why the traces disappear in step 4? Any info would be helpful, thanks!

[root@nas-220 ~]# pcs resource delete nas_nfs
Error: Resource 'nas_nfs' does not exist.
[root@nas-220 ~]# /usr/sbin/cibadmin -l -Q | grep nas_nfs
<lrm_resource id="nas_nfs" type="nfsserver" class="ocf" provider="heartbeat">
<lrm_rsc_op id="nas_nfs_last_failure_0" operation_key="nas_nfs_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" transition-key="24:745:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:0;24:745:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node0" call-id="1210" rc-code="0" op-status="0" interval="0" last-run="1467102672" last-rc-change="1467102672" exec-time="370" queue-time="0" op-digest="8236642d60a6a43b6357038bd2cf15c7"/>
<lrm_rsc_op id="nas_nfs_last_0" operation_key="nas_nfs_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="128:832:0:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:0;128:832:0:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node0" call-id="1315" rc-code="0" op-status="0" interval="0" last-run="1467167131" last-rc-change="1467167131" exec-time="430" queue-time="0" op-digest="8236642d60a6a43b6357038bd2cf15c7"/>
<lrm_rsc_op id="nas_nfs_monitor_15000" operation_key="nas_nfs_monitor_15000" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" transition-key="96:746:0:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:0;96:746:0:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node0" call-id="1227" rc-code="0" op-status="0" interval="15000" last-rc-change="1467102682" exec-time="386" queue-time="10000" op-digest="cf9065dcbe3d8e10c2e27af5e9996ae4"/>
[root@nas-220 ~]# crm_resource -C nas_nfs
Waiting for 1 replies from the CRMd. OK
[root@nas-220 ~]# /usr/sbin/cibadmin -l -Q | grep nas_nfs
<lrm_resource id="nas_nfs" type="nfsserver" class="ocf" provider="heartbeat">
<lrm_rsc_op id="nas_nfs_last_failure_0" operation_key="nas_nfs_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" transition-key="24:745:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:0;24:745:7:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node0" call-id="1210" rc-code="0" op-status="0" interval="0" last-run="1467102672" last-rc-change="1467102672" exec-time="370" queue-time="0" op-digest="8236642d60a6a43b6357038bd2cf15c7"/>
<lrm_rsc_op id="nas_nfs_last_0" operation_key="nas_nfs_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="128:832:0:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:0;128:832:0:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node0" call-id="1315" rc-code="0" op-status="0" interval="0" last-run="1467167131" last-rc-change="1467167131" exec-time="430" queue-time="0" op-digest="8236642d60a6a43b6357038bd2cf15c7"/>
<lrm_rsc_op id="nas_nfs_monitor_15000" operation_key="nas_nfs_monitor_15000" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" transition-key="96:746:0:94eec9d7-eafd-457c-93ec-cfe7a5e45232" transition-magic="0:0;96:746:0:94eec9d7-eafd-457c-93ec-cfe7a5e45232" on_node="node0" call-id="1227" rc-code="0" op-status="0" interval="15000" last-rc-change="1467102682" exec-time="386" queue-time="10000" op-digest="cf9065dcbe3d8e10c2e27af5e9996ae4"/>
[root@nas-220 ~]# /usr/sbin/cibadmin -l -Q | grep nas_nfs
[root@nas-220 ~]#

Comment 6

Created attachment 1183868
proposed fix
Setup:
[root@rh72-node1:~]# pcs resource create d1 dummy
[root@rh72-node1:~]# crm_resource -F -r d1 -H rh72-node1
Waiting for 1 replies from the CRMd. OK
[root@rh72-node1:~]# crm_resource -F -r d1 -H rh72-node2
Waiting for 1 replies from the CRMd. OK
[root@rh72-node1:~]# pcs cluster standby rh72-node2
[root@rh72-node1:~]# pcs resource delete d1
Attempting to stop: d1...Stopped
Before fix:
[root@rh72-node1:~]# pcs resource create d1 dummy
Error: unable to create resource/fence device 'd1', 'd1' already exists on this system
After fix:
[root@rh72-node1:~]# pcs resource create d1 dummy
[root@rh72-node1:~]# pcs resource show d1
 Resource: d1 (class=ocf provider=heartbeat type=Dummy)
  Operations: start interval=0s timeout=20 (d1-start-interval-0s)
              stop interval=0s timeout=20 (d1-stop-interval-0s)
              monitor interval=10 timeout=20 (d1-monitor-interval-10)
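To make the nature of the fix concrete, here is a hedged sketch (not the actual pcs code): when pcs checks whether an id is already taken, only ids present in the CIB configuration should count, while leftover lrm_resource entries in the status section should be ignored. Assuming cibadmin's --scope option is available, the two sections can be inspected separately:

```
# configured resources: an id found here is a genuine collision
cibadmin --local --query --scope resources | grep 'id="d1"'

# status section: stale lrm_resource traces of deleted resources show up here
cibadmin --local --query --scope status | grep 'id="d1"'
```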
(In reply to Zhaoming Zhang from comment #5)

Thank you for this additional report, it was very helpful. Apparently pacemaker does not update the status of offline and standby nodes. When you brought the node back online, its status got updated, and that is why the resource traces disappeared automatically. With the patch from comment 6, pcs does not care about these traces any more and allows you to recreate the resource.

Setup:
[vm-rhel72-1 ~] $ pcs resource create d1 dummy
[vm-rhel72-1 ~] $ crm_resource -F -r d1 -H vm-rhel72-1
Waiting for 1 replies from the CRMd. OK
[vm-rhel72-1 ~] $ crm_resource -F -r d1 -H vm-rhel72-3
Waiting for 1 replies from the CRMd. OK
[vm-rhel72-1 ~] $ pcs cluster standby vm-rhel72-3
[vm-rhel72-1 ~] $ pcs resource delete d1
Attempting to stop: d1...Stopped

Before Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-4.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource create d1 dummy
Error: unable to create resource/fence device 'd1', 'd1' already exists on this system

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.152-5.el7.x86_64
[vm-rhel72-1 ~] $ pcs resource create d1 dummy
[vm-rhel72-1 ~] $ pcs resource
 d1     (ocf::heartbeat:Dummy): Started vm-rhel72-1

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2596.html