Bug 1172539

Summary: Node ends up in a reboot loop when a resource with the same name exists
Product: Red Hat Enterprise Linux 7
Reporter: Radek Steiger <rsteiger>
Component: pacemaker
Assignee: Andrew Beekhof <abeekhof>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: medium
Priority: medium
Version: 7.1
CC: cluster-maint, fdinitto, jkortus
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pacemaker-1.1.13-3.el7
Doc Type: Bug Fix
Last Closed: 2015-11-19 12:12:17 UTC
Type: Bug
Attachments: cib with dummy resource (flags: none)

Description Radek Steiger 2014-12-10 10:30:24 UTC
Created attachment 966730
cib with dummy resource

> Description of problem:

When I create a resource whose name is identical to the name of one of the cluster nodes, that node ends up in a reboot or fencing loop.

I can imagine this being a common situation: for example, an admin who wants to run Apache may call both the cluster node and the resource 'apache'.


> Version-Release number of selected component (if applicable):
pacemaker-1.1.12-13.el7.x86_64
corosync-2.3.4-3.el7.x86_64
pcs-0.9.137-3.el7.x86_64

> How reproducible:
Always

> Steps to Reproduce:
1. get a cluster running
2. configure fencing
3. choose a random node and create a dummy resource with the same name

> Actual results:
The chosen node ends up in a reboot/fencing loop.

> Expected results:
No reboots.

> Additional info:
All necessary info should be contained in the attached cib.xml and the crm report.

Comment 3 Andrew Beekhof 2014-12-17 03:34:15 UTC
Yikes!  Ok, I'll look at this when I get back from the xmas break.

Comment 4 Andrew Beekhof 2015-01-23 00:52:30 UTC
Ran out of time for this; bumping to 7.2 since there is a viable work-around (don't do that! :-)
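Until a fixed package is in place, the work-around amounts to never giving a resource the same ID as a cluster node. A minimal Python sketch of a pre-flight check for that, assuming a CIB laid out as `<configuration>/<nodes>/<node uname=...>` and `<configuration>/<resources>/.../<primitive id=...>`. The function `find_name_collisions` is hypothetical (not part of pacemaker or pcs), it only looks at primitive IDs (group and clone IDs are not checked), and the sample CIB is a trimmed, made-up example using the names from this report.

```python
import xml.etree.ElementTree as ET


def find_name_collisions(cib_xml: str) -> set[str]:
    """Return names used both as a cluster node name and as a primitive resource ID."""
    root = ET.fromstring(cib_xml)
    # Node names are stored in the uname attribute of each <node> under <nodes>.
    node_names = {
        node.get("uname")
        for node in root.findall("./configuration/nodes/node")
        if node.get("uname")
    }
    resource_ids = set()
    resources = root.find("./configuration/resources")
    if resources is not None:
        # iter() also reaches primitives nested inside groups or clones.
        resource_ids = {p.get("id") for p in resources.iter("primitive") if p.get("id")}
    return node_names & resource_ids


# Trimmed, made-up CIB using the node and resource names from this report.
SAMPLE_CIB = """<cib>
  <configuration>
    <nodes>
      <node id="1" uname="virt-041"/>
      <node id="4" uname="virt-044"/>
    </nodes>
    <resources>
      <primitive id="Fencing" class="stonith" type="fence_xvm"/>
      <primitive id="virt-044" class="ocf" provider="heartbeat" type="Dummy"/>
    </resources>
  </configuration>
</cib>
"""

print(sorted(find_name_collisions(SAMPLE_CIB)))  # ['virt-044']
```

Running this against a real cib.xml before creating a resource would flag the collision described in this bug; an empty set means no resource ID shadows a node name.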

Comment 6 Andrew Beekhof 2015-03-31 01:27:55 UTC
David, I'd prefer you dug into this since it ties in with the remote node functionality and I don't want to screw anything up :-)


[12:18 PM] beekhof@fedora ~/Development/sources/pacemaker/devel ☺ # tools/crm_simulate -Sx ~/Downloads/pe-error-1.bz2 -VVV

Current cluster status:
Online: [ virt-041 virt-042 virt-043 virt-044 ]

 Fencing	(stonith:fence_xvm):	Started virt-041 
 virt-044	(ocf::heartbeat:Dummy):	Started [ virt-042 virt-044 ]

  notice: check_rsc_parameters: 	Forcing restart of virt-044 on virt-044, type changed: remote -> Dummy
  notice: check_rsc_parameters: 	Forcing restart of virt-044 on virt-044, provider changed: pacemaker -> heartbeat

[...]
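The two check_rsc_parameters notices show the recorded history for virt-044 on node virt-044 being treated as ocf:pacemaker:remote (the agent used for remote-node connections) while the configured resource of the same name is ocf:heartbeat:Dummy, so the difference forces a restart. The Python sketch below only models that comparison to show why the name clash produces restarts; it is not pacemaker's C code, and `AgentSpec` and `restart_reasons` are hypothetical names.

```python
from typing import NamedTuple


class AgentSpec(NamedTuple):
    """Resource agent identity: standard (class), provider and type."""
    rclass: str
    provider: str
    rtype: str


def restart_reasons(rsc_id: str, node: str,
                    recorded: AgentSpec, configured: AgentSpec) -> list[str]:
    """Return one message per agent field that differs between history and config.

    Hypothetical model of the comparison behind the notices above; an empty
    list means the recorded history still matches the configuration.
    """
    reasons = []
    # Fields are checked in this order only so the output matches the log above.
    for label, old, new in (
        ("class", recorded.rclass, configured.rclass),
        ("type", recorded.rtype, configured.rtype),
        ("provider", recorded.provider, configured.provider),
    ):
        if old != new:
            reasons.append(
                f"Forcing restart of {rsc_id} on {node}, {label} changed: {old} -> {new}"
            )
    return reasons


# History seen for "virt-044" on node virt-044 (remote-connection agent)
# versus the Dummy resource configured under the same name.
recorded = AgentSpec(rclass="ocf", provider="pacemaker", rtype="remote")
configured = AgentSpec(rclass="ocf", provider="heartbeat", rtype="Dummy")
for line in restart_reasons("virt-044", "virt-044", recorded, configured):
    print(line)
```

This prints the same two "Forcing restart" lines as the crm_simulate run; with distinct names the history and configuration would describe the same agent and the list would be empty.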

Comment 8 David Vossel 2015-06-05 19:34:15 UTC
patch posted
https://github.com/ClusterLabs/pacemaker/pull/726

This was pretty interesting to track down.

Unit test.

1. create a resource with the exact same name as a pacemaker node.
2. verify no unexpected fencing actions occur after the resource is created.

If the resource starts successfully somewhere, without any fencing and without a bunch of errors/warnings showing up in the logs, then it works.
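The pass criterion of this unit test can be written down as a small predicate. A minimal Python sketch, assuming a hypothetical `RunResult` summary of one test run (the field names and the "warning"/"error" level strings are illustrative, not a pacemaker format). "Starts somewhere" is taken to mean active on exactly one node, since the failing crm_simulate run showed virt-044 started on two nodes at once.

```python
from dataclasses import dataclass, field


@dataclass
class RunResult:
    """Hypothetical summary of one test run; all field names are illustrative."""
    active_on: list[str] = field(default_factory=list)   # nodes running the resource
    fenced: list[str] = field(default_factory=list)      # targets of fencing actions
    log_levels: list[str] = field(default_factory=list)  # severity of each log entry


def collision_fix_works(run: RunResult) -> bool:
    """Started on exactly one node, no fencing, and no warnings or errors logged."""
    started_once = len(set(run.active_on)) == 1
    no_fencing = not run.fenced
    clean_logs = not any(level in ("warning", "error") for level in run.log_levels)
    return started_once and no_fencing and clean_logs


good = RunResult(active_on=["virt-042"], log_levels=["info", "notice"])
# Mirrors the pre-fix symptoms: active on two nodes and the same-named node fenced.
bad = RunResult(active_on=["virt-042", "virt-044"], fenced=["virt-044"],
                log_levels=["notice", "warning"])
print(collision_fix_works(good), collision_fix_works(bad))  # True False
```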

Comment 12 errata-xmlrpc 2015-11-19 12:12:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2383.html