Bug 1172539

Summary:

Node ends up in a reboot loop when a resource with the same name exists

Product:

Red Hat Enterprise Linux 7

Reporter:

Radek Steiger <rsteiger>

Component:

pacemaker

Assignee:

Andrew Beekhof <abeekhof>

Status:

CLOSED ERRATA

QA Contact:

cluster-qe <cluster-qe>

Severity:

medium

Priority:

medium

Version:

7.1

CC:

cluster-maint, fdinitto, jkortus

Hardware:

Unspecified

OS:

Unspecified

Fixed In Version:

pacemaker-1.1.13-3.el7

Doc Type:

Bug Fix

Last Closed:

2015-11-19 12:12:17 UTC

Type:

Bug

Attachments:

Description	Flags
cib with dummy resource	none

Description Radek Steiger 2014-12-10 10:30:24 UTC

Created attachment 966730
cib with dummy resource

> Description of problem:

When I create a resource whose name is identical to the name of one of the cluster nodes, that node ends up in a reboot or fencing loop.

I can imagine this being a common situation: an admin who wants to run, for example, Apache may well call both the cluster node and the resource 'apache'.


> Version-Release number of selected component (if applicable):
pacemaker-1.1.12-13.el7.x86_64
corosync-2.3.4-3.el7.x86_64
pcs-0.9.137-3.el7.x86_64

> How reproducible:
Always

> Steps to Reproduce:
1. get a cluster running
2. configure fencing
3. choose a random node and create a dummy resource with the same name as that node

> Actual results:
The chosen node ends up in reboot/fencing loop.

> Expected results:
No reboots.

> Additional info:
All necessary info should be contained in the attached cib.xml and the crm report.
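
A quick way to check a configuration for this condition is to compare resource ids against node names in the CIB. The following is a minimal sketch, not part of the original report: the sample XML is a hypothetical, trimmed-down CIB (the attached cib.xml is not reproduced here), and `find_name_collisions` and `RESOURCE_TAGS` are names invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down CIB for illustration only; this is NOT the
# attached cib.xml. Node and resource names follow the ones seen in this bug.
SAMPLE_CIB = """
<cib>
  <configuration>
    <nodes>
      <node id="1" uname="virt-041"/>
      <node id="2" uname="virt-042"/>
      <node id="3" uname="virt-043"/>
      <node id="4" uname="virt-044"/>
    </nodes>
    <resources>
      <primitive id="Fencing" class="stonith" type="fence_xvm"/>
      <primitive id="virt-044" class="ocf" provider="heartbeat" type="Dummy"/>
    </resources>
  </configuration>
</cib>
"""

# Resource element names considered here (an assumption; extend as needed).
RESOURCE_TAGS = ("primitive", "group", "clone", "master")


def find_name_collisions(cib_xml):
    """Return a sorted list of resource ids that are also node names."""
    root = ET.fromstring(cib_xml)

    # Node names live in the uname attribute of <node> under <nodes>.
    node_names = {
        node.get("uname")
        for node in root.findall("./configuration/nodes/node")
    }

    # Collect ids of all resource elements, including nested ones.
    resource_ids = set()
    resources = root.find("./configuration/resources")
    if resources is not None:
        for tag in RESOURCE_TAGS:
            for rsc in resources.iter(tag):
                resource_ids.add(rsc.get("id"))

    # Drop None in case an element lacked the expected attribute.
    return sorted((node_names & resource_ids) - {None})


if __name__ == "__main__":
    print(find_name_collisions(SAMPLE_CIB))
```

On an affected version, any id this reports is a candidate for renaming before the resource is created.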

Comment 3 Andrew Beekhof 2014-12-17 03:34:15 UTC

Yikes!  Ok, I'll look at this when I get back from the xmas break.

Comment 4 Andrew Beekhof 2015-01-23 00:52:30 UTC

Ran out of time for this, bumping to 7.2 since there is a viable work-around (don't do that! :-)

Comment 6 Andrew Beekhof 2015-03-31 01:27:55 UTC

David, I'd prefer that you dig into this, since it ties in with the remote node functionality and I don't want to screw anything up :-)


[12:18 PM] beekhof@fedora ~/Development/sources/pacemaker/devel ☺ # tools/crm_simulate -Sx ~/Downloads/pe-error-1.bz2 -VVV

Current cluster status:
Online: [ virt-041 virt-042 virt-043 virt-044 ]

 Fencing	(stonith:fence_xvm):	Started virt-041 
 virt-044	(ocf::heartbeat:Dummy):	Started [ virt-042 virt-044 ]

  notice: check_rsc_parameters: 	Forcing restart of virt-044 on virt-044, type changed: remote -> Dummy
  notice: check_rsc_parameters: 	Forcing restart of virt-044 on virt-044, provider changed: pacemaker -> heartbeat

[...]
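
The two notices above are the useful part of that output: they name the resource, the node, and which attribute of the recorded definition differs from the configured one. As a rough sketch (not from the original comment), they can be pulled apart like this; `PATTERN` and `parse_forced_restarts` are hypothetical names, and the regular expression is fitted only to the two lines quoted here, so other message variants may not match.

```python
import re

# The two notice lines quoted above, with the tab after the function name
# collapsed to a single space.
LOG_LINES = [
    "notice: check_rsc_parameters: Forcing restart of virt-044 on virt-044, "
    "type changed: remote -> Dummy",
    "notice: check_rsc_parameters: Forcing restart of virt-044 on virt-044, "
    "provider changed: pacemaker -> heartbeat",
]

# Fitted to the message shape seen in this report only (an assumption).
PATTERN = re.compile(
    r"Forcing restart of (?P<rsc>[^\s,]+) on (?P<node>[^\s,]+), "
    r"(?P<field>\w+) changed: (?P<old>\S+) -> (?P<new>\S+)"
)


def parse_forced_restarts(lines):
    """Return one dict per matching line: rsc, node, field, old, new."""
    found = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            found.append(match.groupdict())
    return found


if __name__ == "__main__":
    for entry in parse_forced_restarts(LOG_LINES):
        print(
            f"{entry['rsc']} on {entry['node']}: "
            f"{entry['field']} {entry['old']} -> {entry['new']}"
        )
```

Read together, the two entries say that the recorded definition for virt-044 had provider pacemaker and type remote, while the configured resource has provider heartbeat and type Dummy. That fits the remark above that the problem ties in with the remote node functionality.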

Comment 8 David Vossel 2015-06-05 19:34:15 UTC

patch posted
https://github.com/ClusterLabs/pacemaker/pull/726

This was pretty interesting to track down.

Unit test.

1. create a resource with the exact same name as a pacemaker node.
2. verify no unexpected fencing actions occur after the resource is created.

If the resource starts successfully somewhere, with no fencing and without a bunch of errors/warnings showing up in the logs, then it works.

Comment 12 errata-xmlrpc 2015-11-19 12:12:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2383.html