Bug 1322387 - Standby/unstandby of a node and standby of 2 nodes causes a random node to be fenced
Keywords:
Status: CLOSED DUPLICATE of bug 1295835
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.1
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Andrew Beekhof
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1296673
Reported: 2016-03-30 12:14 UTC by Jaison Raju
Modified: 2019-10-10 11:43 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-09 04:33:32 UTC
Target Upstream Version:



Description Jaison Raju 2016-03-30 12:14:10 UTC
Description of problem:
Standby/unstandby of a node and standby of 2 nodes causes a random node to be fenced.

Version-Release number of selected component (if applicable):
pacemaker 1.1.12-22.el7_1.1 - Red Hat x86_64
pacemaker-cli 1.1.12-22.el7_1.1 - Red Hat x86_64
pacemaker-cluster-libs 1.1.12-22.el7_1.1 - Red Hat x86_64
pacemaker-libs 1.1.12-22.el7_1.1 - Red Hat x86_64
resource-agents 3.9.5-40.el7_1.3 - Red Hat x86_64

How reproducible:
Always at customer end


Comment 6 Andrew Beekhof 2016-04-05 04:19:47 UTC
Immediate guess... one of the services is refusing to stop when required.
Will check crm report to confirm

Comment 9 Ken Gaillot 2016-04-11 17:49:46 UTC
I suspect this is a duplicate of BZ#1257414. Some of the constraints added as a result of that BZ are present in this cluster, but not all of them. For example, I don't see:

pcs constraint order start haproxy-clone then openstack-keystone-clone
pcs constraint order promote redis-master then start openstack-ceilometer-central-clone require-all=false

I'm reassigning this bz to Andrew Beekhof so he can review the deployment, as he is more familiar with this type of setup than I am.

If the configuration review doesn't solve all the issues, I do think an upgrade of the RHEL and OSP packages would be beneficial.

Comment 10 Andrew Beekhof 2016-04-20 01:10:39 UTC
Where are the attachments? yank can't find any:

[abeekhof@collab-shell ~]$ yank 01588802
* searching for attachments for ticket 01588802.
* the ticket 01588802 doesn't appear to have any attachments
* [searching] dropbox for case related attachments
* [renaming] filenames and checking for duplicates
* [erasing] empty directories

Comment 12 Andrew Beekhof 2016-05-09 02:21:20 UTC
Is there somewhere persistent we can put these?
I'm back from summit now and they've been wiped from collab :-(

Comment 14 Andrew Beekhof 2016-05-09 04:33:32 UTC
Neutron is not stopping in time; its stop operation times out:

Feb 25 19:07:06 ncerdlabdell400 pengine[3834]: warning: unpack_rsc_op_failure: Processing failed op stop for neutron-server:0 on pcmk-ncerdlabdell400: OCF_TIMEOUT (198)
Feb 25 19:07:06 ncerdlabdell400 pengine[3834]: warning: unpack_rsc_op_failure: Processing failed op stop for neutron-server:0 on pcmk-ncerdlabdell400: OCF_TIMEOUT (198)

This is leading to:

Feb 25 19:07:06 ncerdlabdell400 pengine[3834]: warning: pe_fence_node: Node pcmk-ncerdlabdell400 will be fenced because of resource failure(s)
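
To check other logs for the same pattern (a failed stop followed by a fencing decision), something like the following may help. It is only a minimal sketch: the regular expressions are modelled on the two pengine message shapes quoted above, the function name `summarize_fencing_causes` is made up for illustration, and other Pacemaker versions may word these messages differently.

```python
import re

# Patterns modelled on the two pengine messages quoted in this comment.
# Assumption: other Pacemaker versions may format these lines differently.
FAILED_OP = re.compile(
    r"unpack_rsc_op_failure: Processing failed op (?P<op>\S+) "
    r"for (?P<rsc>\S+) on (?P<node>\S+): (?P<status>\S+) \((?P<code>\d+)\)"
)
FENCE = re.compile(r"pe_fence_node: Node (?P<node>\S+) will be fenced")


def summarize_fencing_causes(lines):
    """Return (failed_stops, fenced_nodes) found in an iterable of log lines.

    failed_stops maps node name -> set of (resource, status) for failed stop ops.
    fenced_nodes is the set of nodes the policy engine decided to fence.
    """
    failed_stops = {}
    fenced_nodes = set()
    for line in lines:
        m = FAILED_OP.search(line)
        if m:
            # Only failed stop operations are of interest here.
            if m.group("op") == "stop":
                failed_stops.setdefault(m.group("node"), set()).add(
                    (m.group("rsc"), m.group("status"))
                )
            continue
        m = FENCE.search(line)
        if m:
            fenced_nodes.add(m.group("node"))
    return failed_stops, fenced_nodes
```

Passing an open log file (or any list of lines) returns the failed stops grouped by node and the set of nodes scheduled for fencing. A node that appears in both is a candidate for the stop-timeout cause described here.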


This is a duplicate of Bug 1295835.
See https://bugzilla.redhat.com/show_bug.cgi?id=1290599#c20 for the workaround.

*** This bug has been marked as a duplicate of bug 1295835 ***

