Description of problem:
Standby/unstandby of a node, and standby of 2 nodes, causes a random node to be fenced.

Version-Release number of selected component (if applicable):
pacemaker 1.1.12-22.el7_1.1 - Red Hat x86_64
pacemaker-cli 1.1.12-22.el7_1.1 - Red Hat x86_64
pacemaker-cluster-libs 1.1.12-22.el7_1.1 - Red Hat x86_64
pacemaker-libs 1.1.12-22.el7_1.1 - Red Hat x86_64
resource-agents 3.9.5-40.el7_1.3 - Red Hat x86_64

How reproducible:
Always at customer end

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Immediate guess... one of the services is refusing to stop when required. Will check crm report to confirm
I suspect this is a duplicate of BZ#1257414. Some of the constraints added as a result of that BZ are present in this cluster, but not all of them. For example, I don't see:

    pcs constraint order start haproxy-clone then openstack-keystone-clone
    pcs constraint order promote redis-master then start openstack-ceilometer-central-clone require-all=false

I'm reassigning this bz to Andrew Beekhof so he can review the deployment, as he is more familiar with this type of setup than I am. If the configuration review doesn't solve all the issues, I do think an upgrade of the RHEL and OSP packages would be beneficial.
Where are the attachments? yank can't find any

[abeekhof@collab-shell ~]$ yank 01588802
* searching for attachments for ticket 01588802.
* the ticket 01588802 doesn't appear to have any attachments
* [searching] dropbox for case related attachments
* [renaming] filenames and checking for duplicates
* [erasing] empty directories
Is there somewhere persistent we can put these? I'm back from summit now and they've been wiped from collab :-(
Basically neutron is not stopping in time:

Feb 25 19:07:06 ncerdlabdell400 pengine[3834]: warning: unpack_rsc_op_failure: Processing failed op stop for neutron-server:0 on pcmk-ncerdlabdell400: OCF_TIMEOUT (198)
Feb 25 19:07:06 ncerdlabdell400 pengine[3834]: warning: unpack_rsc_op_failure: Processing failed op stop for neutron-server:0 on pcmk-ncerdlabdell400: OCF_TIMEOUT (198)

This is leading to:

Feb 25 19:07:06 ncerdlabdell400 pengine[3834]: warning: pe_fence_node: Node pcmk-ncerdlabdell400 will be fenced because of resource failure(s)

Basically this is a dup of Bug 1295835. You can see https://bugzilla.redhat.com/show_bug.cgi?id=1290599#c20 for the work-around.

*** This bug has been marked as a duplicate of bug 1295835 ***
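
For anyone triaging a similar report, the pattern above (a failed stop followed by a fence decision for the same node) can be picked out of the logs mechanically. The following is only a rough sketch: the function name `stop_failures_on_fenced_nodes` and the regular expressions are my own, written against the message wording quoted in this comment, and other pacemaker versions may word these messages differently.

```python
import re

# Patterns modelled on the pengine warnings quoted above (assumed wording,
# taken from this bug's logs only).
FAILED_OP = re.compile(
    r"Processing failed op (?P<op>\w+) for (?P<rsc>\S+) "
    r"on (?P<node>\S+): (?P<rc>\w+)"
)
FENCE = re.compile(r"Node (?P<node>\S+) will be fenced")


def stop_failures_on_fenced_nodes(lines):
    """Return sorted (op, resource, node, rc) tuples for failed stop
    operations on nodes that the policy engine also decided to fence."""
    failed = set()   # de-duplicates repeated log lines
    fenced = set()
    for line in lines:
        m = FAILED_OP.search(line)
        if m:
            failed.add((m["op"], m["rsc"], m["node"], m["rc"]))
        m = FENCE.search(line)
        if m:
            fenced.add(m["node"])
    return sorted(f for f in failed if f[0] == "stop" and f[2] in fenced)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python scan.py < messages
    for op, rsc, node, rc in stop_failures_on_fenced_nodes(sys.stdin):
        print(f"{rsc}: {op} failed on {node} ({rc}), node scheduled for fencing")
```

Feeding it the three log lines quoted above reports a single entry for neutron-server:0, since the repeated warning is collapsed.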