Bug 1165195 - After A/P service was unable to start on one cluster node it was not started by pacemaker on another available node.
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Andrew Beekhof
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-11-18 14:32 UTC by Leonid Natapov
Modified: 2015-01-27 06:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-01-23 00:36:48 UTC
Target Upstream Version:
Embargoed:


Attachments
crm report (298.43 KB, application/x-bzip), 2014-11-18 14:32 UTC, Leonid Natapov

Description Leonid Natapov 2014-11-18 14:32:43 UTC
Created attachment 958611
crm report

Scenario:
I have OpenStack HA deployed by Staypuft.
I stopped neutron-l3-agent on one of the controllers by running systemctl stop neutron-l3-agent. Usually, after a few seconds, Pacemaker starts the service again.
I tried to create a scenario where Pacemaker fails to start the service on one node and therefore starts it on another available cluster node, so I stopped the service and renamed the neutron-l3-agent executable file. Pacemaker was unable to start the service on that node, but it did not start it on another node either.
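
For anyone trying to reproduce this, the scenario above can be sketched as a small shell script. This is an illustration, not the exact commands the reporter ran: the service name and binary path come from this report, while the backup file name, the 30-second wait, and the DRY_RUN switch are assumptions made for the sketch. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the reproduction steps from the description.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0
# on a disposable test controller to actually run them as root.
DRY_RUN="${DRY_RUN:-1}"
SERVICE="neutron-l3-agent"           # service name from the report
BINARY="/usr/bin/neutron-l3-agent"   # path from the systemd log in the report
BACKUP="${BINARY}.orig"              # hypothetical backup name

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce_start_failure() {
    run systemctl stop "$SERVICE"    # stop the agent
    run mv "$BINARY" "$BACKUP"       # make the next start fail at exec
    run sleep 30                     # assumed wait for the cluster to react
    run pcs status                   # check where (or whether) the resource runs
}

# Undo the rename once the test is finished.
restore_binary() {
    run mv "$BACKUP" "$BINARY"
}

reproduce_start_failure
```

Call restore_binary afterwards to put the executable back.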

Here is systemctl status:
-----------------------------------------------------------------------------
[root@mac848f69fbc4c3 bin]# systemctl status neutron-l3-agent
neutron-l3-agent.service - OpenStack Neutron Layer 3 Agent
   Loaded: loaded (/usr/lib/systemd/system/neutron-l3-agent.service; disabled)
   Active: failed (Result: exit-code) since Tue 2014-11-18 15:30:11 IST; 11min ago
 Main PID: 60553 (code=exited, status=203/EXEC)

Nov 18 15:30:11 mac848f69fbc4c3.example.com systemd[1]: Started OpenStack Neutron Layer 3 Agent.
Nov 18 15:30:11 mac848f69fbc4c3.example.com systemd[60553]: Failed at step EXEC spawning /usr/bin/neutron-l3-agent: No such file or directory
Nov 18 15:30:11 mac848f69fbc4c3.example.com systemd[1]: neutron-l3-agent.service: main process exited, code=exited, status=203/EXEC
Nov 18 15:30:11 mac848f69fbc4c3.example.com systemd[1]: Unit neutron-l3-agent.service entered failed state.
Nov 18 15:30:13 mac848f69fbc4c3.example.com systemd[1]: Stopped OpenStack Neutron Layer 3 Agent.
-----------------------------------------------------------------------------
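
The status=203/EXEC in the output above is the exit status systemd reports when it cannot execute the service binary. A minimal sketch of checking for that condition from a script follows; the helper name is_exec_failure is hypothetical, and it assumes the Key=Value lines printed by systemctl show. The example feeds it canned input rather than querying a live unit.

```shell
#!/bin/sh
# Reads "Key=Value" lines (as printed by `systemctl show`) on stdin and
# succeeds if ExecMainStatus is 203, the exit status systemd uses when
# it fails to exec the service binary.
is_exec_failure() {
    grep -q '^ExecMainStatus=203$'
}

# Canned input for illustration; on a real node the input would come from:
#   systemctl show -p ExecMainStatus neutron-l3-agent
if printf 'ExecMainCode=1\nExecMainStatus=203\n' | is_exec_failure; then
    echo "start failed at the EXEC step"
fi
```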

You can find pcs status output here - http://pastebin.test.redhat.com/247106
You can find pcs config output here - http://pastebin.test.redhat.com/247107
crm_report attached

pacemaker-cluster-libs-1.1.10-32.el7_0.1.x86_64
pacemaker-cli-1.1.10-32.el7_0.1.x86_64
pacemaker-libs-1.1.10-32.el7_0.1.x86_64
pacemaker-1.1.10-32.el7_0.1.x86_64

Comment 2 Andrew Beekhof 2014-11-26 09:35:58 UTC
Has this report been sanitized?
There are no logs or pengine files. I'm basically flying blind so far.

Comment 3 Leonid Natapov 2014-11-26 18:52:24 UTC
Unfortunately, I no longer have the original setup where the problem occurred. I tried to reproduce the problem on a clean HA+Neutron deployment and was unable to do so. I will continue to investigate.

Comment 4 Andrew Beekhof 2015-01-23 00:36:48 UTC
Please re-open if you are able to reproduce this.

Comment 5 Leonid Natapov 2015-01-27 06:59:25 UTC
Problem didn't reproduce.

