Bug 1230249 - Instance evacuation not initiated
Summary: Instance evacuation not initiated
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 7.0 (Kilo)
Assignee: Eoghan Glynn
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks: 1185030 1251948 1261487
Reported: 2015-06-10 13:43 UTC by Fabio Massimo Di Nitto
Modified: 2019-09-09 16:35 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-06-16 08:58:02 UTC
Target Upstream Version:
Embargoed:



Description Fabio Massimo Di Nitto 2015-06-10 13:43:50 UTC
Description of problem:

While testing instance HA, we hard-failed a compute node. There were 3 or 4 instances running on that compute node.

All but one of the instances were rebuilt on another compute node.

One specific instance was still reported as "running" on the failed compute node, according to nova list and nova show <instance>.

The rebuild of that instance was not started or completed until the compute node came back up 10 to 15 minutes later.

Version-Release number of selected component (if applicable):

openstack-nova-common-2015.1.0-4.el7ost.noarch
openstack-nova-console-2015.1.0-4.el7ost.noarch
openstack-nova-scheduler-2015.1.0-4.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-4.el7ost.noarch
openstack-nova-conductor-2015.1.0-4.el7ost.noarch
openstack-nova-api-2015.1.0-4.el7ost.noarch
python-nova-2015.1.0-4.el7ost.noarch
python-novaclient-2.23.0-1.el7ost.noarch


How reproducible:

Randomly (not on every node failure)


Steps to Reproduce:
1. Deploy N instances.
2. Fail a compute node.
3. Request evacuation using the fence_compute code/logic as written by Sylvain.
4. Notice that some instances are not rebuilt.
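The check in step 4 can be scripted. This is a minimal sketch, not part of fence_compute. It assumes the server records have already been fetched with admin credentials and are available as dicts carrying the Nova extended attribute OS-EXT-SRV-ATTR:host. The function name, host names and sample data are hypothetical.

```python
from typing import Dict, List

# Extended server attribute exposed by the Nova API to admin users.
HOST_KEY = "OS-EXT-SRV-ATTR:host"


def stranded_instances(servers: List[Dict[str, str]], failed_host: str) -> List[str]:
    """Return the IDs of servers that Nova still places on the failed host."""
    return [s["id"] for s in servers if s.get(HOST_KEY) == failed_host]


# Hypothetical sample data, shaped like entries from a detailed server listing.
servers = [
    {"id": "vm-1", "status": "ACTIVE", HOST_KEY: "compute-2"},
    {"id": "vm-2", "status": "ACTIVE", HOST_KEY: "compute-1"},
    {"id": "vm-3", "status": "ACTIVE", HOST_KEY: "compute-2"},
]

# Any ID listed here has not been evacuated away from the failed node.
print(stranded_instances(servers, "compute-1"))
```

An instance that still appears in this list some time after the node was fenced matches the symptom described above: Nova reports it as running on a host that is down.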

Actual results:

The instance was in SHUTDOWN state after the compute node came back up, and it had to be recovered manually.

Expected results:

The instance is evacuated and starts again by itself, without manual recovery.

Additional info:

http://mrg-01.mpc.lab.eng.bos.redhat.com/sosreports/

This event happened within the 14 hours before these sosreports were collected. Please download them soon, because they will be removed at some point (Eoghan is aware of this).

Comment 3 Fabio Massimo Di Nitto 2015-06-16 08:58:02 UTC
After more in-depth investigation, the problem has been isolated to pacemaker. This is not a nova bug (the issue is already on our radar on the pacemaker side).

