Once again, this came up while testing Instance HA, but the issue can be reproduced without pacemaker or HA. It is time sensitive, so pay close attention to the sequence of events:

1) compute node1, with X instances on it, dies a horrible death
2) compute node1 is marked "down" in nova
3) compute node2 explodes (but is still marked "up" in nova)
4) evacuation of the instances from node1 is requested immediately after #3
5) nova can potentially schedule instances on node2 (already dead, but nova doesn't know it yet) and on node3 (perfectly functional)
6) the instances go into REBUILD state (expected)
7) the instances scheduled on node2 (dead) go to ERROR after some time
8) the instances scheduled on node3 boot fine

The problem here is that nova attempts to schedule onto node2 (which is fine, node2 is still marked alive) but never checks for a reply from node2 confirming that the command succeeded (it looks like one-way communication), and once node2 is marked down, nova takes no action to reschedule those instances onto node3 (as one would expect it to). The instances scheduled on node2 enter ERROR state once node2 rejoins the compute cluster.
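For reference, a rough sketch of how I drive the reproduction (hostnames and instance names are placeholders; timing matters, and step 4 has to follow step 3 immediately):

# on node1, and later on node2: crash the node hard
$ echo c > /proc/sysrq-trigger

# on a controller: wait until node1 is reported down, then crash node2
# the same way while it is still reported "up"
$ nova service-list --binary nova-compute

# immediately request evacuation of node1's instances, letting the
# scheduler pick the destination
$ nova evacuate instance-0001
$ nova evacuate instance-0002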
Clarifying comment #0: node exploding == dying a horrible death == $ echo c > /proc/sysrq-trigger
This is a well-understood gap at the moment. There are several efforts underway to close it, all of which we hope to land in Liberty. None of them will be candidates for backporting to Kilo, IMHO. First, service groups based on tooz will massively shorten the delay between a node going down and nova noticing. Second, the mark-host-down functionality will allow things like pacemaker to be in the *middle* of decisions about host failure, evacuation decisions, etc. Third, the robustify-evacuation work will record evacuations as migrations and emit new notifications about their progress; that additional information about in-process evacuations will allow things like pacemaker to make much more informed decisions about restarting the process if it does race with a secondary node failure.
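To illustrate the third point: assuming the robustify-evacuation work lands and evacuations start showing up as migration records, external tooling could inspect them with something like the following (sketch only; the hostname is a placeholder and the exact filters and output depend on the novaclient and API versions in use):

# list migration records involving the failed host, so a watcher can tell
# whether an evacuation is still in flight, finished, or never picked up
$ nova migration-list --host overcloud-compute-1.localdomain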
Dan, I understand this is a work in progress and will probably be fixed in Liberty. The issue here is that it defeats the purpose of Instance HA when certain failures happen at particular times (basically it's not real HA). The problem here is not working around when (or how) nova notices the host is down. The problem is that nova tries to schedule something, never checks whether the other end has received the request, and lets things die there. This could happen even when booting a new instance, I guess. There should be some level of: "hey node X, can you boot an instance?" and, if no reply comes back from node X, nova should take action and ask node Y instead. It's not even a matter of whether nova knows that X is down or not. Anyway, we will need some level of fix in Kilo here, since Instance HA is going to be a flagship feature for OSP7.
Realistically there is nothing that can be done about this for 7.0, so I'm marking for 8.0/7.0.z.
I believe we need to enhance the fence agent for Nova to use the new mark host down API, where it is available, to handle this.
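For what it's worth, a minimal sketch of what that could look like, assuming the API is available and exposed through python-novaclient (the hostname and instance name are placeholders; this is not the actual fence agent change):

# tell nova the compute service on the fenced host is down so the
# scheduler stops considering it, then request evacuation as usual
$ nova service-force-down overcloud-compute-2.localdomain nova-compute
$ nova evacuate instance-0001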
(In reply to Stephen Gordon from comment #13)
> I believe we need to enhance the fence agent for Nova to use the new mark
> host down API, where it is available, to handle this.

Can you please not bounce bugs around if you are not sure what they are for? If you are in doubt, please ask before proceeding.
With or without the new nova mark-host-down API, there is still a race condition within nova. This bug is about that specific race condition, which has to be fixed, as pointed out in the comments above.
The mark-host-down API was merged in Liberty (OSP8). It's an API and internals change, so it won't be backported to anything before that. But, it sounds like that API doesn't really even help your use case. Nova's boot operation is fundamentally a cast and that's not really going to ever change, AFAIK. Specific work items that make it easier to externally detect that an instance boot has been dropped on the floor are certainly up for discussion. I don't think this generalized bug has any specific work to do, so +1 for closing.
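As one example of what such an external check might look like (a sketch only, nothing here is an agreed-upon work item; the hostname is a placeholder and the output handling is deliberately naive):

# if the compute service on the failed host is down, list instances that
# are still stuck in REBUILD there; each of those is a candidate for
# re-requesting evacuation with "nova evacuate <uuid>"
$ host=overcloud-compute-2.localdomain
$ nova service-list --host "$host" --binary nova-compute | grep -q ' down ' && \
      nova list --all-tenants --host "$host" --status REBUILD --minimal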