Bug 1235696
| Summary: | nova fails to schedule Instances when compute node is dead but not "down" | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Fabio Massimo Di Nitto <fdinitto> |
| Component: | openstack-nova | Assignee: | Eoghan Glynn <eglynn> |
| Status: | CLOSED WONTFIX | QA Contact: | nlevinki <nlevinki> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 7.0 (Kilo) | CC: | abeekhof, berrange, cluster-maint, dasmith, eglynn, kchamart, sbauza, sferdjao, sgordon, srevivo, vromanso |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | 8.0 (Liberty) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-06-05 17:05:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1185030, 1251948, 1261487 | | |
Description
Fabio Massimo Di Nitto
2015-06-25 14:06:21 UTC
Clarifying comment #0: node exploding == dying a horrible death == `$ echo c > /proc/sysrq-trigger`

This is a well-understood gap at the moment. There are several efforts underway to close this, all of which we hope to land in Liberty. None of them will be candidates for backporting to Kilo, IMHO. First, service groups based on tooz will massively shorten the delay between a node going down and nova noticing. Second, the mark-host-down functionality will allow things like pacemaker to be in the *middle* of decisions about host failure, evacuation, etc. Third, the additional information about in-process evacuations provided by the robustify-evacuation work, in the form of recording evacuations as migrations plus new notifications about progress, will allow things like pacemaker to make much more informed decisions about restarting the process if it does race with a secondary node failure.

Dan, I understand this is a work in progress and probably fixed in Liberty. The issue here is that it defeats the work of Instance HA when certain failures happen at certain times (basically it's not real HA). The problem is not a workaround for when (or how) nova notices the host is down. The problem is that nova tries to schedule something without even checking whether the other end has received the request, and lets things die there. This could happen even when booting a new instance, I guess. There needs to be some level of "hey node X, can you boot an instance?"; if no reply comes back from node X, nova should take action and ask node Y instead. It's not even a matter of whether nova knows that X is down or not. Anyway, we will need some level of fix in Kilo here, since Instance HA is going to be a flagship feature for OSP7.

Realistically there is nothing that can be done about this for 7.0, so I'm marking it for 8.0/7.0.z.

I believe we need to enhance the fence agent for Nova to use the new mark-host-down API, where it is available, to handle this.

(In reply to Stephen Gordon from comment #13)
> I believe we need to enhance the fence agent for Nova to use the new
> mark-host-down API, where it is available, to handle this.

Can you please not bounce bugs around if you are not sure what they are for? If you are in doubt, please ask before proceeding. With or without the new nova mark-host-down API, there is still a race condition within nova. This bug is about that specific race condition, which has to be fixed as pointed out in the comments above.

The mark-host-down API was merged in Liberty (OSP8). It's an API and internals change, so it won't be backported to anything before that. But it sounds like that API doesn't really even help your use case. Nova's boot operation is fundamentally a cast, and that's not really ever going to change, AFAIK. Specific work items that make it easier to externally detect that an instance boot has been dropped on the floor are certainly up for discussion. I don't think this generalized bug has any specific work to do, so +1 for closing.
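
For context on the "boot operation is fundamentally a cast" point: in oslo.messaging, which nova uses for RPC, a cast is fire-and-forget, so a request dispatched to a dead compute node is never acknowledged and no error reaches the caller. Below is a minimal sketch of that distinction, assuming a configured transport; the topic, server, and method names are illustrative placeholders, not nova's actual compute RPC API.

```python
# Minimal sketch of cast vs. call in oslo.messaging. The topic, server,
# and method names are illustrative placeholders, not Nova's RPC API.
import oslo_messaging as messaging
from oslo_config import cfg

transport = messaging.get_transport(cfg.CONF)
target = messaging.Target(topic='compute', server='node-x')
client = messaging.RPCClient(transport, target)
ctxt = {}  # request context (simplified)

# cast(): drop the message on the bus and return immediately.
# If 'node-x' is dead, nothing consumes it and the caller gets no
# error -- the "dropped on the floor" scenario described above.
client.cast(ctxt, 'boot_instance', instance_uuid='abc-123')

# call(): block until the remote side replies, or raise
# MessagingTimeout if no consumer answers in time, at which point
# the caller could retry against another node.
try:
    client.prepare(timeout=10).call(ctxt, 'boot_instance',
                                    instance_uuid='abc-123')
except messaging.MessagingTimeout:
    print('node-x never answered; reschedule elsewhere')
```

And a hedged sketch of how an external HA agent, such as the fence agent discussed above, might use the mark-host-down (force-down) API that landed in Liberty, via python-novaclient and compute API microversion 2.11. The keystone credentials and host name are placeholder assumptions.

```python
# Sketch: marking a compute service down from an external HA agent
# using the force-down API (compute microversion 2.11, Liberty).
# The auth settings and host name below are placeholder assumptions.
from keystoneauth1 import loading, session
from novaclient import client as nova_client

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(auth_url='http://controller:5000/v3',
                                username='admin', password='secret',
                                project_name='admin',
                                user_domain_name='Default',
                                project_domain_name='Default')
sess = session.Session(auth=auth)

nova = nova_client.Client('2.11', session=sess)
# Tell nova the compute service on 'compute-1' is down, instead of
# waiting for the service-group heartbeat to time out.
nova.services.force_down('compute-1', 'nova-compute', True)
```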