Bug 1230245 - nova quota calculation breaks Instance HA
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: z5
Target Release: 7.0 (Kilo)
Assignee: Artom Lifshitz
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks: 1185030 1251948 1261487
 
Reported: 2015-06-10 13:33 UTC by Fabio Massimo Di Nitto
Modified: 2019-09-09 14:10 UTC
CC: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-06 14:17:31 UTC
Target Upstream Version:
Embargoed:


Attachments
sos-report (9.32 MB, application/x-xz), 2015-06-10 14:04 UTC, Eoghan Glynn
sos-report (8.85 MB, application/x-xz), 2015-06-10 14:14 UTC, Eoghan Glynn
sos-report (9.15 MB, application/x-xz), 2015-06-10 14:17 UTC, Eoghan Glynn

Description Fabio Massimo Di Nitto 2015-06-10 13:33:29 UTC
Description of problem:

Let's say we set a quota of 10 instances (that's the default after installation).

Start 10 instances; all good. One compute node fails and its instances are evacuated to another hypervisor. The rebuilt instances fail to start because the quota is exceeded.

This appears to happen because, even when a compute node is marked down in nova, the instances on that dead node are still counted against the quota, leaving no headroom to rebuild them on a new host.
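
To illustrate the effect, here is a minimal Python sketch; this is not nova's actual quota code, and the hostname and data layout are made up:

    # Naive quota check: every non-deleted instance counts against the
    # quota, including instances stranded on a dead compute node.
    def quota_headroom(project_instances, quota_limit):
        in_use = sum(1 for inst in project_instances
                     if inst["vm_state"] != "deleted")
        return quota_limit - in_use

    # 10 instances still sitting on the failed node occupy the whole
    # quota, so the evacuation rebuilds have no headroom left.
    stranded = [{"vm_state": "active", "host": "compute-0"}
                for _ in range(10)]
    print(quota_headroom(stranded, quota_limit=10))  # -> 0, "quota exceeded"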

Version-Release number of selected component (if applicable):

openstack-nova-common-2015.1.0-4.el7ost.noarch
openstack-nova-console-2015.1.0-4.el7ost.noarch
openstack-nova-scheduler-2015.1.0-4.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-4.el7ost.noarch
openstack-nova-conductor-2015.1.0-4.el7ost.noarch
openstack-nova-api-2015.1.0-4.el7ost.noarch
python-nova-2015.1.0-4.el7ost.noarch
python-novaclient-2.23.0-1.el7ost.noarch


How reproducible:

always

Steps to Reproduce:
1. set quota to 10 instances
2. start 10 instances
3. kill one compute node hard (crash the kernel, pull the power, etc.)
4. wait for nova to recognize that the compute node is down
5. start evacuation (a scripted version of these steps is sketched below)
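
For reference, a rough scripted version of the steps using the python-novaclient API; the credentials, tenant ID, image/flavor IDs and failed hostname are placeholders to adapt to the local environment:

    from novaclient import client

    # placeholder admin credentials and endpoint
    nova = client.Client('2', 'admin', 'PASSWORD', 'admin',
                         'http://controller:5000/v2.0')

    # 1. set quota to 10 instances for the tenant
    nova.quotas.update('TENANT_ID', instances=10)

    # 2. start 10 instances
    for i in range(10):
        nova.servers.create('test-%d' % i, image='IMAGE_ID',
                            flavor='FLAVOR_ID')

    # 3./4. crash the compute node out-of-band, then wait until its
    #       nova-compute service is reported "down" in `nova service-list`

    # 5. evacuate everything off the failed host (assumes shared storage);
    #    with the bug present the rebuilds fail with a quota error
    for server in nova.servers.list(search_opts={'host': 'FAILED_HOST'}):
        nova.servers.evacuate(server, host=None, on_shared_storage=True)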

Actual results:

Instances will fail to start.

Expected results:

Instances will start.

Additional info:

I have collected sosreports from many failures here:
http://mrg-01.mpc.lab.eng.bos.redhat.com/sosreports/

I can't pinpoint in the logs exactly when this happened. Also, those sosreports will be wiped fairly soon; please download them if necessary.

Comment 5 Eoghan Glynn 2015-06-10 13:41:06 UTC
Effectively, the problem here appears to be that instances under evacuation are being double-counted against quota while the instance rebuilds are in flight.

I think this can be reproduced and fixed independently of the instance HA setup.
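
If that is the mechanism, one way to check it from the API side while the rebuilds are in flight is to compare the usage nova reports against the instances actually visible. A rough sketch (placeholder credentials, as in the reproduction script above):

    from novaclient import client

    # placeholder admin credentials and endpoint
    nova = client.Client('2', 'admin', 'PASSWORD', 'admin',
                         'http://controller:5000/v2.0')

    limits = {l.name: l.value for l in nova.limits.get().absolute}

    print('quota limit   :', limits.get('maxTotalInstances'))
    print('counted in use:', limits.get('totalInstancesUsed'))
    print('visible       :', len(nova.servers.list()))
    # If "counted in use" climbs above "visible" while the evacuation
    # rebuilds are in flight, the instances are being double-counted.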

Comment 6 Eoghan Glynn 2015-06-10 14:04:03 UTC
Created attachment 1037323
sos-report

Comment 7 Eoghan Glynn 2015-06-10 14:14:06 UTC
Created attachment 1037325
sos-report

Comment 8 Eoghan Glynn 2015-06-10 14:17:04 UTC
Created attachment 1037326
sos-report

Comment 9 Fabio Massimo Di Nitto 2015-06-16 08:57:13 UTC
I have tested this same condition with the .10 build and I was not able to reproduce it. It's probably been fixed in the rebase from .4 to .10, but still worth checking.

Lowering priority/severity

Comment 10 Stephen Gordon 2015-07-16 14:16:50 UTC
(In reply to Fabio Massimo Di Nitto from comment #9)
> I have tested this same condition with the .10 build and I was not able to
> reproduce it. It's probably been fixed in the rebase from .4 to .10, but
> still worth checking.
> 
> Lowering priority/severity

Any update on this Fabio - have you encountered this again? I am assuming no and moving to 8.0/7.0.z but please keep me posted.

Comment 13 Fabio Massimo Di Nitto 2015-07-16 14:23:59 UTC
(In reply to Stephen Gordon from comment #10)
> (In reply to Fabio Massimo Di Nitto from comment #9)
> > I have tested this same condition with the .10 build and I was not able to
> > reproduce it. It's probably been fixed in the rebase from .4 to .10, but
> > still worth checking.
> > 
> > Lowering priority/severity
> 
> Any update on this Fabio - have you encountered this again? I am assuming no
> and moving to 8.0/7.0.z but please keep me posted.

I didn't test this condition again. It's worth keeping it as a TestOnly bug in my opinion, since it touches a specific boundary and it's easy to test.

Comment 18 Artom Lifshitz 2016-06-06 14:17:31 UTC
> I didn't test this condition again. It's worth keeping it as a TestOnly bug
> in my opinion, since it touches a specific boundary and it's easy to test.

Since this bug hasn't seen any updates for a while, I'm assuming the issue hasn't been encountered again. I'm going to close this bug for now; if you feel it should remain open, don't hesitate to let us know.

Cheers!

