Bug 977082 - nova [Negative]: instances are stuck on task 'scheduling' when running multiple instances and compute service is down on one of the hosts
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.0
Assignee: Nikola Dipanov
QA Contact: Ami Jeain
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-06-23 11:53 UTC by Dafna Ron
Modified: 2019-09-09 17:08 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-02 16:20:04 UTC
Target Upstream Version:
Embargoed:


Attachments
logs (1.78 MB, application/x-gzip)
2013-06-23 11:53 UTC, Dafna Ron

Description Dafna Ron 2013-06-23 11:53:27 UTC
Created attachment 764292
logs

Description of problem:

I installed an AIO node plus one additional nova-compute host.

On the host running only nova-compute, I stopped the openstack-nova-compute service and launched 10 instances.

5 out of the 10 instances got stuck in status BUILD with task 'scheduling',
and even after I started the service again, the instances did not start.

Version-Release number of selected component (if applicable):

openstack-nova-api-2013.1.2-2.el6ost.noarch
openstack-nova-scheduler-2013.1.2-2.el6ost.noarch
openstack-nova-compute-2013.1.2-2.el6ost.noarch

How reproducible:

100%

Steps to Reproduce:
1. Create an AIO node plus one more host with only nova-compute on it
2. Stop the openstack-nova-compute service on the compute-only host
3. Start multiple instances (see the shell sketch below)
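
A minimal shell sketch of the reproduction, assuming a Grizzly-era nova CLI, an admin keystonerc, and placeholder image/flavor names:

# on the compute-only host: stop the compute service (EL6 init scripts)
service openstack-nova-compute stop

# on the AIO host: boot 10 instances
source ~/keystonerc_admin
for i in $(seq 1 10); do
    nova boot --image <image> --flavor m1.tiny "test-$i"
done

# watch status and task state; stuck instances remain in BUILD/scheduling
nova list
nova show <instance-id> | grep task_state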

Actual results:

Some of the instances are stuck in status BUILD with task 'scheduling',
and even after starting the service again, the instances do not finish the build.

Expected results:

1. If we cannot run the instances, we should not start them at all (i.e., we should detect that the service is down and that we cannot run instances on that host).
2. If for any reason we do start them, we should move them to ERROR once we find that we cannot sustain them.
3. If an instance is stuck in the 'scheduling' task, we should be able to start it once additional resources become available.
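
A hedged workaround sketch for the stuck instances in the meantime (Grizzly-era CLI assumed):

# confirm the scheduler sees the compute host as down (XXX instead of :-))
nova-manage service list

# instances stuck in BUILD/scheduling do not recover on their own;
# force them to ERROR and delete them as a manual cleanup
nova reset-state <instance-id>   # default resets the instance to ERROR
nova delete <instance-id>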

Additional info: logs

[root@opens-vdsb tmp(keystone_admin)]# nova list
+--------------------------------------+--------------------------------------------+--------+--------------------------+
| ID                                   | Name                                       | Status | Networks                 |
+--------------------------------------+--------------------------------------------+--------+--------------------------+
| 009efbf6-de2b-451b-870f-fdf1c19414e8 | dafna-009efbf6-de2b-451b-870f-fdf1c19414e8 | BUILD  |                          |
| 116b0b55-a9bc-4dcd-b7d8-abe141510e38 | dafna-116b0b55-a9bc-4dcd-b7d8-abe141510e38 | BUILD  |                          |
| 896e10ef-0906-46ad-8001-99a35062a381 | dafna-896e10ef-0906-46ad-8001-99a35062a381 | BUILD  |                          |
| 9b2d2161-2ee0-4c66-99bb-73ce26759cc3 | dafna-9b2d2161-2ee0-4c66-99bb-73ce26759cc3 | ACTIVE | novanetwork=192.168.32.2 |
| a316b3a5-5b46-4cb4-aa24-9c8d328a0d67 | dafna-a316b3a5-5b46-4cb4-aa24-9c8d328a0d67 | ACTIVE | novanetwork=192.168.32.6 |
| ae19066f-70d3-4f40-a402-0dad2d2cabb4 | dafna-ae19066f-70d3-4f40-a402-0dad2d2cabb4 | ACTIVE | novanetwork=192.168.32.4 |
| bff72b2f-f7f1-48e7-9b29-2dc34499d318 | dafna-bff72b2f-f7f1-48e7-9b29-2dc34499d318 | BUILD  |                          |
| c09f7a34-5d79-4d7e-96f4-ae2ac29d270e | dafna-c09f7a34-5d79-4d7e-96f4-ae2ac29d270e | ACTIVE | novanetwork=192.168.32.3 |
| dc6c71b1-a630-46f8-b5c3-51cd215112f9 | dafna-dc6c71b1-a630-46f8-b5c3-51cd215112f9 | ACTIVE | novanetwork=192.168.32.5 |
| f8eed9d0-6c2e-4129-a073-dedfb8c5e0a6 | dafna-f8eed9d0-6c2e-4129-a073-dedfb8c5e0a6 | BUILD  |                          |
+--------------------------------------+--------------------------------------------+--------+--------------------------+


[root@opens-vdsb tmp(keystone_admin)]# virsh -r list
 Id    Name                           State
----------------------------------------------------
 3     instance-00000016              running
 4     instance-0000001a              running
 5     instance-00000014              running
 6     instance-00000012              running
 7     instance-00000018              running


[root@nott-vdsa ~(keystone_admin)]# virsh -r list
 Id    Name                           State
----------------------------------------------------

Comment 1 Nikola Dipanov 2013-11-01 13:06:31 UTC
This bug seems to have been caught with RHOS 3.0. Now that we have 4.0 builds, it would be good to confirm whether this is still an issue.

Comment 2 Dafna Ron 2013-11-01 14:15:37 UTC
It seems that, in Havana, instances move to the ERROR state if they cannot run.
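
A quick way to confirm this behavior (hedged; the exact fault text can vary): an instance that could not be scheduled should land in ERROR with the scheduler failure recorded in its fault field.

# inspect the fault recorded for an ERROR instance
nova show <instance-id> | grep -i fault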

Comment 3 Nikola Dipanov 2013-11-13 16:32:14 UTC
After a brief chat with Dafna, she seems to think the issue is fixed, so the bug has likely been addressed in the Havana release. Due to the nature of the bug, she was keen to run a few more tests, so I am leaving a needinfo flag on so that we can confirm it is indeed fixed.

Comment 5 Dave Allan 2014-01-02 16:20:04 UTC
Closing as we are unable to reproduce it with 4.0; please reopen if it reappears.

