1257208 – [Nova] Boot stuck in BUILD scheduling

Summary: [Nova] Boot stuck in BUILD scheduling

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-nova
Sub Component:
Version:	7.0 (Kilo)
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	8.0 (Liberty)
Assignee:	Eoghan Glynn
QA Contact:	nlevinki
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-08-26 13:50 UTC by Joe Talerico
Modified:	2019-09-09 17:17 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-02-01 21:04:11 UTC
Target Upstream Version:
Embargoed:

Description Joe Talerico 2015-08-26 13:50:17 UTC

Description of problem:
Running:
[stack@gprfc001 ~]$ for i in {1..50} ; do nova boot --flavor m1.small --image cirros rook-for-$i > /dev/null 2>&1 & done

This results in some instances getting stuck in BUILD/scheduling:
| 5abb0ca3-faff-4ad9-be46-ac7c79b2e6bc | rook-for-48 | BUILD  | scheduling | NOSTATE     |          |
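With 50 instances in flight it is easier to filter the listing than to eyeball it. A minimal sketch of such a filter follows; it assumes the `nova list` column order shown in the row above (ID | Name | Status | Task State | Power State | Networks), and `find_stuck` is a hypothetical helper name, not part of novaclient.

```shell
#!/usr/bin/env bash
# Sketch: read "nova list" table output on stdin and print the ID and name of
# every instance whose Status is BUILD and whose Task State is "scheduling".
# Assumption: column order is ID | Name | Status | Task State | Power State |
# Networks, as in the row above; other client versions may differ.
find_stuck() {
  awk -F'|' '
    NF >= 6 {
      # Trim the padding around the ID, Name, Status and Task State columns.
      for (i = 2; i <= 5; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
      if ($4 == "BUILD" && $5 == "scheduling") print $2, $3
    }'
}

# Usage:
#   nova list | find_stuck
```

Header rows and `+----+` border lines are skipped automatically, because their Status column never equals BUILD or they contain no `|` separators at all.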

Looking at the instance:
+--------------------------------------+-----------------------------------------------+
| Property                             | Value                                         |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                        |
| OS-EXT-AZ:availability_zone          |                                               |
| OS-EXT-SRV-ATTR:host                 | -                                             |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                             |
| OS-EXT-SRV-ATTR:instance_name        | instance-0000134e                             |
| OS-EXT-STS:power_state               | 0                                             |
| OS-EXT-STS:task_state                | scheduling                                    |
| OS-EXT-STS:vm_state                  | building                                      |
| OS-SRV-USG:launched_at               | -                                             |
| OS-SRV-USG:terminated_at             | -                                             |
| accessIPv4                           |                                               |
| accessIPv6                           |                                               |
| config_drive                         |                                               |
| created                              | 2015-08-25T21:16:26Z                          |
| flavor                               | m1.small (2)                                  |
| hostId                               |                                               |
| id                                   | 5abb0ca3-faff-4ad9-be46-ac7c79b2e6bc          |
| image                                | cirros (c9c67c96-5b69-43fc-9d15-825d59b7c9d2) |
| key_name                             | -                                             |
| metadata                             | {}                                            |
| name                                 | rook-for-48                                   |
| os-extended-volumes:volumes_attached | []                                            |
| progress                             | 0                                             |
| status                               | BUILD                                         |
| tenant_id                            | 69f1e9392ffd4160bb0aaa0562dd31a8              |
| updated                              | 2015-08-25T21:17:39Z                          |
| user_id                              | 05e4ebc87c174835a1f1ff7ad4973c9e              |
+--------------------------------------+-----------------------------------------------+

I let the instance sit like this overnight (more than 8 hours); it never went into ERROR state.
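Rather than leaving an instance overnight, the wait can be bounded and checked in a loop. The sketch below parses the `status` property out of `nova show`-style table output and polls until the instance leaves BUILD or a timeout expires. The function names, the default 10-second interval and the timeout value are illustrative assumptions, not anything Nova provides.

```shell
#!/usr/bin/env bash
# Print the value of one property from "nova show" table output on stdin.
show_field() {
  awk -F'|' -v key="$1" '
    NF >= 3 {
      k = $2; v = $3
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) print v
    }'
}

# Poll an instance until it leaves BUILD or the timeout (seconds) expires.
# Returns 0 if it left BUILD, 1 if it was still BUILD at the deadline.
# Assumption: the nova CLI is configured; interval defaults to 10s (hypothetical).
wait_out_of_build() {
  local id=$1 timeout=$2 interval=${3:-10} waited=0 status
  while :; do
    status=$(nova show "$id" | show_field status)
    # Any status other than BUILD (ACTIVE, ERROR, or empty if "nova show"
    # failed) ends the wait.
    if [ "$status" != "BUILD" ]; then
      echo "$id: $status"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "$id: still BUILD after ${waited}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Usage (give up after one hour):
#   wait_out_of_build 5abb0ca3-faff-4ad9-be46-ac7c79b2e6bc 3600
```

The same `show_field` helper can pull out other properties from the table above, e.g. `OS-EXT-STS:task_state` or `OS-EXT-SRV-ATTR:host`.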

Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Deploy HA from OSPd
2. Import a cirros/centos image
3. for i in {1..50} ; do nova boot --flavor m1.small --image cirros rook-for-$i > /dev/null 2>&1 & done
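The loop in step 3 discards every `nova boot` response and does not wait for the background requests to finish. A variant that keeps each response and only returns once all requests have come back is sketched below; the flavor, image and name prefix are copied from step 3, while `boot_batch` and the log directory argument are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch of the reproducer that keeps each request's output.
# Assumptions: the nova CLI is installed and credentials are sourced; the
# log directory argument is a hypothetical location of your choosing.
boot_batch() {
  local count=$1 log_dir=$2 i
  mkdir -p "$log_dir"
  for i in $(seq 1 "$count"); do
    # Send each boot request in the background, as the original loop does,
    # but write its stdout/stderr to a per-instance file instead of /dev/null.
    nova boot --flavor m1.small --image cirros "rook-for-$i" \
      > "$log_dir/rook-for-$i.log" 2>&1 &
  done
  # Block until every backgrounded "nova boot" has returned.
  wait
}

# Usage (hypothetical directory name):
#   boot_batch 50 ./boot-logs
```

Keeping the per-request output makes it possible to match the instance ID of a stuck instance back to the API response it received.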

Expected results:
The instance should either be scheduled and launched or go into ERROR state.

Comment 4 Dan Smith 2015-08-28 14:41:12 UTC

Joe,

Can you reproduce this and grab logs for us? Specifically I want to see if the instance uuid ever shows up in anything other than api or conductor logs. Without digging deeply, the only thing I can really think of, barring any tracebacks or errors related to this instance, is that the boot request cast message got dropped on the floor (which could technically happen).
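Checking whether the instance UUID appears in anything other than the API or conductor logs can be done per host with a small search. This is a sketch under assumptions: the logs are taken to live under `/var/log/nova/*.log` by default (pass another directory otherwise), and since each node keeps its own logs it would need to be run on every controller and compute node in an HA deployment. `trace_id` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report which nova log files on this host mention an instance UUID
# (or a request ID), with a per-file count of matching lines; files with no
# match are omitted. Assumption: default log directory is /var/log/nova.
trace_id() {
  local id=$1 log_dir=${2:-/var/log/nova}
  # -H: always print the file name, -c: count matching lines,
  # -F: treat the ID as a fixed string rather than a regex.
  grep -H -c -F -- "$id" "$log_dir"/*.log 2>/dev/null | grep -v ':0$'
}

# Usage:
#   trace_id 5abb0ca3-faff-4ad9-be46-ac7c79b2e6bc
```

If the UUID only ever shows up in the API log on every host, the boot request never reached the scheduler or a compute node, which is consistent with the dropped-cast theory.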

Comment 5 Joe Talerico 2015-08-28 20:42:24 UTC

Hey Dan - I can grab the logs, but I can assure you that the UUID only shows up in the nova-api logs. It never shows up on the compute nodes (I searched for the req-id and for the instance name, instance-#).

If you think having the logs will help trace this, I will re-run and attach them.

Joe

Comment 6 Stephen Gordon 2016-02-01 21:04:11 UTC

This seems to be pretty stale at this point. If you are still encountering this situation, please reopen the bug and attach additional logs as requested.
