Bug 1257208 - [Nova] Boot stuck in BUILD scheduling
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 7.0 (Kilo)
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 8.0 (Liberty)
Assigned To: Eoghan Glynn
QA Contact: nlevinki
Depends On:
Blocks:
Reported: 2015-08-26 09:50 EDT by Joe Talerico
Modified: 2016-02-01 16:04 EST
CC List: 13 users

Doc Type: Bug Fix
Last Closed: 2016-02-01 16:04:11 EST
Type: Bug


Attachments: None
Description Joe Talerico 2015-08-26 09:50:17 EDT
Description of problem:
Running:
[stack@gprfc001 ~]$ for i in {1..50} ; do nova boot --flavor m1.small --image cirros rook-for-$i > /dev/null 2>&1 & done

Results in some instances "stuck" here:
| 5abb0ca3-faff-4ad9-be46-ac7c79b2e6bc | rook-for-48 | BUILD  | scheduling | NOSTATE     |          |

Looking at the instance:
+--------------------------------------+-----------------------------------------------+
| Property                             | Value                                         |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                        |
| OS-EXT-AZ:availability_zone          |                                               |
| OS-EXT-SRV-ATTR:host                 | -                                             |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                             |
| OS-EXT-SRV-ATTR:instance_name        | instance-0000134e                             |
| OS-EXT-STS:power_state               | 0                                             |
| OS-EXT-STS:task_state                | scheduling                                    |
| OS-EXT-STS:vm_state                  | building                                      |
| OS-SRV-USG:launched_at               | -                                             |
| OS-SRV-USG:terminated_at             | -                                             |
| accessIPv4                           |                                               |
| accessIPv6                           |                                               |
| config_drive                         |                                               |
| created                              | 2015-08-25T21:16:26Z                          |
| flavor                               | m1.small (2)                                  |
| hostId                               |                                               |
| id                                   | 5abb0ca3-faff-4ad9-be46-ac7c79b2e6bc          |
| image                                | cirros (c9c67c96-5b69-43fc-9d15-825d59b7c9d2) |
| key_name                             | -                                             |
| metadata                             | {}                                            |
| name                                 | rook-for-48                                   |
| os-extended-volumes:volumes_attached | []                                            |
| progress                             | 0                                             |
| status                               | BUILD                                         |
| tenant_id                            | 69f1e9392ffd4160bb0aaa0562dd31a8              |
| updated                              | 2015-08-25T21:17:39Z                          |
| user_id                              | 05e4ebc87c174835a1f1ff7ad4973c9e              |
+--------------------------------------+-----------------------------------------------+

I let the instance sit like this overnight (more than 8 hours). It never went into the ERROR state.
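
One way to pick out instances in this state from saved `nova list` output is sketched below. This is only an illustration, not part of the original report: the function name `find_stuck_in_scheduling` is hypothetical, the header row and the second data row of the sample are made up, and the parsing assumes the usual `nova list` column order (ID, Name, Status, Task State, Power State, Networks).

```python
def find_stuck_in_scheduling(nova_list_output):
    """Return (id, name) pairs for rows with Status BUILD and Task State scheduling."""
    stuck = []
    for line in nova_list_output.splitlines():
        line = line.strip()
        # Data rows start with '|'; border rows start with '+' and are skipped.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip malformed rows and the header row.
        if len(cells) < 4 or cells[0] == "ID":
            continue
        instance_id, name, status, task_state = cells[:4]
        if status == "BUILD" and task_state == "scheduling":
            stuck.append((instance_id, name))
    return stuck


# First data row is taken from this report; the header and the second
# data row are hypothetical and only there to show that filtering works.
SAMPLE = """
| ID                                   | Name        | Status | Task State | Power State | Networks |
| 5abb0ca3-faff-4ad9-be46-ac7c79b2e6bc | rook-for-48 | BUILD  | scheduling | NOSTATE     |          |
| 00000000-0000-0000-0000-000000000001 | rook-for-1  | ACTIVE | -          | Running     |          |
"""

if __name__ == "__main__":
    for instance_id, name in find_stuck_in_scheduling(SAMPLE):
        print(instance_id, name)
```

Feeding it the output of `nova list` captured some time after the boot loop finishes would list only the instances that never left the scheduling task state.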

Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Deploy HA from OSPd
2. Import a cirros/centos image
3. for i in {1..50} ; do nova boot --flavor m1.small --image cirros rook-for-$i > /dev/null 2>&1 & done

Expected results:
The instance should either be scheduled to launch or go into the ERROR state.
Comment 4 Dan Smith 2015-08-28 10:41:12 EDT
Joe,

Can you reproduce this and grab logs for us? Specifically I want to see if the instance uuid ever shows up in anything other than api or conductor logs. Without digging deeply, the only thing I can really think of, barring any tracebacks or errors related to this instance, is that the boot request cast message got dropped on the floor (which could technically happen).
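
A rough sketch of the check described above: walk a directory of collected logs and report which files mention the instance UUID (or the request id). The function name `files_mentioning` is hypothetical, and the log location is an assumption; on a typical deployment the Nova logs are often under /var/log/nova, but that may differ.

```python
import os


def files_mentioning(needle, log_root):
    """Return sorted paths (relative to log_root) of files containing needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(log_root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # errors="replace" keeps the scan going on non-UTF-8 bytes.
                with open(path, "r", errors="replace") as handle:
                    if any(needle in line for line in handle):
                        hits.append(os.path.relpath(path, log_root))
            except OSError:
                # Unreadable files (permissions, vanished during rotation) are skipped.
                continue
    return sorted(hits)


if __name__ == "__main__":
    # Assumed log directory; adjust to wherever the logs were collected.
    for name in files_mentioning("5abb0ca3-faff-4ad9-be46-ac7c79b2e6bc", "/var/log/nova"):
        print(name)
```

If only API (and possibly conductor) log files come back, that would be consistent with the boot request never reaching the scheduler or a compute node.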
Comment 5 Joe Talerico 2015-08-28 16:42:24 EDT
Hey Dan - I can grab the logs, but I can assure you that the UUID only shows up in the nova-api logs. It never shows up on the compute nodes (I searched for the req-id and for the instance name, instance-#).

If you think having the logs will help trace this, I will re-run and attach them.

Joe
Comment 6 Stephen Gordon 2016-02-01 16:04:11 EST
This seems to be pretty stale at this point. If you are still encountering this situation, please re-open and attach additional logs as requested.
