Bug 1309012

Summary: Once in a while, scheduling will fail to select a node when deploying overcloud
Product: Red Hat OpenStack
Reporter: David Hill <dhill>
Component: rhosp-director
Assignee: Angus Thomas <athomas>
Status: CLOSED CURRENTRELEASE
QA Contact: Omri Hochman <ohochman>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 8.0 (Liberty)
CC: dbecker, dhill, dsneddon, hbrock, jcoufal, jslagle, mburns, morazi, rhel-osp-director-maint
Target Milestone: ---
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-14 19:00:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description David Hill 2016-02-16 17:22:06 UTC
Description of problem:
Once in a while, scheduling fails to select a node when deploying the overcloud, but it then retries scheduling the same node and succeeds.

| fault                                | {"message": "No valid host was found. There are not enough hosts available.", "code": 500, "details": "  File \"/usr/lib/python2.7/site-packages/nova/conductor/manager.py\", line 739, in build_instances |
|                                      |     request_spec, filter_properties)                                                                                                                                                                       |
|                                      |   File \"/usr/lib/python2.7/site-packages/nova/scheduler/utils.py\", line 343, in wrapped                                                                                                                  |
|                                      |     return func(*args, **kwargs)                                                                                                                                                                           |
|                                      |   File \"/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py\", line 52, in select_destinations                                                                                             |
|                                      |     context, request_spec, filter_properties)                                                                                                                                                              |
|                                      |   File \"/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py\", line 37, in __run_method                                                                                                    |
|                                      |     return getattr(self.instance, __name)(*args, **kwargs)                                                                                                                                                 |
|                                      |   File \"/usr/lib/python2.7/site-packages/nova/scheduler/client/query.py\", line 34, in select_destinations                                                                                                |
|                                      |     context, request_spec, filter_properties)                                                                                                                                                              |
|                                      |   File \"/usr/lib/python2.7/site-packages/nova/scheduler/rpcapi.py\", line 120, in select_destinations                                                                                                     |
|                                      |     request_spec=request_spec, filter_properties=filter_properties)                                                                                                                                        |
|                                      |   File \"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py\", line 158, in call                                                                                                                |
|                                      |     retry=self.retry)                                                                                                                                                                                      |
|                                      |   File \"/usr/lib/python2.7/site-packages/oslo_messaging/transport.py\", line 90, in _send                                                                                                                 |
|                                      |     timeout=timeout, retry=retry)                                                                                                                                                                          |
|                                      |   File \"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py\", line 431, in send                                                                                                       |
|                                      |     retry=retry)                                                                                                                                                                                           |
|                                      |   File \"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py\", line 422, in _send                                                                                                      |
|                                      |     raise result                                                                                                                                                                                           |
|                                      | ", "created": "2016-02-16T16:14:52Z"}                                                                                                                                                                      |
| flavor                               | ceph-storage (63e4d9db-7e98-4191-830a-f203d0f0685f)                                                          



[stack@undercloud-rhosp8 cloud]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks            |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| 5ff3bffc-6e78-4109-a361-f388094dfbef | overcloud-cephstorage-0 | BUILD  | spawning   | NOSTATE     | ctlplane=192.0.2.20 |
| a920f140-49dd-4818-9f41-813444b73af2 | overcloud-cephstorage-1 | BUILD  | spawning   | NOSTATE     | ctlplane=192.0.2.18 |
| 5b7246a4-1ffe-4627-acea-f5f93dbcdfaf | overcloud-cephstorage-2 | BUILD  | spawning   | NOSTATE     | ctlplane=192.0.2.19 |
| 78f1047c-5ed4-46a6-81dc-556df5275a99 | overcloud-cephstorage-2 | ERROR  | -          | NOSTATE     |                     |
| 6087c1dd-b465-4029-898a-64c4ef68280a | overcloud-controller-0  | BUILD  | spawning   | NOSTATE     | ctlplane=192.0.2.23 |
| 3140957b-7cf7-44e3-8c6d-6f739ba56af5 | overcloud-controller-1  | BUILD  | spawning   | NOSTATE     | ctlplane=192.0.2.22 |
| 32441962-9ab4-4fd8-b45f-45b2a053fc92 | overcloud-controller-2  | BUILD  | spawning   | NOSTATE     | ctlplane=192.0.2.24 |
| 73d2cf8d-2570-4a55-a075-af4cf433819c | overcloud-novacompute-0 | BUILD  | spawning   | NOSTATE     | ctlplane=192.0.2.21 |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
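
In the listing above, overcloud-cephstorage-2 appears twice, once in ERROR and once in BUILD, which appears to be the failed attempt followed by the successful retry. As a minimal sketch, assuming the `nova list` output has been captured as text, the following Python flags such names. The helper names parse_nova_list and find_rescheduled are hypothetical, and the code only parses the printed table; it does not call any Nova API.

```python
def parse_nova_list(text):
    """Parse the ASCII table printed by `nova list` into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first "|" row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def find_rescheduled(rows):
    """Return names that have one instance in ERROR and another not in ERROR."""
    statuses_by_name = {}
    for row in rows:
        statuses_by_name.setdefault(row["Name"], []).append(row["Status"])
    return sorted(
        name
        for name, statuses in statuses_by_name.items()
        if "ERROR" in statuses and any(s != "ERROR" for s in statuses)
    )
```

Calling find_rescheduled(parse_nova_list(output)) on the captured listing returns the names that have both a failed and a non-failed instance, which helps confirm whether a given deployment hit this problem.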


Version-Release number of selected component (if applicable):


How reproducible:
I hit this failure in almost every deployment.

Steps to Reproduce:
1. Deploy an overcloud
2. Wait
3. If it doesn't happen, go back to step 1.

Actual results:
Node failed at scheduling

Expected results:
Always succeed

Additional info:

Comment 2 Mike Burns 2016-04-07 21:11:06 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 5 David Hill 2016-11-14 15:36:18 UTC
Will reopen if this happens again.  Haven't seen it in a while.