Bug 1371246 - OpenStack fails to deploy with "No valid host was found. There are not enough hosts available." - RamFilter eliminates all
Summary: OpenStack fails to deploy with "No valid host was found. There are not enough...
Keywords:
Status: CLOSED DUPLICATE of bug 1370651
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic
Version: 9.0 (Mitaka)
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Lucas Alvares Gomes
QA Contact: Raviv Bar-Tal
URL:
Whiteboard:
Depends On:
Blocks: 1193930
Reported: 2016-08-29 17:14 UTC by Ed Balduf
Modified: 2016-09-13 12:20 UTC
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-06 14:47:45 UTC
Target Upstream Version:


Attachments (Terms of Use)
Text output of my debugging steps, including relevant logs etc... (71.46 KB, text/plain)
2016-08-29 17:14 UTC, Ed Balduf
Update host state from compute node (16.78 KB, text/plain)
2016-08-31 16:53 UTC, Ed Balduf


Links
System     ID       Private  Priority  Status  Summary  Last Updated
Launchpad  1572555  0        None      None    None     2016-08-29 17:14:30 UTC

Description Ed Balduf 2016-08-29 17:14:30 UTC
Created attachment 1195445
Text output of my debugging steps, including relevant logs etc...

Description of problem: An OpenStack Director install always fails with "No valid host was found. There are not enough hosts available." Tracing through the logs shows that the RamFilter eliminates all hosts, even though they have enough RAM. See the attached logs.
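For reference, a minimal sketch of the kind of check a RamFilter-style scheduler filter performs, assuming the Mitaka-era logic of comparing the requested RAM against total_usable_ram_mb * ram_allocation_ratio minus the RAM already in use. HostState, ram_filter_passes and filter_hosts are hypothetical names for illustration, not Nova's API. The sketch shows why a host whose free_ram_mb has gone negative in the scheduler's view is filtered out no matter how much physical memory it has.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical stand-in for the scheduler's per-host view (not Nova's class)."""
    name: str
    total_usable_ram_mb: int
    free_ram_mb: int  # can go negative if the scheduler's view is over-claimed


def ram_filter_passes(host: HostState, requested_ram_mb: int,
                      ram_allocation_ratio: float = 1.0) -> bool:
    """Simplified model of a RamFilter-style check (assumed, not Nova's source)."""
    memory_mb_limit = host.total_usable_ram_mb * ram_allocation_ratio
    used_ram_mb = host.total_usable_ram_mb - host.free_ram_mb
    usable_ram_mb = memory_mb_limit - used_ram_mb
    return usable_ram_mb >= requested_ram_mb


def filter_hosts(hosts, requested_ram_mb, ram_allocation_ratio=1.0):
    """Return only the hosts that pass the RAM check."""
    return [h for h in hosts
            if ram_filter_passes(h, requested_ram_mb, ram_allocation_ratio)]


if __name__ == "__main__":
    hosts = [
        HostState("node-a", total_usable_ram_mb=8192, free_ram_mb=8192),
        HostState("node-b", total_usable_ram_mb=8192, free_ram_mb=-8192),
    ]
    print([h.name for h in filter_hosts(hosts, requested_ram_mb=4096)])  # ['node-a']
```

With a ratio of 1.0 (the usual setting for bare-metal nodes) the check reduces to free_ram_mb >= requested RAM, so a single negative reading is enough to drop a node.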


Version-Release number of selected component (if applicable): RHEL 7.2, OSP 9


How reproducible: 100%


Steps to Reproduce:
1. Install undercloud
2. Configure undercloud
3. Deploy Overcloud

Actual results: Stack overcloud CREATE_FAILED


Expected results: Working OpenStack


Additional info: See the attached debugging output. Notice that the instances listed by ironic node-list (Instance UUID column) do not match the instances that nova list shows in ERROR state. Then search the nova-scheduler log for one of those instances: every host is removed by the RamFilter, even though they all had plenty of RAM at the start. Then search nova-scheduler.log for everything related to that request. The same pattern occurs for every instance in ERROR state.
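The cross-check described above can be sketched as follows. This is a hypothetical helper, assuming the Ironic node-to-instance mapping and the Nova instance statuses have already been copied into plain dictionaries (for example from the ironic node-list and nova list output); it does not call either CLI. It reports Nova ERROR instances that no Ironic node points at, and Ironic nodes holding an instance UUID that Nova does not list at all. A second helper pulls every log line mentioning a given instance UUID.

```python
def diff_instance_mappings(ironic_nodes, nova_instances):
    """Cross-check Ironic's node -> instance mapping against Nova's view.

    ironic_nodes: dict of node UUID -> instance UUID, or None when the node
        is unassigned (hypothetical input shape).
    nova_instances: dict of instance UUID -> status string (hypothetical shape).
    Returns (error_instances_without_node, nodes_with_unknown_instance).
    """
    claimed = {node: inst for node, inst in ironic_nodes.items() if inst}
    held = set(claimed.values())
    # Nova instances in ERROR that no Ironic node points at
    error_without_node = sorted(
        uuid for uuid, status in nova_instances.items()
        if status == "ERROR" and uuid not in held
    )
    # Ironic nodes holding an instance UUID that Nova does not list at all
    unknown_instance = sorted(
        node for node, inst in claimed.items() if inst not in nova_instances
    )
    return error_without_node, unknown_instance


def lines_for_instance(log_lines, instance_uuid):
    """Return the log lines (e.g. from nova-scheduler.log) that mention an instance UUID."""
    return [line for line in log_lines if instance_uuid in line]
```

For example, with ironic_nodes = {"node-1": "inst-a", "node-2": "inst-x"} and nova_instances = {"inst-a": "ACTIVE", "inst-b": "ERROR"}, the first helper returns (["inst-b"], ["node-2"]).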

Next, look at every instance in ERROR state and at what the Nova scheduler did with each one. Notice that the first 3 instances fail to schedule because of the RamFilter, yet each of them is assigned an Ironic node. When the scheduler then tries to schedule the retry, it cannot, because all of the hosts are already taken.
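The failure mode described here can be illustrated with a toy model. This is hypothetical code, not Nova's or Ironic's logic, assuming that every scheduling attempt claims a free node and that a failed first attempt keeps its node claimed instead of releasing it before the single retry. Under that assumption the retries draw from a shrinking pool, and once the leaked claims cover every node, every retry ends in "No valid host was found".

```python
from collections import deque


def schedule_with_leaked_claims(requests, nodes, fail_first_attempt):
    """Toy model (hypothetical): each attempt claims a free node; a failed
    first attempt leaves its node claimed (the suspected leak) and the
    request is retried once. Returns request -> node, or None for NoValidHost."""
    free = deque(nodes)
    outcome = {}
    queue = deque((req, 1) for req in requests)
    while queue:
        req, attempt = queue.popleft()
        if not free:
            outcome[req] = None  # nothing left: "No valid host was found"
            continue
        node = free.popleft()  # claim a node for this attempt
        if attempt == 1 and req in fail_first_attempt:
            # Failed attempt: the node is NOT returned to `free`; retry later
            queue.append((req, 2))
            continue
        outcome[req] = node
    return outcome
```

With three requests, three nodes, and every first attempt failing, all three requests end up as None, even though each node was only ever used by a failed attempt.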

I can leave this environment as documented here for a couple of days; let me know what else you need.

Comment 2 Ed Balduf 2016-08-31 16:52:04 UTC
I suspect this is related to https://bugs.launchpad.net/nova/+bug/1503453

I pulled all of the "Update host state from compute node" lines for the time of this deploy from nova-scheduler.log and condensed them into a report of the memory utilization on each line (together with the hypervisor name). It shows that available memory goes negative, which it never should. I am attaching this report as the file scheduler-udpates.formatted.
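A report like this could be produced with a short script along the following lines. It is a sketch that assumes each "Update host state from compute node" line carries a ComputeNode representation with key=value fields such as free_ram_mb=... and hypervisor_hostname=... (quoted or unquoted); that line shape is an assumption, so check it against the actual log before relying on it. The function names are hypothetical.

```python
import re

# Assumed line shape: the marker text followed by a ComputeNode repr with
# key=value fields like free_ram_mb=... and hypervisor_hostname='...'.
MARKER = "Update host state from compute node"
FREE_RAM = re.compile(r"free_ram_mb=(-?\d+)")
HOSTNAME = re.compile(r"hypervisor_hostname='?([^',)]+)'?")


def memory_report(lines):
    """Yield (hypervisor_hostname, free_ram_mb) for each host-state update line."""
    for line in lines:
        if MARKER not in line:
            continue  # skip unrelated scheduler output
        ram = FREE_RAM.search(line)
        host = HOSTNAME.search(line)
        if ram and host:
            yield host.group(1), int(ram.group(1))


def negative_hosts(lines):
    """Return the sorted hypervisor names whose free_ram_mb ever went below zero."""
    return sorted({host for host, ram in memory_report(lines) if ram < 0})
```

Because both functions accept any iterable of lines, an open nova-scheduler.log file object can be passed in directly.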

Comment 3 Ed Balduf 2016-08-31 16:53:50 UTC
Created attachment 1196461
Update host state from compute node

grep "Update host state from compute node" nova-scheduler.log, piped through a lot of formatting.

Comment 4 Dmitry Tantsur 2016-09-06 14:47:45 UTC

*** This bug has been marked as a duplicate of bug 1370651 ***

