Bug 1371246

Summary: OpenStack fails to deploy with "No valid host was found. There are not enough hosts available." - RamFilter eliminates all
Product: Red Hat OpenStack Reporter: Ed Balduf <balduf>
Component: openstack-ironic Assignee: Lucas Alvares Gomes <lmartins>
Status: CLOSED DUPLICATE QA Contact: Raviv Bar-Tal <rbartal>
Severity: high Docs Contact:
Priority: unspecified    
Version: 9.0 (Mitaka) CC: akarlsso, dtantsur, mburns, rhel-osp-director-maint, srevivo
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-09-06 14:47:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1193930    
Attachments:
Description: Text output of my debugging steps, including relevant logs etc... Flags: none
Description: Update host state from compute node Flags: none

Description Ed Balduf 2016-08-29 17:14:30 UTC
Created attachment 1195445 [details]
Text output of my debugging steps, including relevant logs etc...

Description of problem: The OpenStack director install always fails with "No valid host was found. There are not enough hosts available." Tracing through the logs shows that the RamFilter eliminates all hosts, even though they have enough RAM. See the attached logs.


Version-Release number of selected component (if applicable): RHEL 7.2, OSP 9


How reproducible: 100%


Steps to Reproduce:
1. Install undercloud
2. Configure undercloud
3. Deploy Overcloud

Actual results: Stack overcloud CREATE_FAILED


Expected results: Working OpenStack


Additional info: See the attached debugging output. Notice that the instances in the ironic node-list output do not match the instances in the nova list output for the error instances. Then look through the nova-scheduler log for such an instance: all hosts are removed by the RamFilter, even though they all had plenty of RAM at the start. Then look through nova-scheduler.log for everything about this request. This pattern happens for all of the error instances.

Next, look at all of the error instances and at what the Nova scheduler did for each of them. Notice that the first 3 instances fail scheduling because of the RamFilter, yet each of them is assigned an ironic node. When the scheduler next attempts to schedule the retry, it cannot, because all of the hosts are taken.
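
To illustrate the kind of check involved, here is a minimal sketch of a RAM-based host filter. It is a simplified model and not Nova's actual code: the function name, the parameter names, and the default allocation ratio of 1.0 are assumptions made for illustration.

```python
def passes_ram_check(total_mb, free_mb, requested_mb, ram_allocation_ratio=1.0):
    """Return True if a host has enough usable RAM for the request.

    Simplified model of a RAM filter (hypothetical helper, not Nova's API).
    """
    # Upper bound on RAM the scheduler is willing to hand out on this host.
    limit_mb = total_mb * ram_allocation_ratio
    # RAM the scheduler already considers consumed on this host.
    used_mb = total_mb - free_mb
    usable_mb = limit_mb - used_mb
    return usable_mb >= requested_mb


# An untouched 4096 MB node passes a 4096 MB request.
print(passes_ram_check(4096, 4096, 4096))
# The same node fails once its tracked free RAM has been claimed.
print(passes_ram_check(4096, 0, 4096))
```

In this model, a node whose tracked free RAM is already claimed is rejected no matter how much physical memory it has, which would be consistent with the retries finding no valid host.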

I can leave this environment in the state documented here for a couple of days. Let me know what else you need.

Comment 2 Ed Balduf 2016-08-31 16:52:04 UTC
I suspect this is related to https://bugs.launchpad.net/nova/+bug/1503453

I pulled all of the "Update host state from compute node" lines for the time of this deploy from nova-scheduler.log and reduced each one to a report of its memory utilization (plus the Hypervisor_name). The report shows that available memory goes negative, which it shouldn't. I am attaching this report as file scheduler-udpates.formatted
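
For anyone who wants to repeat this reduction, here is a rough sketch of the approach. It is not the exact formatting I used: it assumes each matching log line carries the compute node record as comma-separated key=value pairs, and the field names (hypervisor_hostname, memory_mb, memory_mb_used, free_ram_mb) are assumptions to adjust against your own log.

```python
import re

MARKER = "Update host state from compute node"
# Assumed layout: key=value pairs, values either quoted or bare.
FIELD_RE = re.compile(r"(\w+)=('[^']*'|[^,()\s]+)")
# Assumed field names; change these to match the actual log content.
FIELDS = ("hypervisor_hostname", "memory_mb", "memory_mb_used", "free_ram_mb")


def summarize(lines, fields=FIELDS):
    """Reduce matching log lines to dicts holding only the chosen fields."""
    rows = []
    for line in lines:
        if MARKER not in line:
            continue
        found = {key: value.strip("'") for key, value in FIELD_RE.findall(line)}
        rows.append({name: found.get(name) for name in fields})
    return rows


def negative_free_ram(rows):
    """Return the rows whose free_ram_mb parses as a negative integer."""
    flagged = []
    for row in rows:
        try:
            if int(row["free_ram_mb"]) < 0:
                flagged.append(row)
        except (TypeError, ValueError):
            # Field missing or not an integer: skip rather than guess.
            continue
    return flagged


if __name__ == "__main__":
    with open("nova-scheduler.log") as log:
        for row in negative_free_ram(summarize(log)):
            print(row)
```

Any row this prints is a host state update where the scheduler's view of free RAM had already dropped below zero.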

Comment 3 Ed Balduf 2016-08-31 16:53:50 UTC
Created attachment 1196461 [details]
Update host state from compute node

Output of grep "Update host state from compute node" nova-scheduler.log, piped through a lot of formatting.

Comment 4 Dmitry Tantsur 2016-09-06 14:47:45 UTC

*** This bug has been marked as a duplicate of bug 1370651 ***