Bug 1370651

Summary: Overcloud deploy fails with No valid Host Found
Product: Red Hat OpenStack
Reporter: Sai Sindhur Malleni <smalleni>
Component: python-tripleoclient
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED NOTABUG
QA Contact: Arik Chernetsky <achernet>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 10.0 (Newton)
CC: akarlsso, balduf, dtantsur, hbrock, jslagle, juhu, lmartins, mburns, rcernin, rhel-osp-director-maint, smalleni, srevivo
Target Milestone: ---
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-20 11:48:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1193930
Attachments:
Description                         Flags
nova-scheduler log on undercloud    none
nova scheduler log                  none
info                                none
latest nova-scheduler.log           none

Description Sai Sindhur Malleni 2016-08-26 21:38:52 UTC
Created attachment 1194501
nova-scheduler log on undercloud

Description of problem: Although the nodes introspect successfully and their properties (memory, disk, etc.) can be seen with ironic node-show, they always fail the RamFilter, and the nova-scheduler log shows the nodes as having 0 MB of RAM, which is not the case.

2016-08-26 18:18:09.212 23413 DEBUG nova.scheduler.filters.ram_filter [req-c419caa4-8061-4100-9f8d-3d4470e56feb f2535dce633947f5a9a8388497596146 cdfa3012d0834451a6b742fd3bf3c4e9 - - -] (manager.example.com, e0b6de31-6610-4c9d-8cbe-dc842cd9877f) ram: 0MB disk: 0MB io_ops: 0 instances: 0 does not have 4096 MB usable ram before overcommit, it only has 0 MB. host_passes /usr/lib/python2.7/site-packages/nova/scheduler/filters/ram_filter.py:45
2016-08-26 18:18:09.212 23413 INFO nova.filters [req-c419caa4-8061-4100-9f8d-3d4470e56feb f2535dce633947f5a9a8388497596146 cdfa3012d0834451a6b742fd3bf3c4e9 - - -] Filter RamFilter returned 0 hosts
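For readers unfamiliar with this filter, the check behind the message above can be sketched roughly as follows. This is a simplified model, assuming the filter behaves like nova's RamFilter (a "before overcommit" comparison against the host's reported total RAM, then a second comparison with ram_allocation_ratio applied); HostState and ram_passes are hypothetical names, not nova code. It illustrates why a host reporting 0 MB fails no matter what ironic node-show displays: the filter only sees the RAM recorded in the scheduler's host state.

```python
# Simplified model of the two RAM checks whose messages appear in the
# nova-scheduler log. Hypothetical names; this is not nova's source code.
from dataclasses import dataclass
from typing import Tuple


@dataclass
class HostState:
    """Hypothetical stand-in for the scheduler's view of one host, in MB."""
    total_usable_ram_mb: int
    free_ram_mb: int


def ram_passes(host: HostState, requested_ram_mb: int,
               ram_allocation_ratio: float = 1.0) -> Tuple[bool, str]:
    # Check 1: the host must report at least the requested RAM before any
    # overcommit ratio is applied. A host reporting 0 MB always fails here.
    if host.total_usable_ram_mb < requested_ram_mb:
        return False, (
            f"does not have {requested_ram_mb} MB usable ram before "
            f"overcommit, it only has {host.total_usable_ram_mb} MB."
        )
    # Check 2: apply the overcommit ratio, then subtract RAM already in use.
    limit_mb = host.total_usable_ram_mb * ram_allocation_ratio
    used_mb = host.total_usable_ram_mb - host.free_ram_mb
    usable_mb = limit_mb - used_mb
    if usable_mb < requested_ram_mb:
        return False, (
            f"does not have {requested_ram_mb} MB usable ram, "
            f"it only has {usable_mb} MB usable ram."
        )
    return True, "passes"


# The control flavor asks for 4096 MB; the host reports 0 MB.
ok, msg = ram_passes(HostState(total_usable_ram_mb=0, free_ram_mb=0), 4096)
print(ok, msg)
# False does not have 4096 MB usable ram before overcommit, it only has 0 MB.
```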



Version-Release number of selected component (if applicable):
RHOP 10 puddle 2016-08-24.1

How reproducible:
100% in my environment


Steps to Reproduce:
1. Install undercloud
2. Deploy overcloud

Actual results:
Overcloud creation fails in under 3 minutes, reporting that no valid host could be found.

Expected results:
The overcloud should be created.

Additional info:
The nova-scheduler log from the undercloud is attached.

Comment 2 Lucas Alvares Gomes 2016-08-29 11:14:21 UTC
"No valid host found" can be caused by many different problems. Please take a look at [0] and see if that solves it for you.

If it doesn't, could you please paste the output for the ironic nodes (ironic node-show <uuid>)?

[0] http://docs.openstack.org/developer/tripleo-docs/troubleshooting/troubleshooting-overcloud.html#no-valid-host-found-error

Comment 3 Sai Sindhur Malleni 2016-08-29 13:40:24 UTC
Lucas, 
Thanks for your reply. https://gist.github.com/smalleni/d61060209418bb82e5d7e523843aaf73 captures most of the information from ironic node-list and node-show. If you look at the deploy command there, I was using profile matching via --control-flavor and had the appropriate profile set in the node capabilities. For the computes, however, I wasn't passing --compute-flavor in the deploy command, and I also didn't have that profile set in their capabilities.

So my questions now are:
1. Can I pass only --control-flavor and omit --compute-flavor in the deploy command?
2. Even if the nova-scheduler couldn't find the capabilities it was looking for, why does it report the RamFilter as failing?

Comment 4 Jun Hu 2016-09-06 01:49:21 UTC
I hit the same issue, and the numbers in the logs look strange to me.

Filter RamFilter returned 0 hosts

2016-09-06 08:49:00.653 10289 DEBUG nova.filters [req-1fe146c1-fff5-417c-9254-eddb9cfe0b64 3575d153aa834960801d45d83bd750a7 31236b7142a945d19826d28e8ec778d7 - - -] Filter ComputeCapabilitiesFilter returned 1 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-09-06 08:49:00.653 10289 DEBUG nova.filters [req-1fe146c1-fff5-417c-9254-eddb9cfe0b64 3575d153aa834960801d45d83bd750a7 31236b7142a945d19826d28e8ec778d7 - - -] Filter AvailabilityZoneFilter returned 1 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-09-06 08:49:00.653 10289 DEBUG nova.scheduler.filters.ram_filter [req-1fe146c1-fff5-417c-9254-eddb9cfe0b64 3575d153aa834960801d45d83bd750a7 31236b7142a945d19826d28e8ec778d7 - - -] (director.redhat.com, 78425e0b-291c-441a-9a03-a13a5aa7989d) ram: 0MB disk: 0MB io_ops: 0 instances: 0 does not have 1024 MB usable ram before overcommit, it only has 0 MB. host_passes /usr/lib/python2.7/site-packages/nova/scheduler/filters/ram_filter.py:45
2016-09-06 08:49:00.654 10289 INFO nova.filters [req-1fe146c1-fff5-417c-9254-eddb9cfe0b64 3575d153aa834960801d45d83bd750a7 31236b7142a945d19826d28e8ec778d7 - - -] Filter RamFilter returned 0 hosts


Filter DiskFilter returned 0 hosts

2016-09-06 08:49:17.383 10289 DEBUG nova.filters [req-f579efd0-a4a7-4dd7-9a81-7947f109d9e4 3575d153aa834960801d45d83bd750a7 31236b7142a945d19826d28e8ec778d7 - - -] Filter ComputeCapabilitiesFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-09-06 08:49:17.384 10289 DEBUG nova.filters [req-f579efd0-a4a7-4dd7-9a81-7947f109d9e4 3575d153aa834960801d45d83bd750a7 31236b7142a945d19826d28e8ec778d7 - - -] Filter AvailabilityZoneFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-09-06 08:49:17.384 10289 DEBUG nova.scheduler.filters.ram_filter [req-f579efd0-a4a7-4dd7-9a81-7947f109d9e4 3575d153aa834960801d45d83bd750a7 31236b7142a945d19826d28e8ec778d7 - - -] (director.redhat.com, 357c8d25-80c9-4737-b6d8-795b324692d1) ram: 0MB disk: 0MB io_ops: 0 instances: 0 does not have 1024 MB usable ram before overcommit, it only has 0 MB. host_passes /usr/lib/python2.7/site-packages/nova/scheduler/filters/ram_filter.py:45
2016-09-06 08:49:17.384 10289 DEBUG nova.filters [req-f579efd0-a4a7-4dd7-9a81-7947f109d9e4 3575d153aa834960801d45d83bd750a7 31236b7142a945d19826d28e8ec778d7 - - -] Filter RamFilter returned 1 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-09-06 08:49:17.384 10289 DEBUG nova.scheduler.filters.disk_filter [req-f579efd0-a4a7-4dd7-9a81-7947f109d9e4 3575d153aa834960801d45d83bd750a7 31236b7142a945d19826d28e8ec778d7 - - -] (director.redhat.com, cd15ccb0-0a98-4a76-aa60-10a1e8c9d342) ram: 1024MB disk: 5120MB io_ops: 0 instances: 0 does not have 10240 MB usable disk, it only has 5120.0 MB usable disk. host_passes /usr/lib/python2.7/site-packages/nova/scheduler/filters/disk_filter.py:55
2016-09-06 08:49:17.385 10289 INFO nova.filters [req-f579efd0-a4a7-4dd7-9a81-7947f109d9e4 3575d153aa834960801d45d83bd750a7 31236b7142a945d19826d28e8ec778d7 - - -] Filter DiskFilter returned 0 hosts
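The DiskFilter message follows the same pattern. A simplified sketch, assuming nova's usual sizing of a request (root and ephemeral disk in GB, swap in MB) and a disk_allocation_ratio of 1.0; the helper names are hypothetical, and the 10 GB root disk in the example is an assumption chosen only to reproduce the 10240 MB figure in the log.

```python
# Simplified model of the DiskFilter check. Hypothetical names, not nova code.
from typing import Tuple


def requested_disk_mb(root_gb: int, ephemeral_gb: int, swap_mb: int) -> int:
    # Root and ephemeral disks are sized in GB, swap in MB.
    return 1024 * (root_gb + ephemeral_gb) + swap_mb


def disk_passes(total_usable_disk_gb: int, free_disk_mb: int,
                requested_mb: int,
                disk_allocation_ratio: float = 1.0) -> Tuple[bool, str]:
    total_mb = total_usable_disk_gb * 1024
    limit_mb = total_mb * disk_allocation_ratio  # becomes a float here
    used_mb = total_mb - free_disk_mb
    usable_mb = limit_mb - used_mb
    if usable_mb < requested_mb:
        return False, (
            f"does not have {requested_mb} MB usable disk, "
            f"it only has {usable_mb} MB usable disk."
        )
    return True, "passes"


# Assumed flavor: 10 GB root disk, no ephemeral disk, no swap.
# Assumed host: 5 GB of disk, all of it free ("disk: 5120MB" in the log).
ok, msg = disk_passes(5, 5120, requested_disk_mb(10, 0, 0))
print(ok, msg)
# False does not have 10240 MB usable disk, it only has 5120.0 MB usable disk.
```

The float in "5120.0 MB" comes from multiplying by the allocation ratio, which matches the formatting seen in the log line.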

Comment 5 Jun Hu 2016-09-06 09:34:06 UTC
The info is at http://pastebin.com/3CdaX1y4

Comment 6 Dmitry Tantsur 2016-09-06 14:47:45 UTC
*** Bug 1371246 has been marked as a duplicate of this bug. ***

Comment 7 Lucas Alvares Gomes 2016-09-06 14:56:00 UTC
Hi,

(In reply to Jun Hu from comment #5)
> The info is at http://pastebin.com/3CdaX1y4

Looking at the information in the link, it doesn't seem related to this bug: in your case the nodes were already picked by the scheduler and the failure happened later. You see "No valid host found" because the RetryFilter ran out of nodes to retry.
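The retry behaviour described here can be modelled in a few lines. This is a hypothetical sketch, not nova's implementation: each failed attempt adds the host to a per-request list, a RetryFilter-like step removes those hosts from later attempts, and once no candidates remain (or the attempt cap is hit) the request ends in "No valid host". max_attempts is an illustrative cap standing in for nova's own retry limit.

```python
# Hypothetical model of scheduling with retries; not nova's implementation.
from typing import Callable, List, Set


class NoValidHost(Exception):
    """Stand-in for the scheduler's 'No valid host was found' error."""


def schedule_with_retries(hosts: List[str], deploy: Callable[[str], bool],
                          max_attempts: int = 3) -> str:
    attempted: Set[str] = set()
    for _ in range(max_attempts):
        # RetryFilter-like step: skip hosts that already failed this request.
        candidates = [h for h in hosts if h not in attempted]
        if not candidates:
            break
        host = candidates[0]
        attempted.add(host)
        if deploy(host):  # failure happens here, after scheduling succeeded
            return host
    raise NoValidHost(f"No valid host was found. Tried: {sorted(attempted)}")


# Both nodes pass the filters, but every deployment attempt fails.
try:
    schedule_with_retries(["node-a", "node-b"], deploy=lambda host: False)
except NoValidHost as exc:
    print(exc)
# No valid host was found. Tried: ['node-a', 'node-b']
```

The point of the model is that the final error names the scheduler even though the real failure happened during deployment, which is why the nova-compute and ironic-conductor logs matter in that case.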

Comment 8 Lucas Alvares Gomes 2016-09-06 14:57:30 UTC
@Sindhur,

Your nodes look fine. Could you also post the flavors (nova flavor-show) and the output of "nova hypervisor-stats"?

Comment 9 Jun Hu 2016-09-06 23:36:26 UTC
Created attachment 1198444
nova scheduler log

Comment 10 Jun Hu 2016-09-06 23:38:28 UTC
(In reply to Lucas Alvares Gomes from comment #7)
> Hi,
> 
> (In reply to Jun Hu from comment #5)
> > The info is at http://pastebin.com/3CdaX1y4
> 
> Looking at the information in the link, it doesn't seem related to this bug:
> in your case the nodes were already picked by the scheduler and the failure
> happened later. You see "No valid host found" because the RetryFilter ran
> out of nodes to retry.

If you're right, why did it fail?
I have uploaded the nova-scheduler log; you can check it.

Comment 11 Lucas Alvares Gomes 2016-09-07 08:40:30 UTC
(In reply to Jun Hu from comment #10)
> If you're right, why did it fail?
> I have uploaded the nova-scheduler log; you can check it.

Hi Jun, 

Failures can vary. I will need more logs to tell you what actually happened and to confirm whether it failed during scheduling or later, during the deployment.

Can you please provide the logs for the nova-compute and ironic-conductor services and the output from the following commands:

* ironic node-show <node uuid>
* nova flavor-show <flavor id>
* nova hypervisor-stats

Comment 12 Jun Hu 2016-09-07 10:27:13 UTC
Created attachment 1198631
info

Comment 13 Jun Hu 2016-09-07 10:30:54 UTC
Created attachment 1198632
latest nova-scheduler.log

Comment 14 Sai Sindhur Malleni 2016-09-08 17:07:00 UTC
Lucas, nova hypervisor-stats on the undercloud?
[stack@manager ~]$ nova hypervisor-stats
+----------------------+-------+
| Property             | Value |
+----------------------+-------+
| count                | 3     |
| current_workload     | 0     |
| disk_available_least | -405  |
| free_disk_gb         | 0     |
| free_ram_mb          | 0     |
| local_gb             | 0     |
| local_gb_used        | 120   |
| memory_mb            | 0     |
| memory_mb_used       | 12288 |
| running_vms          | 3     |
| vcpus                | 0     |
| vcpus_used           | 3     |
+----------------------+-------+
[stack@manager ~]$ nova flavor-show control
+----------------------------+--------------------------------------------------------------------------+
| Property                   | Value                                                                    |
+----------------------------+--------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                    |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                        |
| disk                       | 40                                                                       |
| extra_specs                | {"capabilities:boot_option": "local", "capabilities:profile": "control"} |
| id                         | 5f8d78d4-0999-4b97-9423-2127eb801d9e                                     |
| name                       | control                                                                  |
| os-flavor-access:is_public | True                                                                     |
| ram                        | 4096                                                                     |
| rxtx_factor                | 1.0                                                                      |
| swap                       |                                                                          |
| vcpus                      | 1                                                                        |
+----------------------------+--------------------------------------------------------------------------+
[stack@manager ~]$ nova flavor-show compute
+----------------------------+--------------------------------------------------------------------------+
| Property                   | Value                                                                    |
+----------------------------+--------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                    |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                        |
| disk                       | 40                                                                       |
| extra_specs                | {"capabilities:boot_option": "local", "capabilities:profile": "compute"} |
| id                         | db42a488-efd1-4e05-b405-2e48fa1bd78e                                     |
| name                       | compute                                                                  |
| os-flavor-access:is_public | True                                                                     |
| ram                        | 4096                                                                     |
| rxtx_factor                | 1.0                                                                      |
| swap                       |                                                                          |
| vcpus                      | 1                                                                        |
+----------------------------+--------------------------------------------------------------------------+

Also, I managed to get a successful deployment without passing any flavors; previously I had tagged a flavor for control but not for compute. Should I not tag a flavor for only one node type, and instead do it for all of them? My deployment succeeded when I tagged neither the control nodes nor the computes to a flavor.

Comment 15 Lucas Alvares Gomes 2016-09-09 09:16:38 UTC
(In reply to Sindhur from comment #14)
> Lucas, nova hypervisor-stats on the undercloud?
> [stack@manager ~]$ nova hypervisor-stats
> +----------------------+-------+
> | Property             | Value |
> +----------------------+-------+
> | count                | 3     |
> | current_workload     | 0     |
> | disk_available_least | -405  |
> | free_disk_gb         | 0     |
> | free_ram_mb          | 0     |
> | local_gb             | 0     |
> | local_gb_used        | 120   |
> | memory_mb            | 0     |
> | memory_mb_used       | 12288 |
> | running_vms          | 3     |
> | vcpus                | 0     |
> | vcpus_used           | 3     |
> +----------------------+-------+
> [stack@manager ~]$ nova flavor-show control
> +----------------------------+--------------------------------------------------------------------------+
> | Property                   | Value                                                                    |
> +----------------------------+--------------------------------------------------------------------------+
> | OS-FLV-DISABLED:disabled   | False                                                                    |
> | OS-FLV-EXT-DATA:ephemeral  | 0                                                                        |
> | disk                       | 40                                                                       |
> | extra_specs                | {"capabilities:boot_option": "local", "capabilities:profile": "control"} |
> | id                         | 5f8d78d4-0999-4b97-9423-2127eb801d9e                                     |
> | name                       | control                                                                  |
> | os-flavor-access:is_public | True                                                                     |
> | ram                        | 4096                                                                     |
> | rxtx_factor                | 1.0                                                                      |
> | swap                       |                                                                          |
> | vcpus                      | 1                                                                        |
> +----------------------------+--------------------------------------------------------------------------+
> [stack@manager ~]$ nova flavor-show compute
> +----------------------------+--------------------------------------------------------------------------+
> | Property                   | Value                                                                    |
> +----------------------------+--------------------------------------------------------------------------+
> | OS-FLV-DISABLED:disabled   | False                                                                    |
> | OS-FLV-EXT-DATA:ephemeral  | 0                                                                        |
> | disk                       | 40                                                                       |
> | extra_specs                | {"capabilities:boot_option": "local", "capabilities:profile": "compute"} |
> | id                         | db42a488-efd1-4e05-b405-2e48fa1bd78e                                     |
> | name                       | compute                                                                  |
> | os-flavor-access:is_public | True                                                                     |
> | ram                        | 4096                                                                     |
> | rxtx_factor                | 1.0                                                                      |
> | swap                       |                                                                          |
> | vcpus                      | 1                                                                        |
> +----------------------------+--------------------------------------------------------------------------+
> 
> Also, I managed to get a successful deployment without passing any flavors;
> previously I had tagged a flavor for control but not for compute. Should I
> not tag a flavor for only one node type, and instead do it for all of them?
> My deployment succeeded when I tagged neither the control nodes nor the
> computes to a flavor.

The information looks fine (though the hypervisor-stats output there is from an already deployed cloud, right?).
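The "used" counters in the hypervisor-stats output are consistent with that reading: they equal exactly three instances of the 4096 MB / 40 GB / 1 vCPU flavors. The numbers below are copied from the pasted output; the comparison is only a sanity check, not proof of what was deployed.

```python
# Values copied from the nova hypervisor-stats and nova flavor-show output in
# comment 14. The control and compute flavors have identical sizes.
stats = {"running_vms": 3, "memory_mb_used": 12288,
         "local_gb_used": 120, "vcpus_used": 3}
flavor = {"ram": 4096, "disk": 40, "vcpus": 1}

expected = {
    "memory_mb_used": stats["running_vms"] * flavor["ram"],   # 3 * 4096
    "local_gb_used": stats["running_vms"] * flavor["disk"],   # 3 * 40
    "vcpus_used": stats["running_vms"] * flavor["vcpus"],     # 3 * 1
}
for key, value in expected.items():
    print(f"{key}: expected {value}, reported {stats[key]}")
# memory_mb_used: expected 12288, reported 12288
# local_gb_used: expected 120, reported 120
# vcpus_used: expected 3, reported 3
```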

By "tagging a flavor", do you mean passing --control-flavor or --compute-flavor to the deploy command?

I'm not familiar with the particulars of this command. If passing such flags prevents nodes from being scheduled, it does sound like a bug in this CLI rather than in Ironic itself. In any case, if that is true, let's reassign this bug to the python-tripleoclient component; the folks responsible for the tool will be able to advise you better than I can.

Comment 16 Sai Sindhur Malleni 2016-09-09 18:46:30 UTC
Lucas, correct: that output is from the cloud that was deployed later.

Comment 17 Lucas Alvares Gomes 2016-09-12 12:33:48 UTC
(In reply to Sindhur from comment #16)
> Lucas, correct: that output is from the cloud that was deployed later.

Cool, I'm reassigning the bug to python-tripleoclient so they can investigate what this command does that prevents the nodes from being scheduled correctly.

Comment 18 James Slagle 2016-10-18 20:09:59 UTC
tripleoclient does not do anything with RamFilter / DiskFilter. Those were the filters that returned 0 hosts, not the ComputeCapabilitiesFilter, which is the one that performs profile matching.
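Which filter eliminated the hosts can be checked mechanically. The helper below is hypothetical (not part of nova or tripleoclient): it groups the "Filter X returned 0 hosts" summary lines of a nova-scheduler log by request ID. The regular expression is an assumption based on the log lines quoted in this bug.

```python
# Hypothetical helper: find which scheduler filters returned 0 hosts, per request.
import re
from typing import Dict, Iterable, List

# Matches nova-scheduler summary lines such as:
#   ... nova.filters [req-<id> <user> <project> - - -] Filter RamFilter returned 0 hosts
FILTER_RE = re.compile(
    r"\[(req-[0-9a-f-]+) [^\]]*\] Filter (\w+) returned (\d+) host")


def zero_host_filters(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each request ID to the filters that eliminated every host."""
    result: Dict[str, List[str]] = {}
    for line in lines:
        match = FILTER_RE.search(line)
        if match and match.group(3) == "0":
            result.setdefault(match.group(1), []).append(match.group(2))
    return result


# Two lines taken from comment 4 of this bug.
sample = [
    "2016-09-06 08:49:00.653 10289 DEBUG nova.filters "
    "[req-1fe146c1-fff5-417c-9254-eddb9cfe0b64 3575d153aa834960801d45d83bd750a7 "
    "31236b7142a945d19826d28e8ec778d7 - - -] Filter ComputeCapabilitiesFilter "
    "returned 1 host(s) get_filtered_objects",
    "2016-09-06 08:49:00.654 10289 INFO nova.filters "
    "[req-1fe146c1-fff5-417c-9254-eddb9cfe0b64 3575d153aa834960801d45d83bd750a7 "
    "31236b7142a945d19826d28e8ec778d7 - - -] Filter RamFilter returned 0 hosts",
]
print(zero_host_filters(sample))
# {'req-1fe146c1-fff5-417c-9254-eddb9cfe0b64': ['RamFilter']}
```

An open log file can be passed in directly, since iterating over a file object yields its lines.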

Comment 19 James Slagle 2016-10-18 20:11:25 UTC
Lucas, could you have another look at this one? Is the scheduler log not accurate in this case?

Comment 20 Dmitry Tantsur 2016-10-20 11:48:15 UTC
> Should I not tag a flavor for only one node type, and instead do it for all of them?

This is a dangerous thing to do; let me explain why. Because the compute profile is not limited to any set of nodes, compute instances can be scheduled onto nodes that are tagged for control. I think this is the cause of your problem, and I believe this bug can be closed.
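The collision described above can be illustrated with a toy placement model. This is a simplified, hypothetical sketch, not nova's ComputeCapabilitiesFilter: it assumes the untagged role's flavor carries no "capabilities:profile" extra spec, so it matches any node, including one tagged for control. Node names and the first-fit ordering are illustrative only; the real scheduler orders hosts differently.

```python
# Toy model of profile matching and node claiming. Hypothetical, not nova code.
from typing import Dict, Optional, Set

Node = Dict[str, str]  # capability name -> value, as set on the ironic node


def matches(node_caps: Node, flavor_specs: Dict[str, str]) -> bool:
    """Each 'capabilities:<key>' extra spec must equal the node's <key>."""
    for spec, wanted in flavor_specs.items():
        if spec.startswith("capabilities:"):
            key = spec.split(":", 1)[1]
            if node_caps.get(key) != wanted:
                return False
    return True


def place(nodes: Dict[str, Node], flavor_specs: Dict[str, str],
          taken: Set[str]) -> Optional[str]:
    """Claim the first free matching node, or return None ('No valid host')."""
    for name, caps in nodes.items():
        if name not in taken and matches(caps, flavor_specs):
            taken.add(name)
            return name
    return None


# Hypothetical layout: one node tagged for control, two untagged.
nodes = {
    "node-0": {"profile": "control", "boot_option": "local"},
    "node-1": {"boot_option": "local"},
    "node-2": {"boot_option": "local"},
}
control_flavor = {"capabilities:profile": "control",
                  "capabilities:boot_option": "local"}
untagged_flavor: Dict[str, str] = {}  # no profile requirement at all

taken: Set[str] = set()
print("compute ->", place(nodes, untagged_flavor, taken))  # claims node-0
print("control ->", place(nodes, control_flavor, taken))   # None: no match left
```

In this model, the control request only fails because the unconstrained request claimed the control-tagged node first; giving every role its own profile removes that overlap.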

Jun Hu, please move your logs to https://bugzilla.redhat.com/show_bug.cgi?id=1375958, which you created.