Bug 1461242 - Running nova-manage per the instructions for ironic in the overcloud isn't working as expected.
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z2
Target Release: 11.0 (Ocata)
Assigned To: Dmitry Tantsur
QA Contact: Dan Yasny
Keywords: Triaged, ZStream
Reported: 2017-06-13 23:09 EDT by Darin Sorrentino
Modified: 2017-09-13 17:43 EDT
CC List: 7 users

Fixed In Version: openstack-tripleo-heat-templates-6.2.0-2.el7ost
Last Closed: 2017-09-13 17:43:17 EDT
Type: Bug



External Trackers
OpenStack gerrit 477084 (last updated 2017-06-26 11:20 EDT)
Red Hat Product Errata RHBA-2017:2721 (priority: normal, status: SHIPPED_LIVE): Red Hat OpenStack Platform 11.0 director Bug Fix Advisory (last updated 2017-09-13 21:39:22 EDT)

Description Darin Sorrentino 2017-06-13 23:09:20 EDT
Description of problem:

Per the draft instructions, after adding a baremetal node to the overcloud, we run:

nova-manage cell_v2 discover_hosts --verbose

When I did this, the result output was:

[root@overcloud-controller-0 ~]#  nova-manage cell_v2 discover_hosts --verbose
Found 2 cell mappings.
Skipping cell0 since it does not contain hosts.
Getting compute nodes from cell 'default': 7e21880c-eb3a-45d6-be15-0d1ea7de678a
Found 4 computes in cell: 7e21880c-eb3a-45d6-be15-0d1ea7de678a
Checking host mapping for compute host 'overcloud-compute-2.localdomain': 98afa87e-0637-43e8-9ccd-712635590ea1
Checking host mapping for compute host 'overcloud-compute-1.localdomain': aa36c183-3f21-4360-a0f4-67510622babb
Checking host mapping for compute host 'overcloud-compute-0.localdomain': ad5b8254-b7a8-4248-85b5-50982b42b94e
Checking host mapping for compute host 'overcloud-controller-1.localdomain': cb74ef3e-3990-4d29-9176-bfd53382dc95
Creating host mapping for compute host 'overcloud-controller-1.localdomain': cb74ef3e-3990-4d29-9176-bfd53382dc95
[root@overcloud-controller-0 ~]#

The problem is that this overcloud was deployed as an HA deployment, so there are two other controllers that are not being seen.
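In the output above, each "Checking host mapping" line corresponds to one of the computes the command found ("Found 4 computes" is followed by four such lines), so a controller that never appears there is simply not seen as a compute host. The following is a minimal sketch for spotting this automatically; the helper names (`summarize_discover_hosts`, `missing_hosts`) are hypothetical, and the regular expressions assume the exact line format shown in this report's transcripts rather than any documented nova-manage interface.

```python
import re

# Line formats copied from the transcripts in this report (an assumption,
# not a documented interface of nova-manage).
CHECK_RE = re.compile(r"^Checking host mapping for compute host '([^']+)': (\S+)$")
CREATE_RE = re.compile(r"^Creating host mapping for compute host '([^']+)': (\S+)$")


def summarize_discover_hosts(output: str) -> dict:
    """Split `discover_hosts --verbose` output into checked and newly mapped hosts."""
    checked: list = []
    created: list = []
    for line in output.splitlines():
        line = line.strip()
        if m := CHECK_RE.match(line):
            # One line per compute node; several nodes can share one host.
            if m.group(1) not in checked:
                checked.append(m.group(1))
        elif m := CREATE_RE.match(line):
            created.append(m.group(1))
    return {"checked": checked, "created": created}


def missing_hosts(output: str, expected: list) -> list:
    """Expected hosts that the command never mentioned at all."""
    seen = set(summarize_discover_hosts(output)["checked"])
    return [host for host in expected if host not in seen]


if __name__ == "__main__":
    sample = """\
Found 4 computes in cell: 7e21880c-eb3a-45d6-be15-0d1ea7de678a
Checking host mapping for compute host 'overcloud-compute-0.localdomain': ad5b8254-b7a8-4248-85b5-50982b42b94e
Checking host mapping for compute host 'overcloud-controller-1.localdomain': cb74ef3e-3990-4d29-9176-bfd53382dc95
Creating host mapping for compute host 'overcloud-controller-1.localdomain': cb74ef3e-3990-4d29-9176-bfd53382dc95
"""
    controllers = [f"overcloud-controller-{i}.localdomain" for i in range(3)]
    print(missing_hosts(sample, controllers))
    # ['overcloud-controller-0.localdomain', 'overcloud-controller-2.localdomain']
```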

I deleted the baremetal node from the overcloud and re-added it; this time it was mapped to controller-2:

[stack@director-vm ~]$ openstack hypervisor list
+----+--------------------------------------+-----------------+---------------+-------+
| ID | Hypervisor Hostname                  | Hypervisor Type | Host IP       | State |
+----+--------------------------------------+-----------------+---------------+-------+
|  2 | overcloud-compute-2.localdomain      | QEMU            | 10.224.52.214 | up    |
|  5 | overcloud-compute-1.localdomain      | QEMU            | 10.224.52.202 | up    |
|  8 | overcloud-compute-0.localdomain      | QEMU            | 10.224.52.213 | up    |
| 32 | 171f1e5e-c15e-48bd-801c-e2089bab8dc5 | ironic          | 10.224.50.16  | up    |
+----+--------------------------------------+-----------------+---------------+-------+
[stack@director-vm ~]$ openstack hypervisor show 171f1e5e-c15e-48bd-801c-e2089bab8dc5
+----------------------+--------------------------------------+
| Field                | Value                                |
+----------------------+--------------------------------------+
| aggregates           | [u'baremetal-hosts']                 |
| cpu_info             |                                      |
| current_workload     | 0                                    |
| disk_available_least | 13030                                |
| free_disk_gb         | 13030                                |
| free_ram_mb          | 262144                               |
| host_ip              | 10.224.50.16                         |
| hypervisor_hostname  | 171f1e5e-c15e-48bd-801c-e2089bab8dc5 |
| hypervisor_type      | ironic                               |
| hypervisor_version   | 1                                    |
| id                   | 32                                   |
| local_gb             | 13030                                |
| local_gb_used        | 0                                    |
| memory_mb            | 262144                               |
| memory_mb_used       | 0                                    |
| running_vms          | 0                                    |
| service_host         | overcloud-controller-2.localdomain   |
| service_id           | 203                                  |
| state                | up                                   |
| status               | enabled                              |
| vcpus                | 40                                   |
| vcpus_used           | 0                                    |
+----------------------+--------------------------------------+


An attempt to deploy to it results in:

Host 'overcloud-controller-2.localdomain' is not mapped to any cell

Re-running nova-manage this time no longer lists controller-1 and instead creates a host mapping for controller-2:

[heat-admin@overcloud-controller-0 ~]$ sudo nova-manage cell_v2 discover_hosts --verbose
Found 2 cell mappings.
Skipping cell0 since it does not contain hosts.
Getting compute nodes from cell 'default': 7e21880c-eb3a-45d6-be15-0d1ea7de678a
Found 4 computes in cell: 7e21880c-eb3a-45d6-be15-0d1ea7de678a
Checking host mapping for compute host 'overcloud-compute-2.localdomain': 98afa87e-0637-43e8-9ccd-712635590ea1
Checking host mapping for compute host 'overcloud-compute-1.localdomain': aa36c183-3f21-4360-a0f4-67510622babb
Checking host mapping for compute host 'overcloud-compute-0.localdomain': ad5b8254-b7a8-4248-85b5-50982b42b94e
Checking host mapping for compute host 'overcloud-controller-2.localdomain': 7245b7e8-f45c-474e-b8cc-5530cfc5914a
Creating host mapping for compute host 'overcloud-controller-2.localdomain': 7245b7e8-f45c-474e-b8cc-5530cfc5914a
[heat-admin@overcloud-controller-0 ~]$
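The two runs fit a simple model: each baremetal node shows up as a compute node whose host is the controller running the nova-compute service that manages it (the `service_host` field in the `hypervisor show` output above), and discover_hosts only maps hosts it finds among those compute nodes. The sketch below is a toy model of that behavior under those assumptions, not nova's implementation; `ComputeNode`, `discover_hosts` and `can_build_on` are hypothetical names, and "existing mappings are left untouched" is an assumption of the model. It shows why a node that moves to a not-yet-mapped controller fails until discovery runs again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeNode:
    """One compute node record; for ironic, one per enrolled baremetal node."""
    hypervisor_hostname: str  # for ironic, the node UUID
    host: str  # nova-compute service host (a controller) managing the node


def discover_hosts(compute_nodes: list, host_mappings: set) -> list:
    """Toy model of `nova-manage cell_v2 discover_hosts`.

    Only hosts that own at least one compute node are visited, and existing
    mappings are left untouched. Returns the hosts that were newly mapped.
    """
    created = []
    for node in compute_nodes:
        if node.host not in host_mappings:
            host_mappings.add(node.host)
            created.append(node.host)
    return created


def can_build_on(node: ComputeNode, host_mappings: set) -> bool:
    """False corresponds to "Host '...' is not mapped to any cell"."""
    return node.host in host_mappings


if __name__ == "__main__":
    mappings = {f"overcloud-compute-{i}.localdomain" for i in range(3)}
    node_id = "ironic-node"  # placeholder identifier, not a real UUID
    # First enrollment: the node lands on controller-1, which gets mapped.
    print(discover_hosts([ComputeNode(node_id, "overcloud-controller-1.localdomain")], mappings))
    # Re-added: the node is now managed by controller-2, which is unmapped.
    moved = ComputeNode(node_id, "overcloud-controller-2.localdomain")
    print(can_build_on(moved, mappings))  # False until discovery runs again
    print(discover_hosts([moved], mappings))
    print(can_build_on(moved, mappings))  # True
```

In this model, a periodic discovery (as discussed later in this bug) amounts to calling `discover_hosts` on a timer so that the window in which `can_build_on` is False stays short.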

Actual results:

Only a single controller is added when running discover_hosts.

Expected results:

All controllers are added when running discover_hosts.

Comment 1 Dmitry Tantsur 2017-06-14 05:05:04 EDT
Moving to THT, as this has nothing to do with ironic itself.
Comment 2 Darin Sorrentino 2017-06-14 16:08:11 EDT
I have been unable to reproduce this in the lab. There, I can deploy to the baremetal node regardless of which controller it is assigned to, without re-running the nova-manage command. The issue was consistently reproducible at the customer site. I am not sure what differs between the lab and the customer site, as both were deployed from the CDN.
Comment 3 Dmitry Tantsur 2017-06-26 11:20:42 EDT
The master change has landed, and a backport is proposed.
Comment 6 Dan Yasny 2017-09-07 20:14:22 EDT
Installed OSP11 today:

- NovaSchedulerDiscoverHostsInCellsInterval isn't present in /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml

Moreover:
[stack@undercloud-0 ~]$ . overcloudrc
[stack@undercloud-0 ~]$ nova-manage cell_v2 discover_hosts --verbose
Traceback (most recent call last):
  File "/usr/bin/nova-manage", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 1577, in main
    config.parse_args(sys.argv)
  File "/usr/lib/python2.7/site-packages/nova/config.py", line 52, in parse_args
    default_config_files=default_config_files)
  File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2359, in __call__
    self._namespace._files_permission_denied)
oslo_config.cfg.ConfigFilesPermissionDeniedError: Failed to open some config files: /usr/share/nova/nova-dist.conf,/etc/nova/nova.conf


@Dmitry, does this merit a new BZ?
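The traceback is oslo.config failing to read nova's own config files. Sourcing overcloudrc does not help here, because nova-manage reads its database settings from nova.conf rather than using the API credentials in that file. A minimal pre-flight check, assuming only the two paths named in the error; `check_nova_configs` is a hypothetical helper, not part of nova.

```python
import os

# The two files named in the ConfigFilesPermissionDeniedError above.
NOVA_CONFIG_FILES = ("/usr/share/nova/nova-dist.conf", "/etc/nova/nova.conf")


def check_nova_configs(paths=NOVA_CONFIG_FILES) -> dict:
    """Classify nova config files as missing or unreadable for this user.

    Either list being non-empty suggests nova-manage is being run on the
    wrong host or as the wrong user (e.g. as `stack` on the undercloud).
    """
    missing = [p for p in paths if not os.path.exists(p)]
    unreadable = [p for p in paths if os.path.exists(p) and not os.access(p, os.R_OK)]
    return {"missing": missing, "unreadable": unreadable}


if __name__ == "__main__":
    result = check_nova_configs()
    if result["missing"] or result["unreadable"]:
        print("Run nova-manage as root on an overcloud controller:", result)
```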
Comment 8 Darin Sorrentino 2017-09-07 20:29:14 EDT
(In reply to Dan Yasny from comment #6)
> - NovaSchedulerDiscoverHostsInCellsInterval isn't present in
> /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml
> 
> [stack@undercloud-0 ~]$ nova-manage cell_v2 discover_hosts --verbose
> [...]
> oslo_config.cfg.ConfigFilesPermissionDeniedError: Failed to open some config
> files: /usr/share/nova/nova-dist.conf,/etc/nova/nova.conf


The nova-manage command must be run as root on one of the overcloud controllers. The NovaSchedulerDiscoverHostsInCellsInterval parameter you reference maps to the discover_hosts_in_cells_interval setting in /etc/nova/nova.conf on the controller nodes, which can apparently be set with the following YAML:

parameter_defaults:
  ControllerExtraConfig:
    nova::scheduler::discover_hosts_in_cells_interval: 15
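To confirm on a controller that periodic discovery is actually enabled after such a change, the rendered value can be read back from /etc/nova/nova.conf. A minimal sketch: it assumes the option sits in the `[scheduler]` section and treats a missing option as -1, which we take to be nova's default meaning "disabled"; `discover_interval` is a hypothetical helper. `interpolation=None` and `strict=False` are used because oslo.config files may contain literal `%` characters and repeated keys, which `configparser` would otherwise reject.

```python
import configparser


def discover_interval(conf_text: str) -> int:
    """Return [scheduler] discover_hosts_in_cells_interval, or -1 if unset.

    -1 (assumed default) means no periodic discovery, so newly enrolled
    ironic nodes need a manual `nova-manage cell_v2 discover_hosts`.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getint("scheduler", "discover_hosts_in_cells_interval", fallback=-1)


if __name__ == "__main__":
    # Run as root on a controller, since nova.conf is not world-readable.
    with open("/etc/nova/nova.conf") as f:
        print(discover_interval(f.read()))
```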
Comment 9 Dan Yasny 2017-09-07 21:01:03 EDT
(In reply to Darin Sorrentino from comment #8)
> The nova-manage command you are trying to run needs to be run as root on one
> of the overcloud controllers.  

Right, so that part is not a bug; the command works on the controllers.

This is the output on the controllers:

[stack@undercloud-0 ~]$ for i in 7 15 16; do ssh -o StrictHostKeyChecking=no heat-admin@192.168.24.$i "sudo nova-manage cell_v2 discover_hosts --verbose"; done

Found 2 cell mappings.
Skipping cell0 since it does not contain hosts.
Getting compute nodes from cell 'default': a21c465a-d074-4d97-9e98-15136652e586
Found 4 computes in cell: a21c465a-d074-4d97-9e98-15136652e586
Checking host mapping for compute host 'overcloud-compute-1.localdomain': 469cceda-1dda-4578-b93d-d90d2f77c427
Checking host mapping for compute host 'overcloud-compute-0.localdomain': 7d27ad38-a677-49cb-ae9c-6932ddf4d869
Checking host mapping for compute host 'overcloud-controller-1.localdomain': a52f6fd4-8a24-483a-9ef5-45e75d537908
Checking host mapping for compute host 'overcloud-controller-2.localdomain': 3e587a13-9705-4fff-81b9-176eaad68676

Found 2 cell mappings.
Skipping cell0 since it does not contain hosts.
Getting compute nodes from cell 'default': a21c465a-d074-4d97-9e98-15136652e586
Found 4 computes in cell: a21c465a-d074-4d97-9e98-15136652e586
Checking host mapping for compute host 'overcloud-compute-1.localdomain': 469cceda-1dda-4578-b93d-d90d2f77c427
Checking host mapping for compute host 'overcloud-compute-0.localdomain': 7d27ad38-a677-49cb-ae9c-6932ddf4d869
Checking host mapping for compute host 'overcloud-controller-1.localdomain': a52f6fd4-8a24-483a-9ef5-45e75d537908
Checking host mapping for compute host 'overcloud-controller-2.localdomain': 3e587a13-9705-4fff-81b9-176eaad68676

Found 2 cell mappings.
Skipping cell0 since it does not contain hosts.
Getting compute nodes from cell 'default': a21c465a-d074-4d97-9e98-15136652e586
Found 4 computes in cell: a21c465a-d074-4d97-9e98-15136652e586
Checking host mapping for compute host 'overcloud-compute-1.localdomain': 469cceda-1dda-4578-b93d-d90d2f77c427
Checking host mapping for compute host 'overcloud-compute-0.localdomain': 7d27ad38-a677-49cb-ae9c-6932ddf4d869
Checking host mapping for compute host 'overcloud-controller-1.localdomain': a52f6fd4-8a24-483a-9ef5-45e75d537908
Checking host mapping for compute host 'overcloud-controller-2.localdomain': 3e587a13-9705-4fff-81b9-176eaad68676


The output is consistent across all three controllers, but controller-0 is missing from it.
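Comparing several runs by eye is error-prone once each run lists many hosts. A minimal sketch of the same comparison, with hypothetical helper names and the output line format assumed from the transcripts above. Under the reading that discover_hosts only visits hosts owning a compute node, a controller reported as missing may simply have no ironic node assigned to it at the moment.

```python
import re

# Line format assumed from the transcripts in this report.
CHECK_RE = re.compile(r"Checking host mapping for compute host '([^']+)'")


def hosts_in_output(output: str) -> set:
    """Distinct hosts listed in one `discover_hosts --verbose` run."""
    return set(CHECK_RE.findall(output))


def compare_runs(outputs: dict, expected: set) -> tuple:
    """Return (all runs list the same hosts, sorted expected hosts never listed).

    `outputs` maps a controller address to the output captured there,
    e.g. from the ssh loop above.
    """
    per_run = [hosts_in_output(out) for out in outputs.values()]
    consistent = len({frozenset(hosts) for hosts in per_run}) <= 1
    seen = set().union(*per_run)
    return consistent, sorted(expected - seen)
```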


[stack@undercloud-0 ~]$ openstack hypervisor show 60044a7a-fe67-42b7-ae9c-2930cd0412c8
+----------------------+--------------------------------------+
| Field                | Value                                |
+----------------------+--------------------------------------+
| aggregates           | [u'baremetal-hosts']                 |
| cpu_info             |                                      |
| current_workload     | 0                                    |
| disk_available_least | -40                                  |
| free_disk_gb         | 0                                    |
| free_ram_mb          | 0                                    |
| host_ip              | 192.168.24.16                        |
| hypervisor_hostname  | 60044a7a-fe67-42b7-ae9c-2930cd0412c8 |
| hypervisor_type      | ironic                               |
| hypervisor_version   | 1                                    |
| id                   | 32                                   |
| local_gb             | 0                                    |
| local_gb_used        | 40                                   |
| memory_mb            | 0                                    |
| memory_mb_used       | 1024                                 |
| running_vms          | 1                                    |
| service_host         | overcloud-controller-2.localdomain   |
| service_id           | 107                                  |
| state                | down                                 |
| status               | enabled                              |
| vcpus                | 0                                    |
| vcpus_used           | 1                                    |
+----------------------+--------------------------------------+


I am not sure whether this reproduces the bug at this point.


> The NovaSchedulerDiscoverHostsInCellsInterval
> that you reference is actually the discover_hosts_in_cells_interval setting
> in /etc/nova/nova.conf on the controller nodes which appears to be able to
> be set with the following YAML:
> 
> parameter_defaults:
>   ControllerExtraConfig:
>     nova::scheduler::discover_hosts_in_cells_interval: 15

According to the patch at https://review.openstack.org/477084, NovaSchedulerDiscoverHostsInCellsInterval should be in the THT ironic.yaml, but it is not. I suspect the patch made it only into OSP 12 (the BZ there has already been verified).
Comment 10 Dmitry Tantsur 2017-09-08 04:58:51 EDT
We should NOT require customers to run nova-manage each time they enroll nodes. The patch you mention is in stable/ocata, so it has to land in OSP 11. Maybe the bug was moved to MODIFIED prematurely. Steve?
Comment 13 Dan Yasny 2017-09-11 11:13:11 EDT
[stack@undercloud-0 ~]$ . overcloudrc
[stack@undercloud-0 ~]$ ironic node-list
+--------------------------------------+----------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name     | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------+--------------------------------------+-------------+--------------------+-------------+
| 5f344542-73fc-453c-bc30-8d9d469ac699 | ironic-0 | 38c3ae19-172f-4b42-beaa-f7842b184597 | power on    | active             | False       |
| 4f6e2dd7-6b21-43aa-92aa-8308c1f367d3 | ironic-1 | b790d98d-a099-4013-9bcc-e809701bbe2b | power on    | active             | False       |
+--------------------------------------+----------+--------------------------------------+-------------+--------------------+-------------+


[stack@undercloud-0 ~]$ for i in 20 16 6; do ssh -o StrictHostKeyChecking=no heat-admin@192.168.24.$i "sudo nova-manage cell_v2 discover_hosts --verbose"; done
Found 2 cell mappings.
Skipping cell0 since it does not contain hosts.
Getting compute nodes from cell 'default': 29c44fd1-6c82-4cb7-8cf5-b7118c3d2180
Found 4 computes in cell: 29c44fd1-6c82-4cb7-8cf5-b7118c3d2180
Checking host mapping for compute host 'overcloud-compute-0.localdomain': 5afceb60-d960-4285-abac-2ba3c50d7fcf
Checking host mapping for compute host 'overcloud-compute-1.localdomain': f0faff5f-51b7-4167-9b09-14b39492aa99
Checking host mapping for compute host 'overcloud-controller-2.localdomain': 83bfbd24-8982-4e14-859f-7a7cff707d7d
Checking host mapping for compute host 'overcloud-controller-2.localdomain': c0740333-bddd-4ca0-a809-d0aaa7665f27
Found 2 cell mappings.
Skipping cell0 since it does not contain hosts.
Getting compute nodes from cell 'default': 29c44fd1-6c82-4cb7-8cf5-b7118c3d2180
Found 4 computes in cell: 29c44fd1-6c82-4cb7-8cf5-b7118c3d2180
Checking host mapping for compute host 'overcloud-compute-0.localdomain': 5afceb60-d960-4285-abac-2ba3c50d7fcf
Checking host mapping for compute host 'overcloud-compute-1.localdomain': f0faff5f-51b7-4167-9b09-14b39492aa99
Checking host mapping for compute host 'overcloud-controller-2.localdomain': 83bfbd24-8982-4e14-859f-7a7cff707d7d
Checking host mapping for compute host 'overcloud-controller-2.localdomain': c0740333-bddd-4ca0-a809-d0aaa7665f27
Found 2 cell mappings.
Skipping cell0 since it does not contain hosts.
Getting compute nodes from cell 'default': 29c44fd1-6c82-4cb7-8cf5-b7118c3d2180
Found 4 computes in cell: 29c44fd1-6c82-4cb7-8cf5-b7118c3d2180
Checking host mapping for compute host 'overcloud-compute-0.localdomain': 5afceb60-d960-4285-abac-2ba3c50d7fcf
Checking host mapping for compute host 'overcloud-compute-1.localdomain': f0faff5f-51b7-4167-9b09-14b39492aa99
Checking host mapping for compute host 'overcloud-controller-2.localdomain': 83bfbd24-8982-4e14-859f-7a7cff707d7d
Checking host mapping for compute host 'overcloud-controller-2.localdomain': c0740333-bddd-4ca0-a809-d0aaa7665f27


[stack@undercloud-0 ~]$ openstack hypervisor list
+----+--------------------------------------+-----------------+---------------+-------+
| ID | Hypervisor Hostname                  | Hypervisor Type | Host IP       | State |
+----+--------------------------------------+-----------------+---------------+-------+
|  2 | overcloud-compute-0.localdomain      | QEMU            | 192.168.24.11 | up    |
|  5 | overcloud-compute-1.localdomain      | QEMU            | 192.168.24.9  | up    |
| 11 | 4f6e2dd7-6b21-43aa-92aa-8308c1f367d3 | ironic          | 192.168.24.6  | up    |
| 17 | 5f344542-73fc-453c-bc30-8d9d469ac699 | ironic          | 192.168.24.6  | up    |
+----+--------------------------------------+-----------------+---------------+-------+
[stack@undercloud-0 ~]$ openstack hypervisor show 4f6e2dd7-6b21-43aa-92aa-8308c1f367d3
+----------------------+--------------------------------------+
| Field                | Value                                |
+----------------------+--------------------------------------+
| aggregates           | [u'baremetal-hosts']                 |
| cpu_info             |                                      |
| current_workload     | 0                                    |
| disk_available_least | -40                                  |
| free_disk_gb         | -20                                  |
| free_ram_mb          | -1024                                |
| host_ip              | 192.168.24.6                         |
| hypervisor_hostname  | 4f6e2dd7-6b21-43aa-92aa-8308c1f367d3 |
| hypervisor_type      | ironic                               |
| hypervisor_version   | 1                                    |
| id                   | 11                                   |
| local_gb             | 0                                    |
| local_gb_used        | 40                                   |
| memory_mb            | 0                                    |
| memory_mb_used       | 1024                                 |
| running_vms          | 1                                    |
| service_host         | overcloud-controller-2.localdomain   |
| service_id           | 110                                  |
| state                | up                                   |
| status               | enabled                              |
| vcpus                | 0                                    |
| vcpus_used           | 1                                    |
+----------------------+--------------------------------------+
[stack@undercloud-0 ~]$ openstack hypervisor show 5f344542-73fc-453c-bc30-8d9d469ac699
+----------------------+--------------------------------------+
| Field                | Value                                |
+----------------------+--------------------------------------+
| aggregates           | [u'baremetal-hosts']                 |
| cpu_info             |                                      |
| current_workload     | 0                                    |
| disk_available_least | -40                                  |
| free_disk_gb         | -20                                  |
| free_ram_mb          | -1024                                |
| host_ip              | 192.168.24.6                         |
| hypervisor_hostname  | 5f344542-73fc-453c-bc30-8d9d469ac699 |
| hypervisor_type      | ironic                               |
| hypervisor_version   | 1                                    |
| id                   | 17                                   |
| local_gb             | 0                                    |
| local_gb_used        | 40                                   |
| memory_mb            | 0                                    |
| memory_mb_used       | 1024                                 |
| running_vms          | 1                                    |
| service_host         | overcloud-controller-2.localdomain   |
| service_id           | 110                                  |
| state                | up                                   |
| status               | enabled                              |
| vcpus                | 0                                    |
| vcpus_used           | 1                                    |
+----------------------+--------------------------------------+
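Both ironic hypervisors report `service_host` overcloud-controller-2.localdomain, which is consistent with each discover_hosts run above listing controller-2 twice (once per compute node) and never mentioning controller-0 or controller-1. A minimal sketch for summarizing this across many nodes; `ironic_nodes_by_host` is hypothetical, and it assumes `openstack hypervisor show <id> -f json` emits the same field names as the table output shown above.

```python
import json
from collections import defaultdict


def ironic_nodes_by_host(show_outputs: list) -> dict:
    """Group ironic hypervisors by the controller whose nova-compute manages them.

    Each element is the JSON text of one `openstack hypervisor show -f json`
    call; key names are assumed to match the table fields.
    """
    groups = defaultdict(list)
    for raw in show_outputs:
        hv = json.loads(raw)
        # Skip QEMU/KVM hypervisors; only ironic nodes move between controllers.
        if hv.get("hypervisor_type") == "ironic":
            groups[hv["service_host"]].append(hv["hypervisor_hostname"])
    return dict(groups)
```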
Comment 20 errata-xmlrpc 2017-09-13 17:43:17 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2721
