Description of problem:
Unable to add a host to an aggregate; the operation fails with the error "compute host could not be found".

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.2.0 Beta (Train)
Puddle: RHOS-16.2-RHEL-8-20210409.n.0

When trying to add a compute host to the aggregate, it fails with the "compute host could not be found" error. This happens because the controller sees the compute (hypervisor) name with a different suffix.

List the hypervisor hosts:
$ openstack hypervisor list
+--------------------------------------+---------------------------------+-----------------+---------------+-------+
| ID                                   | Hypervisor Hostname             | Hypervisor Type | Host IP       | State |
+--------------------------------------+---------------------------------+-----------------+---------------+-------+
| f26f4523-64b7-4b51-8cab-6cd9de7d0410 | computeovsdpdksriov-1.novalocal | QEMU            | 10.10.130.154 | up    |
| 74113e94-0e67-45b6-937f-b0ba21ea61f4 | computeovsdpdksriov-0.novalocal | QEMU            | 10.10.130.187 | up    |
+--------------------------------------+---------------------------------+-----------------+---------------+-------+

List and show the created aggregate:
$ openstack aggregate list
+----+------------------------------+-------------------+
| ID | Name                         | Availability Zone |
+----+------------------------------+-------------------+
| 14 | tempest-aggregate-1965427792 | None              |
+----+------------------------------+-------------------+

$ openstack aggregate show 14
+-------------------+--------------------------------------+
| Field             | Value                                |
+-------------------+--------------------------------------+
| availability_zone | None                                 |
| created_at        | 2021-04-14T06:17:35.000000           |
| deleted           | False                                |
| deleted_at        | None                                 |
| hosts             |                                      |
| id                | 14                                   |
| name              | tempest-aggregate-1965427792         |
| properties        |                                      |
| updated_at        | None                                 |
| uuid              | c008a013-6a12-489a-8ec8-8efdc340c5c0 |
+-------------------+--------------------------------------+

Try to add the host to the aggregate:
$ openstack aggregate add host 14 computeovsdpdksriov-1.novalocal
Compute host computeovsdpdksriov-1.novalocal could not be found. (HTTP 404) (Request-ID: req-452f1917-fede-4c0e-8023-17b3f26adb98)

Look into the nova-scheduler log on the controller:
2021-04-14 06:24:40.240 15 DEBUG nova.scheduler.host_manager [req-2c3782a0-00e4-4acd-a267-83f629be473e - - - - -] Successfully synced instances from host 'computeovsdpdksriov-1.localdomain'. sync_instance_info /usr/lib/python3.6/site-packages/nova/scheduler/host_manager.py:960

The controller sees the host as "computeovsdpdksriov-1.localdomain", not as "computeovsdpdksriov-1.novalocal" as shown in the hypervisor list output.

Try to add the host with the "localdomain" suffix:
$ openstack aggregate add host 14 computeovsdpdksriov-1.localdomain
+-------------------+--------------------------------------+
| Field             | Value                                |
+-------------------+--------------------------------------+
| availability_zone | None                                 |
| created_at        | 2021-04-14T06:17:35.000000           |
| deleted           | False                                |
| deleted_at        | None                                 |
| hosts             | computeovsdpdksriov-1.localdomain    |
| id                | 14                                   |
| name              | tempest-aggregate-1965427792         |
| properties        |                                      |
| updated_at        | None                                 |
| uuid              | c008a013-6a12-489a-8ec8-8efdc340c5c0 |
+-------------------+--------------------------------------+

The host was added successfully. The suffix shown in the "openstack hypervisor list" output is not correct.
Sosreports are available at the following link: http://file.mad.redhat.com/~mbabushk/sosreports/bz1949385/
You are trying to add a host to an aggregate using the hypervisor hostname, and that is incorrect. Each node has two different values: the hypervisor hostname, which is the hostname as reported by libvirt/glibc, and the host value, which is set in nova.conf (here host=computeovsdpdksriov-0.localdomain). This is not a Nova bug. It is an OOO bug, combined with the fact that you were trying to use the hypervisor hostname instead of the compute service host.

(kolla-venv) [sean@workstation kolla-work-dir]$ openstack aggregate create test
+-------------------+----------------------------+
| Field             | Value                      |
+-------------------+----------------------------+
| availability_zone | None                       |
| created_at        | 2021-04-14T11:31:24.952624 |
| deleted           | False                      |
| deleted_at        | None                       |
| hosts             | None                       |
| id                | 1                          |
| name              | test                       |
| properties        | None                       |
| updated_at        | None                       |
+-------------------+----------------------------+

(kolla-venv) [sean@workstation kolla-work-dir]$ openstack aggregate show test
+-------------------+----------------------------+
| Field             | Value                      |
+-------------------+----------------------------+
| availability_zone | None                       |
| created_at        | 2021-04-14T11:31:24.000000 |
| deleted           | False                      |
| deleted_at        | None                       |
| hosts             |                            |
| id                | 1                          |
| name              | test                       |
| properties        |                            |
| updated_at        | None                       |
+-------------------+----------------------------+

[sean@workstation kolla-work-dir]$ openstack compute service list --service nova-compute
+----+--------------+--------------------+------+---------+-------+----------------------------+
| ID | Binary       | Host               | Zone | Status  | State | Updated At                 |
+----+--------------+--------------------+------+---------+-------+----------------------------+
|  4 | nova-compute | workstation        | nova | enabled | up    | 2021-04-14T11:23:24.000000 |
|  5 | nova-compute | workstation-ironic | nova | enabled | up    | 2021-04-14T11:23:24.000000 |
+----+--------------+--------------------+------+---------+-------+----------------------------+

openstack hypervisor list --long
+----+--------------------------------------+-----------------+-------------+-------+------------+-------+----------------+-----------+
| ID | Hypervisor Hostname                  | Hypervisor Type | Host IP     | State | vCPUs Used | vCPUs | Memory MB Used | Memory MB |
+----+--------------------------------------+-----------------+-------------+-------+------------+-------+----------------+-----------+
|  1 | workstation                          | QEMU            | 192.168.3.1 | up    |        162 |    48 |         202240 |    257886 |
|  2 | 31303735-3035-4247-3830-333132534457 | ironic          | 192.168.3.1 | up    |         16 |     0 |          49152 |         0 |
|  3 | 31303735-3934-4247-3830-333132535336 | ironic          | 192.168.3.1 | up    |         16 |     0 |          49152 |         0 |
|  4 | 31303735-3035-4247-3830-323630455630 | ironic          | 192.168.3.1 | up    |          0 |     0 |              0 |         0 |
|  5 | 31303735-3934-4247-3830-323630455930 | ironic          | 192.168.3.1 | up    |         16 |     0 |          49152 |         0 |
+----+--------------------------------------+-----------------+-------------+-------+------------+-------+----------------+-----------+

This should fail:
[sean@workstation kolla-work-dir]$ openstack aggregate add host test 31303735-3035-4247-3830-333132534457
Compute host 31303735-3035-4247-3830-333132534457 could not be found. (HTTP 404) (Request-ID: req-d68bfe69-f26d-4511-a8c4-faa17d959da7)

and it does, but using the host value from the compute service record works:
openstack aggregate add host test workstation-ironic
+-------------------+----------------------------+
| Field             | Value                      |
+-------------------+----------------------------+
| availability_zone | None                       |
| created_at        | 2021-04-14T11:31:24.000000 |
| deleted           | False                      |
| deleted_at        | None                       |
| hosts             | workstation-ironic         |
| id                | 1                          |
| name              | test                       |
| properties        |                            |
| updated_at        | None                       |
+-------------------+----------------------------+

This is how the aggregates API was intended to work.
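The rule described above (pass the Host column from the compute service list, not the Hypervisor Hostname) can be sketched in Python. This is a hypothetical helper, not part of Nova or python-openstackclient. It assumes you have already collected the Host values from `openstack compute service list --service nova-compute`, and that for libvirt nodes both names share the same short hostname. Ironic nodes report a node UUID as the hypervisor hostname, so the helper returns None for them instead of guessing.

```python
from typing import Iterable, Optional


def short_name(name: str) -> str:
    """Return the part of a hostname before the first dot."""
    return name.split(".", 1)[0]


def service_host_for(hypervisor_hostname: str,
                     service_hosts: Iterable[str]) -> Optional[str]:
    """Map a hypervisor hostname to the nova-compute service host.

    Hypothetical helper: the aggregates API only accepts the service host,
    so an exact match wins; otherwise fall back to a unique short-name match.
    Returns None when there is no unambiguous match (e.g. ironic node UUIDs).
    """
    hosts = list(service_hosts)
    if hypervisor_hostname in hosts:
        return hypervisor_hostname
    wanted = short_name(hypervisor_hostname)
    matches = [h for h in hosts if short_name(h) == wanted]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    hosts = ["computeovsdpdksriov-0.localdomain",
             "computeovsdpdksriov-1.localdomain"]
    # Name to pass to: openstack aggregate add host <aggregate> <host>
    print(service_host_for("computeovsdpdksriov-1.novalocal", hosts))
```

Falling back on the short name only when it is unique keeps the sketch from silently picking the wrong node when several service hosts share a prefix.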
Hi Sean,
We are using the same THT to deploy an environment on 16.1 and 16.2. We never used the "CloudDomain" parameter in our THT. So why do I get different output in 16.1 and 16.2?

In 16.1:
$ openstack hypervisor list
+--------------------------------------+-----------------------------------+-----------------+---------------+-------+
| ID                                   | Hypervisor Hostname               | Hypervisor Type | Host IP       | State |
+--------------------------------------+-----------------------------------+-----------------+---------------+-------+
| 3c4ebf0e-3e2a-486b-b119-2b8924769c17 | computeovsdpdksriov-1.localdomain | QEMU            | 10.10.100.111 | up    |
| 216357a2-d499-4aaf-80b0-37cbb3eb4df0 | computeovsdpdksriov-0.localdomain | QEMU            | 10.10.100.150 | up    |
+--------------------------------------+-----------------------------------+-----------------+---------------+-------+

In 16.2:
$ openstack hypervisor list
+--------------------------------------+---------------------------------+-----------------+---------------+-------+
| ID                                   | Hypervisor Hostname             | Hypervisor Type | Host IP       | State |
+--------------------------------------+---------------------------------+-----------------+---------------+-------+
| f26f4523-64b7-4b51-8cab-6cd9de7d0410 | computeovsdpdksriov-1.novalocal | QEMU            | 10.10.130.154 | up    |
| 74113e94-0e67-45b6-937f-b0ba21ea61f4 | computeovsdpdksriov-0.novalocal | QEMU            | 10.10.130.187 | up    |
+--------------------------------------+---------------------------------+-----------------+---------------+-------+
*** Bug 1949469 has been marked as a duplicate of this bug. ***
There can be a number of reasons, but the short answer is that "Hypervisor Hostname" is the value that libvirt returns to Nova as the hostname of the current host. In 16.1, OOO configured the host so that the hypervisor hostname and the host value configured in nova.conf were the same. I suspect that something has changed in how /etc/hostname and /etc/hosts are now being configured, effectively resulting in a hostname change.

This is a release blocker because it will break deployments on upgrade, so I am setting the correct flags. I have closed the other bug reported for the OVN job as a duplicate of this one, so I'll remove the NFV DFG from the devel dashboard since this is not DPDK-specific.
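The breakage shows up as drift between the hypervisor hostname (from libvirt) and the host value in nova.conf. A quick check around an upgrade is therefore to look for libvirt hypervisors whose hostname is not also a nova-compute service host. The Python sketch below is a hypothetical check under that assumption. The input lists would come from `openstack hypervisor list` and `openstack compute service list --service nova-compute`. Ironic hypervisors are skipped because their hypervisor hostname is a node UUID by design.

```python
from typing import Iterable, List, Tuple


def drifted_hypervisors(hypervisors: Iterable[Tuple[str, str]],
                        service_hosts: Iterable[str]) -> List[str]:
    """Return hypervisor hostnames with no identical nova-compute service host.

    Hypothetical check. `hypervisors` is a list of
    (hypervisor_hostname, hypervisor_type) pairs. Ironic entries are ignored
    because their hypervisor hostname is a node UUID.
    """
    hosts = set(service_hosts)
    return sorted(
        name for name, hv_type in hypervisors
        if hv_type.lower() != "ironic" and name not in hosts
    )


# Assumption: the nova.conf host values were *.localdomain in both releases.
SERVICE_HOSTS = ["computeovsdpdksriov-0.localdomain",
                 "computeovsdpdksriov-1.localdomain"]
# Hypervisor hostnames as listed for 16.1 and 16.2 in this bug.
OSP_16_1 = [("computeovsdpdksriov-1.localdomain", "QEMU"),
            ("computeovsdpdksriov-0.localdomain", "QEMU")]
OSP_16_2 = [("computeovsdpdksriov-1.novalocal", "QEMU"),
            ("computeovsdpdksriov-0.novalocal", "QEMU")]

if __name__ == "__main__":
    print(drifted_hypervisors(OSP_16_1, SERVICE_HOSTS))  # []
    print(drifted_hypervisors(OSP_16_2, SERVICE_HOSTS))  # both .novalocal names
```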
this could be related to https://bugzilla.redhat.com/show_bug.cgi?id=1900500
/etc/hosts looks correct:

# BEGIN ANSIBLE MANAGED BLOCK
10.10.130.187 computeovsdpdksriov-0.localdomain computeovsdpdksriov-0
10.10.130.187 computeovsdpdksriov-0.internalapi.localdomain computeovsdpdksriov-0.internalapi
10.10.131.102 computeovsdpdksriov-0.tenant.localdomain computeovsdpdksriov-0.tenant
10.10.132.143 computeovsdpdksriov-0.storage.localdomain computeovsdpdksriov-0.storage
192.0.90.22 computeovsdpdksriov-0.ctlplane.localdomain computeovsdpdksriov-0.ctlplane
10.10.130.154 computeovsdpdksriov-1.localdomain computeovsdpdksriov-1
10.10.130.154 computeovsdpdksriov-1.internalapi.localdomain computeovsdpdksriov-1.internalapi
10.10.131.175 computeovsdpdksriov-1.tenant.localdomain computeovsdpdksriov-1.tenant
10.10.132.121 computeovsdpdksriov-1.storage.localdomain computeovsdpdksriov-1.storage
192.0.90.21 computeovsdpdksriov-1.ctlplane.localdomain computeovsdpdksriov-1.ctlplane
10.10.130.184 controller-0.localdomain controller-0
10.10.130.184 controller-0.internalapi.localdomain controller-0.internalapi
10.10.131.127 controller-0.tenant.localdomain controller-0.tenant
10.10.132.114 controller-0.storage.localdomain controller-0.storage
10.10.133.194 controller-0.storagemgmt.localdomain controller-0.storagemgmt
10.35.185.75 controller-0.external.localdomain controller-0.external
192.0.90.19 controller-0.ctlplane.localdomain controller-0.ctlplane
10.10.130.194 controller-1.localdomain controller-1
10.10.130.194 controller-1.internalapi.localdomain controller-1.internalapi
10.10.131.184 controller-1.tenant.localdomain controller-1.tenant
10.10.132.138 controller-1.storage.localdomain controller-1.storage
10.10.133.123 controller-1.storagemgmt.localdomain controller-1.storagemgmt
10.35.185.76 controller-1.external.localdomain controller-1.external
192.0.90.24 controller-1.ctlplane.localdomain controller-1.ctlplane
10.10.130.162 controller-2.localdomain controller-2
10.10.130.162 controller-2.internalapi.localdomain controller-2.internalapi
10.10.131.188 controller-2.tenant.localdomain controller-2.tenant
10.10.132.148 controller-2.storage.localdomain controller-2.storage
10.10.133.140 controller-2.storagemgmt.localdomain controller-2.storagemgmt
10.35.185.67 controller-2.external.localdomain controller-2.external
192.0.90.9 controller-2.ctlplane.localdomain controller-2.ctlplane
192.0.90.1 undercloud-0.ctlplane.localdomain undercloud-0.ctlplane
192.0.90.12 overcloud.ctlplane.localdomain
10.10.130.175 overcloud.internalapi.localdomain
10.10.132.157 overcloud.storage.localdomain
10.10.133.153 overcloud.storagemgmt.localdomain
10.35.185.74 overcloud.localdomain
# END ANSIBLE MANAGED BLOCK
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

but /etc/hostname is wrong:

computeovsdpdksriov-0.novalocal

Something is causing OOO to generate the hostname incorrectly in 16.2.
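The comparison made above, the canonical name in the Ansible-managed /etc/hosts block versus the contents of /etc/hostname, can be scripted. The Python sketch below is a hypothetical helper that takes both files as text. It assumes that the first name after the IP address is the canonical FQDN (the usual hosts(5) layout), and it looks up the entry whose names include the node's short hostname.

```python
from typing import Optional, Tuple


def canonical_fqdn(hosts_text: str, short_name: str) -> Optional[str]:
    """Return the canonical name of the /etc/hosts entry listing short_name."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        names = line.split()[1:]  # every field after the IP address
        if short_name in names:
            return names[0]  # hosts(5): the first name is the canonical one
    return None


def check_hostname(hostname_text: str, hosts_text: str) -> Tuple[Optional[str], bool]:
    """Hypothetical check: return (expected FQDN, does /etc/hostname match it)."""
    hostname = hostname_text.strip()
    expected = canonical_fqdn(hosts_text, hostname.split(".", 1)[0])
    return expected, hostname == expected


if __name__ == "__main__":
    hosts = (
        "# BEGIN ANSIBLE MANAGED BLOCK\n"
        "10.10.130.187 computeovsdpdksriov-0.localdomain computeovsdpdksriov-0\n"
        "10.10.130.187 computeovsdpdksriov-0.internalapi.localdomain "
        "computeovsdpdksriov-0.internalapi\n"
        "# END ANSIBLE MANAGED BLOCK\n"
    )
    # /etc/hostname as found on the 16.2 compute node
    print(check_hostname("computeovsdpdksriov-0.novalocal\n", hosts))
    # prints ('computeovsdpdksriov-0.localdomain', False)
```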
sosreport-undercloud-0-2021-04-14-wrfqjon]$ grep tripleo-heat installed-rpms
openstack-tripleo-heat-templates-11.4.1-2.20210323012110.c3396e2.el8ost.1.noarch Tue Apr 13 13:23:07 2021

sosreport-undercloud-0-2021-04-14-wrfqjon]$ grep dhcp_domain var/lib/config-data/puppet-generated/nova//etc/nova/nova.conf
#dhcp_domain=novalocal

openstack-tripleo-heat-templates-11.4.1-2.20210323012110.c3396e2.el8ost.1.noarch is missing [1], which unsets the default novalocal dhcp_domain in nova.conf on the undercloud. That change is part of openstack-tripleo-heat-templates-11.4.1-2.20210326005015.7befdd2.el8ost.

[1] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/782684
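A commented-out line matters because when dhcp_domain is not set, Nova's default of novalocal applies. The hostname handed to the node through the undercloud then carries that suffix. The Python sketch below models this under stated assumptions. It reads the effective value from nova.conf text with configparser, checking [api] first and then [DEFAULT]; treat that lookup order as an assumption about where Train reads the option. `metadata_hostname` is a hypothetical simplification of how the suffix is appended, not Nova's actual code. In this model an empty value stands in for "unset the default", which may not be exactly what [1] writes.

```python
import configparser

NOVA_DEFAULT_DHCP_DOMAIN = "novalocal"  # Nova's default, as shown by the grep above


def effective_dhcp_domain(nova_conf_text: str) -> str:
    """Return the dhcp_domain value this nova.conf text would yield.

    Assumption: the option is read from [api], falling back to [DEFAULT].
    A commented-out line such as '#dhcp_domain=novalocal' counts as unset.
    """
    # nova.conf may repeat keys or contain '%', so parse leniently.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    if parser.has_option("api", "dhcp_domain"):  # also sees [DEFAULT] keys
        return parser.get("api", "dhcp_domain")
    return parser.defaults().get("dhcp_domain", NOVA_DEFAULT_DHCP_DOMAIN)


def metadata_hostname(instance_hostname: str, dhcp_domain: str) -> str:
    """Hypothetical model: append dhcp_domain when it is non-empty."""
    if dhcp_domain:
        return f"{instance_hostname}.{dhcp_domain}"
    return instance_hostname


if __name__ == "__main__":
    before_fix = "[api]\n#dhcp_domain=novalocal\n"  # as found on the undercloud
    after_fix = "[api]\ndhcp_domain=\n"             # default explicitly unset
    for conf in (before_fix, after_fix):
        print(metadata_hostname("computeovsdpdksriov-0",
                                effective_dhcp_domain(conf)))
```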
As Martin pointed out, this is already fixed but was not included in the compose, so I am setting it to MODIFIED and adding the Triaged keyword.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2021:3483