Description of problem:

After scaling in the overcloud from 2 nodes to 1, I tried scaling it back out (with the same node that I removed during the scale-in) and the deployment failed with a "not enough hosts available" error:

{'code': 500, 'created': '2021-05-10T11:58:54Z', 'message': 'No valid host was found. There are not enough hosts available.', 'details': ...}

The details traceback:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/nova/conductor/manager.py", line 1379, in schedule_and_build_instances
    instance_uuids, return_alternates=True)
  File "/usr/lib/python3.6/site-packages/nova/conductor/manager.py", line 839, in _schedule_instances
    return_alternates=return_alternates)
  File "/usr/lib/python3.6/site-packages/nova/scheduler/client/query.py", line 42, in select_destinations
    instance_uuids, return_objects, return_alternates)
  File "/usr/lib/python3.6/site-packages/nova/scheduler/rpcapi.py", line 160, in select_destinations
    return cctxt.call(ctxt, 'select_destinations', **msg_args)
  File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 181, in call
    transport_options=self.transport_options)
  File "/usr/lib/python3.6/site-packages/oslo_messaging/transport.py", line 129, in _send
    transport_options=transport_options)
  File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 674, in send
    transport_options=transport_options)
  File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 664, in _send
    raise result
nova.exception_Remote.NoValidHost_Remote: No valid host was found. There are not enough hosts available.

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 235, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/nova/scheduler/manager.py", line 214, in select_destinations
    allocation_request_version, return_alternates)
  File "/usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py", line 96, in select_destinations
    allocation_request_version, return_alternates)
  File "/usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py", line 265, in _schedule
    claimed_instance_uuids)
  File "/usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py", line 302, in _ensure_sufficient_hosts
    raise exception.NoValidHost(reason=reason)
nova.exception.NoValidHost: No valid host was found. There are not enough hosts available.

Version-Release number of selected component (if applicable): 16.2

How reproducible: 100%

Steps to Reproduce:
1. Deploy an overcloud with 2 nodes.
2. Remove an overcloud node with the following steps (from https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/director_installation_and_usage/scaling-overcloud-nodes):
   a. source /home/stack/overcloudrc
   b. openstack compute service set <compute> --disable
   c. source /home/stack/stackrc
   d. openstack overcloud node delete --stack overcloud -y <compute>
   e. source /home/stack/overcloudrc
   f. openstack compute service delete <compute>
   g. for AGENT in $(openstack network agent list --host <compute> -c ID -f value) ; do openstack network agent delete $AGENT ; done
   h. openstack resource provider delete <compute uuid>
3. Try to scale out by running the overcloud deploy command again (per the steps in https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/director_installation_and_usage/scaling-overcloud-nodes); the deployment fails with the error above.

Actual results:
Deployment fails with "No valid host was found. There are not enough hosts available."

Expected results:
The overcloud scales out successfully.

Additional info:
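For automation, the scale-in sequence in step 2 can be generated programmatically. A minimal sketch; the node name and resource-provider UUID are caller-supplied placeholders, and the commands themselves are the documented steps a-h above, not new ones:

```python
def scale_in_commands(compute: str, rp_uuid: str) -> list:
    """Build the documented scale-in command sequence (steps a-h above).

    `compute` is the compute service/node name and `rp_uuid` is the
    resource provider UUID; both are placeholders the caller supplies.
    """
    return [
        "source /home/stack/overcloudrc",
        f"openstack compute service set {compute} --disable",
        "source /home/stack/stackrc",
        f"openstack overcloud node delete --stack overcloud -y {compute}",
        "source /home/stack/overcloudrc",
        f"openstack compute service delete {compute}",
        f"for AGENT in $(openstack network agent list --host {compute} -c ID -f value) ; "
        "do openstack network agent delete $AGENT ; done",
        f"openstack resource provider delete {rp_uuid}",
    ]

# Example with placeholder values:
for cmd in scale_in_commands("compute-1", "<compute uuid>"):
    print(cmd)
```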
This error indicates you don't have available hosts to deploy. If you are reusing the hosts, you need to make sure they have been cleaned and are available in Ironic again.
Docs are needed (I'm unable to set the flag)
It sounds like the undercloud ironic config option automated_clean might be set to False in this case, or not all nodes are in an available state.

However, the documentation should make it clear there are prerequisites before attempting scale-up. I think section 16.2 needs some prerequisite bullet points, just like section 8.5 [1], to ensure there are nodes available before attempting scale-up.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/director_installation_and_usage/index#scaling-up-bare-metal-nodes
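The "nodes available" prerequisite could also be checked in automation before re-running the deploy. A sketch, assuming the JSON output of `openstack baremetal node list -f json` (the field names mirror the CLI's table headers):

```python
import json

def schedulable_nodes(node_list_json: str) -> list:
    """Return names of nodes that could be picked for scale-up:
    provisioning state 'available' and not in maintenance."""
    nodes = json.loads(node_list_json)
    return [n["Name"] for n in nodes
            if n["Provisioning State"] == "available"
            and not n["Maintenance"]]

# Hypothetical sample shaped like `openstack baremetal node list -f json`,
# mirroring the states seen in this BZ:
sample = json.dumps([
    {"Name": "compute-0", "Provisioning State": "active", "Maintenance": False},
    {"Name": "compute-1", "Provisioning State": "available", "Maintenance": False},
    {"Name": "controller-0", "Provisioning State": "active", "Maintenance": False},
])

assert schedulable_nodes(sample) == ["compute-1"]
```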
It seems like automated_clean was set to false. Assuming I would like to clean up manually, what steps should I follow?
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/bare_metal_provisioning/configuring-the-bare-metal-provisioning-service-after-deployment#cleaning-nodes-manually_bare-metal-post-deployment
Hi, I tried with both manual and automatic cleanup but it still fails with the same error.

Steps:
1. Deploy an overcloud with 2 compute nodes.
2. Follow the scale-down steps from https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/director_installation_and_usage/scaling-overcloud-nodes
3. openstack baremetal node clean c10ff1a6-11fb-4165-a86f-9153349e6c7f --clean-steps '[{"interface": "deploy", "step": "erase_devices"}]' (or set the automated_clean flag to true and restart the ironic containers)
4. openstack baremetal node list (to make sure the node status is available)
5. Re-deploy the overcloud.
Ella,

Can you supply us with an sosreport from the undercloud where you've encountered this?

An additional question: how much time was there between ensuring that the node status changed to available, and the attempt to re-deploy the overcloud?

There is a reconciliation loop inside of nova that only picks up changes every 2-3 minutes, if nova is in use for the deployment. If the deployment is nova-less, realistically you shouldn't be encountering this issue.
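Given that reconciliation delay, automation that redeploys immediately after the node delete can race the scheduler. A generic polling helper would wait the loop out; this is a hypothetical sketch, not part of any TripleO tooling:

```python
import time

def wait_for(predicate, timeout: float = 600, interval: float = 30) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    With the 2-3 minute nova reconciliation loop in mind, a timeout of
    several minutes with a ~30 s poll interval is a reasonable default.
    Returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

# Example: wait until a (hypothetical) check reports the freed node as
# available before re-running `openstack overcloud deploy`. Here the
# check is simulated by an iterator that succeeds on the third poll.
attempts = iter([False, False, True])
assert wait_for(lambda: next(attempts), timeout=5, interval=0)
```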
Hi Julia,

> Can you supply us with a sosreport from the undercloud where you've encountered this?

Unfortunately, there is an issue with Gerrit, so I'll send a sosreport as soon as I'm able to redeploy.

> An additional question, how much time was there between ensuring that the node status changed to available, and the attempt to re-deploy the overcloud?

The scale-in and scale-out are part of a larger automation, so the deployment starts as soon as the step that removes the overcloud node finishes.
Hi Ella, I am a member of the rhos-docs team. I have emailed one of the "Director Installation and Usage" guide writers and asked him to investigate this BZ. Thanks for your feedback! Best, --Greg
Adding the following sos report as per comment #7: http://rhos-release.virt.bos.redhat.com/log/bz1958940/
(In reply to Steve Baker from comment #3)
> It sounds like the undercloud ironic config automated_clean might be set to
> False in this case, or not all nodes are in an available state.

It could explain the issue here: the node is in the available state:

| UUID                                 | Name      | Instance UUID | Power State | Provisioning State | Maintenance |
| 3afa3f02-8583-470f-a0a9-5c705b26b7ad | compute-1 | None          | power off   | available          | False       |

but in undercloud.conf we have clean_nodes = True, so the data is lost.

> However the documentation should make it clear there are prerequisites
> before attempting scale-up.
> I think section 16.2 needs some prerequisite bullet points just like in
> section 8.5[1] to ensure there are nodes available before attempting
> scale-up.
> [1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/director_installation_and_usage/index#scaling-up-bare-metal-nodes

Based on comment #3 it could explain the problem:

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node show 39022793-d37d-4b31-8c37-b492b8a2a009 -c "instance_info"
instance_info = {'image_source': '4d6b3cc2-3bb2-4ff8-b091-e4f7a32ae9c1', 'root_gb': '97', 'swap_mb': '0', 'display_name': 'computehciovsdpdksriov-0', 'vcpus': '39', 'nova_host_id': 'undercloud-0.localdomain', 'memory_mb': '125000', 'local_gb': '1861', 'capabilities': '{"boot_option": "local", "profile": "compute"}', 'configdrive': '******'}

I will try the following:
* remove the node again from the overcloud
* introspect
* tag
* scale

In case of success, I will have to update the undercloud through ironic and retry the procedure.
(In reply to Yariv from comment #14)
> I will try the following:
> * remove the node again from overcloud,
> * Introspect
> * tag
> * scale
>
> In case of success:
> will have to update the undercloud through the ironic, and retry the
> procedure

Did not have any success.
> Did not have any success,

So, to understand this correctly: you removed the node, re-added it, re-inspected, tagged, and then attempted scale-out? Any change in the nova errors? Were there any prior errors, or was this just the last error? Can we get the baremetal node list?

Are you just encountering issues with nova scheduling your scale-out? Does `openstack baremetal node list` show sufficient baremetal machines available? What does `nova hypervisor-list` indicate?

Without logs it is impossible to confirm, but this seems like something going sideways with the data in placement: if the node is in the available state and properly configured to match the requested compute-node flavor, then it all comes down to scheduling at that point.
From the nova-scheduler log, the TripleOCapabilitiesFilter filters out all nodes, which causes the scheduling to fail:

2021-06-13 16:06:15.085 14 INFO nova.filters [req-84fed395-5a7a-4281-8704-a6cf57adc0ed 5cba6568454c40a2a9987876d900193b 3bb4842ccdf24eb08ac7edffc0e76d1d - default default] Filter TripleOCapabilitiesFilter returned 0 hosts

Do you use scheduler hints / custom hostname mapping? Since the new compute will have a different index, have you updated the capabilities for the new index?

In general I tested scale down/up with the latest compose and it worked for me:

(undercloud) [stack@undercloud-0 ~]$ openstack stack list
| ID                                   | Stack Name | Project                          | Stack Status    | Creation Time        | Updated Time |
| 4e8f6021-6a1b-4df9-9f77-d12a5a927e88 | overcloud  | f79c666f280c436f9530ff5eda9fcdba | CREATE_COMPLETE | 2021-06-30T05:35:46Z | None         |

(undercloud) [stack@undercloud-0 ~]$ source /home/stack/overcloudrc
(overcloud) [stack@undercloud-0 ~]$ openstack compute service list
| ID                                   | Binary         | Host                      | Zone     | Status  | State | Updated At                 |
| 3d1601b1-1d75-48a5-adb2-f079ee6aa3c4 | nova-conductor | controller-0.redhat.local | internal | enabled | up    | 2021-06-30T06:15:41.000000 |
| 7c344811-7bce-4664-9a7a-8491793e05d5 | nova-scheduler | controller-0.redhat.local | internal | enabled | up    | 2021-06-30T06:15:34.000000 |
| 0d88bf40-161c-457e-a826-300605e376e3 | nova-compute   | compute-0.redhat.local    | nova     | enabled | up    | 2021-06-30T06:15:34.000000 |
| fa5d6935-d229-4b65-b08a-55bf2a0a7f0c | nova-compute   | compute-1.redhat.local    | nova     | enabled | up    | 2021-06-30T06:15:34.000000 |

(overcloud) [stack@undercloud-0 ~]$ openstack compute service set compute-1.redhat.local nova-compute --disable
(overcloud) [stack@undercloud-0 ~]$ openstack compute service list
| ID                                   | Binary         | Host                      | Zone     | Status   | State | Updated At                 |
| 3d1601b1-1d75-48a5-adb2-f079ee6aa3c4 | nova-conductor | controller-0.redhat.local | internal | enabled  | up    | 2021-06-30T06:16:21.000000 |
| 7c344811-7bce-4664-9a7a-8491793e05d5 | nova-scheduler | controller-0.redhat.local | internal | enabled  | up    | 2021-06-30T06:16:24.000000 |
| 0d88bf40-161c-457e-a826-300605e376e3 | nova-compute   | compute-0.redhat.local    | nova     | enabled  | up    | 2021-06-30T06:16:24.000000 |
| fa5d6935-d229-4b65-b08a-55bf2a0a7f0c | nova-compute   | compute-1.redhat.local    | nova     | disabled | up    | 2021-06-30T06:16:24.000000 |

(undercloud) [stack@undercloud-0 ~]$ source /home/stack/stackrc
(undercloud) [stack@undercloud-0 ~]$ openstack server list
| ID                                   | Name         | Status | Networks               | Image          | Flavor     |
| 636f8bcb-404e-4c94-bef9-4f7171e862c7 | controller-0 | ACTIVE | ctlplane=192.168.24.51 | overcloud-full | controller |
| 14cceadd-c550-4a0b-b565-f1a0ad2912a2 | compute-0    | ACTIVE | ctlplane=192.168.24.55 | overcloud-full | compute    |
| aac841b8-e6e7-4994-ae46-7c8b15398667 | compute-1    | ACTIVE | ctlplane=192.168.24.12 | overcloud-full | compute    |

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud node delete --stack overcloud -y compute-1
Deleting the following nodes from stack overcloud:
- compute-1
Waiting for messages on queue 'tripleo' with no timeout.

[stack@undercloud-0 ~]$ source overcloudrc
(overcloud) [stack@undercloud-0 ~]$ openstack compute service list
| ID                                   | Binary         | Host                      | Zone     | Status  | State | Updated At                 |
| 3d1601b1-1d75-48a5-adb2-f079ee6aa3c4 | nova-conductor | controller-0.redhat.local | internal | enabled | up    | 2021-06-30T06:22:11.000000 |
| 7c344811-7bce-4664-9a7a-8491793e05d5 | nova-scheduler | controller-0.redhat.local | internal | enabled | up    | 2021-06-30T06:22:04.000000 |
| 0d88bf40-161c-457e-a826-300605e376e3 | nova-compute   | compute-0.redhat.local    | nova     | enabled | up    | 2021-06-30T06:22:04.000000 |

(overcloud) [stack@undercloud-0 ~]$ openstack resource provider list
| uuid                                 | name                   | generation |
| ebfcda80-afa1-47c2-9a85-c5446d61a91e | compute-0.redhat.local | 2          |

(overcloud) [stack@undercloud-0 ~]$ openstack network agent list
| ID                                   | Agent Type                   | Host                      | Availability Zone | Alive | State | Binary                        |
| 48e58d5d-fb97-4833-a36d-901790de08ed | OVN Controller agent         | compute-0.redhat.local    |                   | :-)   | UP    | ovn-controller                |
| 784ad1a2-4445-4ba5-a705-0cf57b199268 | OVN Metadata agent           | compute-0.redhat.local    |                   | :-)   | UP    | networking-ovn-metadata-agent |
| f5c14eb2-d679-4857-9b24-438203bce587 | OVN Controller agent         | compute-1.redhat.local    |                   | :-)   | UP    | ovn-controller                |
| 532fce0a-5921-49a2-8227-b92b087001cb | OVN Metadata agent           | compute-1.redhat.local    |                   | :-)   | UP    | networking-ovn-metadata-agent |
| 1fb383a5-41d3-4fb2-b46f-7ed85ec9bb5d | OVN Controller Gateway agent | controller-0.redhat.local |                   | :-)   | UP    | ovn-controller                |

- no need to manually remove the nova-compute service
- no need to manually remove the resource provider
- OVN agents can not be deleted, which is a known issue and there are BZs for it

(undercloud) [stack@undercloud-0 ~]$ source /home/stack/stackrc
(undercloud) [stack@undercloud-0 ~]$ openstack server list
| ID                                   | Name         | Status | Networks               | Image          | Flavor     |
| 636f8bcb-404e-4c94-bef9-4f7171e862c7 | controller-0 | ACTIVE | ctlplane=192.168.24.51 | overcloud-full | controller |
| 14cceadd-c550-4a0b-b565-f1a0ad2912a2 | compute-0    | ACTIVE | ctlplane=192.168.24.55 | overcloud-full | compute    |

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
| f8ede99a-fb92-4b70-8a9b-aba739cbc585 | compute-0    | 14cceadd-c550-4a0b-b565-f1a0ad2912a2 | power on    | active             | False       |
| 49d22e9e-0756-435d-bc53-273f8ef25692 | compute-1    | None                                 | power off   | available          | False       |
| e7d006a3-3925-488c-8c44-8661c4ed271d | controller-0 | 636f8bcb-404e-4c94-bef9-4f7171e862c7 | power on    | active             | False       |

-> scale out by running the same initial deployment command, requesting 2 computes

(undercloud) [stack@undercloud-0 ~]$ openstack server list
| ID                                   | Name         | Status | Networks               | Image          | Flavor     |
| 69dd94a2-1cd8-4676-8ba1-a1b2c9cf78cb | compute-2    | BUILD  |                        | overcloud-full | compute    |
| 636f8bcb-404e-4c94-bef9-4f7171e862c7 | controller-0 | ACTIVE | ctlplane=192.168.24.51 | overcloud-full | controller |
| 14cceadd-c550-4a0b-b565-f1a0ad2912a2 | compute-0    | ACTIVE | ctlplane=192.168.24.55 | overcloud-full | compute    |

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
| f8ede99a-fb92-4b70-8a9b-aba739cbc585 | compute-0    | 14cceadd-c550-4a0b-b565-f1a0ad2912a2 | power on    | active             | False       |
| 49d22e9e-0756-435d-bc53-273f8ef25692 | compute-1    | 69dd94a2-1cd8-4676-8ba1-a1b2c9cf78cb | power off   | deploying          | False       |
| e7d006a3-3925-488c-8c44-8661c4ed271d | controller-0 | 636f8bcb-404e-4c94-bef9-4f7171e862c7 | power on    | active             | False       |

...

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
| f8ede99a-fb92-4b70-8a9b-aba739cbc585 | compute-0    | 14cceadd-c550-4a0b-b565-f1a0ad2912a2 | power on    | active             | False       |
| 49d22e9e-0756-435d-bc53-273f8ef25692 | compute-1    | 69dd94a2-1cd8-4676-8ba1-a1b2c9cf78cb | power on    | active             | False       |
| e7d006a3-3925-488c-8c44-8661c4ed271d | controller-0 | 636f8bcb-404e-4c94-bef9-4f7171e862c7 | power on    | active             | False       |

(undercloud) [stack@undercloud-0 ~]$ openstack server list
| ID                                   | Name         | Status | Networks               | Image          | Flavor     |
| 69dd94a2-1cd8-4676-8ba1-a1b2c9cf78cb | compute-2    | ACTIVE | ctlplane=192.168.24.12 | overcloud-full | compute    |
| 636f8bcb-404e-4c94-bef9-4f7171e862c7 | controller-0 | ACTIVE | ctlplane=192.168.24.51 | overcloud-full | controller |
| 14cceadd-c550-4a0b-b565-f1a0ad2912a2 | compute-0    | ACTIVE | ctlplane=192.168.24.55 | overcloud-full | compute    |

(overcloud) [stack@undercloud-0 ~]$ openstack compute service list
| ID                                   | Binary         | Host                      | Zone     | Status  | State | Updated At                 |
| 3d1601b1-1d75-48a5-adb2-f079ee6aa3c4 | nova-conductor | controller-0.redhat.local | internal | enabled | up    | 2021-06-30T06:57:51.000000 |
| 7c344811-7bce-4664-9a7a-8491793e05d5 | nova-scheduler | controller-0.redhat.local | internal | enabled | up    | 2021-06-30T06:57:55.000000 |
| 0d88bf40-161c-457e-a826-300605e376e3 | nova-compute   | compute-0.redhat.local    | nova     | enabled | up    | 2021-06-30T06:57:54.000000 |
| 5ff861e8-9d6c-4f66-87e8-f64bb33117a5 | nova-compute   | compute-2.redhat.local    | nova     | enabled | up    | 2021-06-30T06:57:52.000000 |

(overcloud) [stack@undercloud-0 ~]$ openstack resource provider list
| uuid                                 | name                   | generation |
| ebfcda80-afa1-47c2-9a85-c5446d61a91e | compute-0.redhat.local | 2          |
| 3bc4773c-11b0-4e86-98d3-862fd5f13453 | compute-2.redhat.local | 2          |
Hi, we do use a custom hostname mapping:
```
ControllerHostnameFormat: 'controller-%index%'
ControllerSchedulerHints:
  'capabilities:node': 'controller-%index%'
ComputeOvsDpdkSriovHostnameFormat: 'computeovsdpdksriov-%index%'
ComputeOvsDpdkSriovSchedulerHints:
  'capabilities:node': 'computeovsdpdksriov-%index%'
```
To ease the debugging I will attach a tar.gz of my templates.

BR,
Ella Shulman
(In reply to Ella Shulman from comment #22)
> Hi, We do use a custom hostname mapping:
> ComputeOvsDpdkSriovHostnameFormat: 'computeovsdpdksriov-%index%'
> ComputeOvsDpdkSriovSchedulerHints:
>   'capabilities:node': 'computeovsdpdksriov-%index%'

With ComputeOvsDpdkSriovSchedulerHints set, you need to update the capabilities of the baremetal node to match the next index you are going to create. If you have 2 computes which currently have the capabilities node:computeovsdpdksriov-0/1 and you remove one and scale up again, you need a node with the capabilities for the new index 2, e.g.:

openstack baremetal node set compute-1 --property capabilities='node:computeovsdpdksriov-2,profile:compute,boot_option:local'
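The index mismatch can be illustrated with a toy model of how a capabilities-based filter compares the 'capabilities:node' scheduler hint against a node's capabilities string. This is a simplified sketch, not the actual TripleOCapabilitiesFilter implementation:

```python
def parse_capabilities(caps: str) -> dict:
    """Parse an Ironic capabilities string such as
    'node:computeovsdpdksriov-2,profile:compute,boot_option:local'
    into a dict of key/value pairs."""
    return dict(item.split(":", 1) for item in caps.split(",") if item)

def passes_node_hint(node_caps: str, hint: str) -> bool:
    """Simplified model of the 'capabilities:node' check: the node's
    'node' capability must equal the requested scheduler hint."""
    return parse_capabilities(node_caps).get("node") == hint

# The freed node still carries its old index capability...
old = "node:computeovsdpdksriov-1,profile:compute,boot_option:local"
# ...but the scale-out requests the next index, so the filter drops it
# ("Filter TripleOCapabilitiesFilter returned 0 hosts"):
assert not passes_node_hint(old, "computeovsdpdksriov-2")

# After updating the node's capabilities to the new index, it matches:
new = "node:computeovsdpdksriov-2,profile:compute,boot_option:local"
assert passes_node_hint(new, "computeovsdpdksriov-2")
```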
Hi, I tried running what you suggested. It seems like placement is OK when increasing the index, but for some reason the deployment got stuck in the middle. I need to check whether it's a single-run issue or a persistent one.
Since the steps in comment #24 helped for this issue, I'll close this BZ.