Description of problem:
OSP11 -> OSP12 upgrade: unable to scale out compute nodes post upgrade. Trying to deploy the node back in fails with:

2017-11-10 10:30:35Z [overcloud]: UPDATE_FAILED resources.Compute: ResourceInError: resources[2].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. , Code: 500"

Version-Release number of selected component (if applicable):
2017-11-09.2 build

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11 with 3 controllers, 2 computes, 3 ceph nodes
2. Upgrade to OSP12
3. Remove one compute node from the deployment:
   openstack overcloud node delete --stack overcloud efd8563d-7619-40f9-ac4f-67cf7b6798a1
4. Wait for the stack to reach UPDATE_COMPLETE
5. Rerun openstack overcloud deploy with ComputeCount: 2 to get the deleted compute node reprovisioned (see the sketch below)

Actual results:
The deploy command fails with:

2017-11-10 10:30:35Z [overcloud]: UPDATE_FAILED resources.Compute: ResourceInError: resources[2].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. , Code: 500"
Stack overcloud UPDATE_FAILED
overcloud.Compute.2.NovaCompute:
  resource_type: OS::TripleO::ComputeServer
  physical_resource_id: 492f864f-76bf-4acf-9f89-8148b4ed427b
  status: CREATE_FAILED
  status_reason: |
    ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. , Code: 500"
Heat Stack update failed.
Heat Stack update failed.

Expected results:
The deploy command completes successfully.

Additional info:
Attaching the sosreport from the undercloud.
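For reference, a minimal sketch of the scale-back in step 5. The environment file name ~/compute-count.yaml is a placeholder, and any environment files used for the original deployment would have to be passed as well; this is not the exact command line used here:

# Hypothetical environment file carrying only the desired compute count
cat > ~/compute-count.yaml <<'EOF'
parameter_defaults:
  ComputeCount: 2
EOF

# Re-run the deploy with the original templates/environments plus the override
# (the other -e files from the original deployment are omitted here)
openstack overcloud deploy --templates -e ~/compute-count.yaml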
(undercloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| eadafa81-0ce3-48ef-9101-ae80e3509e71 | ceph-0       | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| 8fad8238-c463-4807-992b-19a0bdfe840f | ceph-1       | ACTIVE | -          | Running     | ctlplane=192.168.24.12 |
| 88826ab3-fd49-4866-9f18-daa3be19bcd1 | ceph-2       | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| 2e145e34-c57e-4a75-a59b-1c19bd58f289 | compute-1    | ACTIVE | -          | Running     | ctlplane=192.168.24.9  |
| 492f864f-76bf-4acf-9f89-8148b4ed427b | compute-2    | ERROR  | -          | NOSTATE     |                        |
| 61a4692f-8acc-418b-a3da-3e5294b58d37 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
| b230be0b-1699-4078-995d-a6a1ca6e1cb3 | controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| ecfef989-f2b9-4f42-8f73-bbd3c2c3ce47 | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.7  |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+

Checking the nova logs for the failed node UUID, in /var/log/nova/nova-scheduler.log:

2017-11-10 05:30:02.529 1348 DEBUG nova.scheduler.manager [req-6cb9920f-7705-43c9-ad06-42be84e6bf9c a1f3cd9117df43c8ad2a236b6f70e801 d6b72ece1f95470b817ea14f96205691 - default default] Starting to schedule for instances: [u'492f864f-76bf-4acf-9f89-8148b4ed427b'] select_destinations /usr/lib/python2.7/site-packages/nova/scheduler/manager.py:113
2017-11-10 05:30:02.550 1348 DEBUG nova.scheduler.manager [req-6cb9920f-7705-43c9-ad06-42be84e6bf9c a1f3cd9117df43c8ad2a236b6f70e801 d6b72ece1f95470b817ea14f96205691 - default default] Got no allocation candidates from the Placement API. This may be a temporary occurrence as compute nodes start up and begin reporting inventory to the Placement service. select_destinations /usr/lib/python2.7/site-packages/nova/scheduler/manager.py:133
2017-11-10 05:30:33.083 1348 DEBUG oslo_concurrency.lockutils [req-d6621942-d42d-4826-bbbd-f3197a374167 - - - - -] Lock "host_instance" acquired by "nova.scheduler.host_manager.sync_instance_info" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:270

In /var/log/nova/nova-conductor.log:

2017-11-10 05:29:02.934 3033 ERROR nova.conductor.manager [req-dd94e11d-a69b-4d29-8ab3-667325074865 a1f3cd9117df43c8ad2a236b6f70e801 d6b72ece1f95470b817ea14f96205691 - default default] Failed to schedule instances: NoValidHost_Remote: No valid host was found.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 232, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 137, in select_destinations
    raise exception.NoValidHost(reason="")
NoValidHost: No valid host was found.
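For an Ironic-backed instance, "Got no allocation candidates from the Placement API" typically means the node's resource class does not line up with the custom resource class requested by the flavor. A quick way to check both sides on the undercloud (commands only, output omitted):

# Which resource class, if any, each ironic node carries
openstack baremetal node list --fields uuid name resource_class

# Which custom resource class the compute flavor requests
openstack flavor show compute -c properties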
Resource class was only set on one of the ironic nodes during the upgrade. From /home/stack/undercloud_upgrade.log:

2017-11-09 06:42:05,991 INFO: [2017-11-09 06:42:05,991] (os-refresh-config) [INFO] Completed phase post-configure
2017-11-09 06:42:06,000 INFO: os-refresh-config completed successfully
2017-11-09 06:42:07,623 INFO: Node f99ca41a-9daf-4927-8458-e937de3c93e3 resource class was set to baremetal
2017-11-09 06:42:07,662 INFO: Not creating flavor "baremetal" because it already exists.
2017-11-09 06:42:07,758 INFO: Flavor baremetal updated to use custom resource class baremetal
2017-11-09 06:42:07,876 INFO: Created flavor "control" with profile "control"
2017-11-09 06:42:07,876 INFO: Not creating flavor "compute" because it already exists.
2017-11-09 06:42:07,950 INFO: Flavor compute updated to use custom resource class baremetal
2017-11-09 06:42:08,046 INFO: Created flavor "ceph-storage" with profile "ceph-storage"
2017-11-09 06:42:08,137 INFO: Created flavor "block-storage" with profile "block-storage"
2017-11-09 06:42:08,228 INFO: Created flavor "swift-storage" with profile "swift-storage"
2017-11-09 06:42:08,236 INFO: Configuring Mistral workbooks
2017-11-09 06:42:34,598 INFO: Mistral workbooks configured successfully
2017-11-09 06:42:35,099 INFO: Migrating environment for plan overcloud to Swift.
2017-11-09 06:42:35,212 INFO: Not creating default plan "overcloud" because it already exists.
2017-11-09 06:42:35,212 INFO: Configuring an hourly cron trigger for tripleo-ui logging
2017-11-09 06:42:37,703 INFO: Added _member_ role to admin user
2017-11-09 06:42:37,986 INFO: Starting and waiting for validation groups ['post-upgrade']

The limit should be 0 here: https://review.openstack.org/#/c/490851/9/instack_undercloud/undercloud.py@1414
With limit==-1, only a single node is returned:

  f99ca41a-9daf-4927-8458-e937de3c93e3  resource_class=baremetal

With limit==0, all eight nodes are returned:

  f99ca41a-9daf-4927-8458-e937de3c93e3  resource_class=baremetal
  4ebf6ff1-3f3a-447f-b5c2-ec9c04ced8ce  resource_class=None
  a6c3c3fb-0ff2-46dc-a02b-6d6ffe9d74b2  resource_class=None
  f5dd8219-6b8f-4a39-8a96-6330689d54e2  resource_class=None
  046cb1f3-5d50-4be8-80c2-1d4ccc58487a  resource_class=None
  782bdc4f-af01-47c4-ac02-d73276d7ab77  resource_class=None
  c7c26891-88d1-498f-a84e-c15886ec3198  resource_class=None
  81f8dd71-e0c6-4be7-b20f-47871c61a2a9  resource_class=None
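So the node listing in instack_undercloud/undercloud.py only picks up the first node when called with limit=-1; per the review linked above, limit=0 (which python-ironicclient treats as "return the entire list") is what returns all of them. Until the fix lands, a manual workaround is to tag the remaining nodes by hand. A sketch, assuming every registered node should carry the default "baremetal" resource class:

# Set the baremetal resource class on every registered ironic node
for uuid in $(openstack baremetal node list -f value -c UUID); do
    openstack baremetal node set --resource-class baremetal "$uuid"
done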
Thanks for triaging, I can take care of it.
Correction: stable/pike patch is https://review.openstack.org/519312
Merged downstream - https://code.engineering.redhat.com/gerrit/#/c/123953/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462