Description of problem:
When deploying the overcloud from the CLI with 1 controller and 2 computes (3 ironic nodes available, tagged 2x compute, 1x control), the deployment script ends with errors after the nodes boot from PXE:

[…]
2016-12-21 13:50:02Z [overcloud.Compute.0.NovaCompute]: CREATE_FAILED ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
[…]
2016-12-21 13:50:38Z [overcloud.Compute.1.NovaCompute]: CREATE_FAILED ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
[…]
2016-12-21 13:51:21Z [overcloud.Controller.0.Controller]: CREATE_FAILED ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
[…]
2016-12-21 13:56:07Z [overcloud.Compute.1]: CREATE_FAILED Resource CREATE failed: ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-21 13:56:08Z [overcloud.Compute.1]: CREATE_FAILED ResourceInError: resources[1].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-21 13:56:08Z [overcloud.Compute.0]: CREATE_FAILED CREATE aborted
2016-12-21 13:56:08Z [overcloud.Compute]: CREATE_FAILED Resource CREATE failed: ResourceInError: resources[1].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-21 13:56:09Z [overcloud.Compute]: CREATE_FAILED ResourceInError: resources.Compute.resources[1].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-21 13:56:09Z [overcloud.Controller]: CREATE_FAILED CREATE aborted
2016-12-21 13:56:09Z [overcloud]: CREATE_FAILED Resource CREATE failed: ResourceInError: resources.Compute.resources[1].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-21 13:56:10Z [overcloud.Controller.0]: CREATE_FAILED CREATE aborted
2016-12-21 13:56:10Z [overcloud.Controller]: CREATE_FAILED Resource CREATE failed: Operation cancelled
Stack overcloud CREATE_FAILED
Heat Stack create failed.

Version-Release number of selected component (if applicable):
[stack@director-vm ~]$ openstack --version
openstack 3.2.0
[stack@director-vm ~]$ sudo yum info python-tripleoclient
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Installed Packages
Name : python-tripleoclient
Arch : noarch
Version : 5.4.0
Release : 2.el7ost
Size : 901 k
Repo : installed
From repo : rhel-7-server-openstack-10-rpms
Summary : OpenstackClient plugin for tripleoclient
URL : https://pypi.python.org/pypi/python-tripleoclient
License : ASL 2.0
Description : python-tripleoclient is a Python plugin to OpenstackClient
            : for TripleO <https://github.com/openstack/python-tripleoclient>.

How reproducible: 100%

Steps to Reproduce:
1. Deploy Undercloud
2. Introspect nodes
3. Deploy Overcloud (see history_cli for the detailed workflow)

Actual results:
Deployment fails as described above

Expected results:
A successful overcloud deployment

Additional info:
baremetal_nodes # node show info
flavors # flavors details
history_cli # history how it was deployed
hyp_stat # nova hypervisor-stats
instackenv.json # nodes configuration file
nova.conf
undercloud.conf
logs:
nova-api.log
nova-cert.log
nova-compute.log
nova-conductor.log
nova-manage.log
nova-scheduler.log
Created attachment 1234458 [details] Logs information
Kernel version: 3.10.0-514.2.2.el7.x86_64
Created attachment 1234459 [details] Compute node PXE boot screenshot
Error during undercloud deployment:

2017-01-06 10:46:04 - Error: Execution of '/bin/rpm -e firewalld-0.4.3.2-8.el7.noarch' returned 1: error: Failed dependencies:
2017-01-06 10:46:04 - firewalld >= 0.3.5-1 is needed by (installed) anaconda-core-21.48.22.93-1.el7.x86_64
2017-01-06 10:46:04 - firewalld = 0.4.3.2-8.el7 is needed by (installed) firewall-config-0.4.3.2-8.el7.noarch
2017-01-06 10:46:04 - Error: /Stage[main]/Main/Package[firewalld]/ensure: change from 0.4.3.2-8.el7 to absent failed: Execution of '/bin/rpm -e firewalld-0.4.3.2-8.el7.noarch' returned 1: error: Failed dependencies:
2017-01-06 10:46:04 - firewalld >= 0.3.5-1 is needed by (installed) anaconda-core-21.48.22.93-1.el7.x86_64
2017-01-06 10:46:04 - firewalld = 0.4.3.2-8.el7 is needed by (installed) firewall-config-0.4.3.2-8.el7.noarch
----------------------------------------------
Problem solved; I found the root cause. The error was caused by CAPITAL LETTERS in the root disk serial number property. In our case, introspection returns the following information about the disk drive:

NODE: 92244c85-2e04-47ef-a7ed-8f153249bcea
[
  {
    "size": 480103981056,
    "rotational": false,
    "vendor": "ATA",
    "name": "/dev/sda",
    "wwn_vendor_extension": null,
    "wwn_with_extension": "0x55cd2e404c70078b",
    "model": "INTEL SSDSC2BB48",
    "wwn": "0x55cd2e404c70078b",
    "serial": "PHWA60620327480FGN"
  }
]

The "serial" parameter has the value "PHWA60620327480FGN". Unfortunately, passing this value to the Undercloud database with the command

openstack baremetal node set --property root_device='{"serial": "PHWA60620327480FGN"}' 92244c85-2e04-47ef-a7ed-8f153249bcea

eventually causes the Overcloud deployment to fail with:

[...] CREATE_FAILED ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500" [...]

To fix the problem, provide the serial number using only lowercase letters:

openstack baremetal node set --property root_device='{"serial": "phwa60620327480fgn"}' 92244c85-2e04-47ef-a7ed-8f153249bcea

With that change, the Overcloud deployment completes successfully.

@RedHat engineers - please fix this problem or document it in the OSP10 deployment guide.
---------------------------------------------------
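The workaround above can be scripted so the serial is never typed by hand. The following is only a minimal sketch: `build_root_device_command` is a hypothetical helper name, and it only builds the command string shown in this comment (it does not run it or talk to ironic).

```python
import json
import shlex


def build_root_device_command(node_uuid, serial):
    """Build the `openstack baremetal node set` command from this comment,
    with the serial hint forced to lowercase (hypothetical helper)."""
    # Lowercasing the serial is the workaround reported in this bug.
    hint = json.dumps({"serial": serial.lower()})
    # shlex.quote wraps the JSON in single quotes so the shell passes it intact.
    return "openstack baremetal node set --property root_device={} {}".format(
        shlex.quote(hint), node_uuid
    )


if __name__ == "__main__":
    print(build_root_device_command(
        "92244c85-2e04-47ef-a7ed-8f153249bcea", "PHWA60620327480FGN"))
```

Run against the node and serial from this comment, it prints the same lowercase command given above.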
Root cause can be found in this Bugzilla ID: 1398288
I also see this issue. Can we get an update on this Bugzilla, please?

###ironic conductor.log
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor [req-7a5d33c0-50f7-442f-bea4-8c1995eebea1 - - - - -] Asynchronous exception for node 34caa2a8-3018-415c-9801-09ca3424b0ed: Node failed to get image for deploy. Exception: Failed to deploy instance: Failed to start the iSCSI target to deploy the node 34caa2a8-3018-415c-9801-09ca3424b0ed. Error: {u'message': u"Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment using these hints {u'serial': u'PK2134P6J905GX'}", u'code': 404, u'type': u'DeviceNotFound', u'details': u"No suitable device was found for deployment using these hints {u'serial': u'PK2134P6J905GX'}"}
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor Traceback (most recent call last):
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/agent_base_vendor.py", line 482, in heartbeat
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor self.continue_deploy(task)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic_lib/metrics.py", line 61, in wrapped
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor result = f(*args, **kwargs)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 138, in wrapper
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor return f(*args, **kwargs)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 381, in continue_deploy
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor uuid_dict_returned = do_agent_iscsi_deploy(task, self._client)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic_lib/metrics.py", line 61, in wrapped
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor result = f(*args, **kwargs)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 308, in do_agent_iscsi_deploy
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor raise exception.InstanceDeployFailure(reason=msg)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor InstanceDeployFailure: Failed to deploy instance: Failed to start the iSCSI target to deploy the node 34caa2a8-3018-415c-9801-09ca3424b0ed. Error: {u'message': u"Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment using these hints {u'serial': u'PK2134P6J905GX'}", u'code': 404, u'type': u'DeviceNotFound', u'details': u"No suitable device was found for deployment using these hints {u'serial': u'PK2134P6J905GX'}"}

[stack@rhel73-osp10-dir swift-data]$ for node in $(ironic node-list | awk '!/UUID/ {print $2}'); do echo "NODE: $node" ; cat inspector_data-$node | jq '.inventory.disks' ; echo "-----" ; done
NODE: 34caa2a8-3018-415c-9801-09ca3424b0ed
[
  {
    "size": 250059350016,
    "rotational": false,
    "vendor": "ATA",
    "name": "/dev/sda",
    "wwn_vendor_extension": null,
    "wwn_with_extension": "0x50025388a04e0300",
    "model": "Samsung SSD 840",
    "wwn": "0x50025388a04e0300",
    "serial": "S1DBNSAF640513M"
  },
  {
    "size": 2000398934016,
    "rotational": true,
    "vendor": "ATA",
    "name": "/dev/sdb",
    "wwn_vendor_extension": null,
    "wwn_with_extension": "0x5000cca22de04702",
    "model": "HGST HUS724020AL",
    "wwn": "0x5000cca22de04702",
    "serial": "PK2134P6J905GX"
  },
  {
    "size": 2000398934016,
    "rotational": true,
    "vendor": "ATA",
    "name": "/dev/sdc",
    "wwn_vendor_extension": null,
    "wwn_with_extension": "0x5000cca22de008a5",
    "model": "HGST HUS724020AL",
    "wwn": "0x5000cca22de008a5",
    "serial": "PK2134P6J8GKGX"
  }
]
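The hint in the log above uses the uppercase serial exactly as introspection reports it for /dev/sdb, which matches the situation in the lowercase workaround reported earlier in this bug. As a rough diagnostic sketch (not ironic's own matching code, and assuming the same workaround applies to this node), the snippet below finds the disk a serial hint refers to while ignoring case, then returns the lowercase form to try. `suggest_serial_hint` is a hypothetical name, and the disk list is trimmed to the `name` and `serial` fields from the inventory output.

```python
def suggest_serial_hint(disks, serial_hint):
    """Return (device name, lowercase serial) for the disk whose serial
    matches serial_hint ignoring case, or None if no disk matches."""
    wanted = serial_hint.lower()
    for disk in disks:
        # A disk may report a null serial, so fall back to an empty string.
        if (disk.get("serial") or "").lower() == wanted:
            return disk["name"], wanted
    return None


# Trimmed copy of the inventory shown above (name and serial only).
disks = [
    {"name": "/dev/sda", "serial": "S1DBNSAF640513M"},
    {"name": "/dev/sdb", "serial": "PK2134P6J905GX"},
    {"name": "/dev/sdc", "serial": "PK2134P6J8GKGX"},
]

if __name__ == "__main__":
    print(suggest_serial_hint(disks, "PK2134P6J905GX"))
```

For the hint from the log, this reports /dev/sdb together with the lowercase serial that could be set as the `root_device` property.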
Changing summary to reflect actual issue. Fixes for root device matching are planned for OSP-13.
*** Bug 1470405 has been marked as a duplicate of this bug. ***
This was never fixed upstream in rhos-10 but has been fixed from RHOS 11 onwards. A patch is being prepared to fix it in RHOS 10 as part of another bug. So I'll close this as a duplicate. *** This bug has been marked as a duplicate of bug 1452226 ***