Bug 1406856 - Ironic root_device hint using serial number does not match case
Summary: Ironic root_device hint using serial number does not match case
Keywords:
Status: CLOSED DUPLICATE of bug 1452226
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: Upstream M2
Target Release: 13.0 (Queens)
Assignee: Derek Higgins
QA Contact: Omri Hochman
URL:
Whiteboard:
Duplicates: 1470405
Depends On:
Blocks: 1335596 intelosp10bugs 1473267
 
Reported: 2016-12-21 16:30 UTC by chih-hsien.chien
Modified: 2020-08-13 08:46 UTC
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-27 10:44:02 UTC
Target Upstream Version:
Embargoed:


Attachments
Logs information (911.53 KB, application/zip)
2016-12-21 16:37 UTC, chih-hsien.chien
no flags
Compute node PXE boot screenshot (127.83 KB, image/png)
2016-12-21 16:46 UTC, chih-hsien.chien
no flags

Description chih-hsien.chien 2016-12-21 16:30:23 UTC
Description of problem:
When deploying the overcloud using the CLI on 1 controller and 2 compute nodes (3 Ironic nodes available, tagged 2x compute and 1x control), the deployment script ends with the following errors after the nodes boot from PXE:
[…]
2016-12-21 13:50:02Z [overcloud.Compute.0.NovaCompute]: CREATE_FAILED  ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
[…]
2016-12-21 13:50:38Z [overcloud.Compute.1.NovaCompute]: CREATE_FAILED  ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
[…]
2016-12-21 13:51:21Z [overcloud.Controller.0.Controller]: CREATE_FAILED  ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
[…]
2016-12-21 13:56:07Z [overcloud.Compute.1]: CREATE_FAILED  Resource CREATE failed: ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-21 13:56:08Z [overcloud.Compute.1]: CREATE_FAILED  ResourceInError: resources[1].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-21 13:56:08Z [overcloud.Compute.0]: CREATE_FAILED  CREATE aborted
2016-12-21 13:56:08Z [overcloud.Compute]: CREATE_FAILED  Resource CREATE failed: ResourceInError: resources[1].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-21 13:56:09Z [overcloud.Compute]: CREATE_FAILED  ResourceInError: resources.Compute.resources[1].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-21 13:56:09Z [overcloud.Controller]: CREATE_FAILED  CREATE aborted
2016-12-21 13:56:09Z [overcloud]: CREATE_FAILED  Resource CREATE failed: ResourceInError: resources.Compute.resources[1].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-21 13:56:10Z [overcloud.Controller.0]: CREATE_FAILED  CREATE aborted
2016-12-21 13:56:10Z [overcloud.Controller]: CREATE_FAILED  Resource CREATE failed: Operation cancelled

Stack overcloud CREATE_FAILED

Heat Stack create failed.


Version-Release number of selected component (if applicable):
[stack@director-vm ~]$ openstack --version
openstack 3.2.0
[stack@director-vm ~]$ sudo yum info python-tripleoclient
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Installed Packages
Name        : python-tripleoclient
Arch        : noarch
Version     : 5.4.0
Release     : 2.el7ost
Size        : 901 k
Repo        : installed
From repo   : rhel-7-server-openstack-10-rpms
Summary     : OpenstackClient plugin for tripleoclient
URL         : https://pypi.python.org/pypi/python-tripleoclient
License     : ASL 2.0
Description : python-tripleoclient is a Python plugin to OpenstackClient
            : for TripleO <https://github.com/openstack/python-tripleoclient>.


How reproducible: 100%


Steps to Reproduce:
1. Deploy the undercloud
2. Introspect the nodes
3. Deploy the overcloud (see history_cli for the detailed workflow)

Actual results: Deployment fails as described above.

Expected results: A successful overcloud deployment


Additional info:
baremetal_nodes   	# node show info
flavors 		# flavors details
history_cli   		# history how it was deployed
hyp_stat   		# nova hypervisor-stats
instackenv.json 	# nodes configuration file
nova.conf  
undercloud.conf

logs:
nova-api.log   
nova-cert.log
nova-compute.log
nova-conductor.log
nova-manage.log
nova-scheduler.log

Comment 1 chih-hsien.chien 2016-12-21 16:37:57 UTC
Created attachment 1234458
Logs information

Comment 2 chih-hsien.chien 2016-12-21 16:44:16 UTC
Kernel version: 3.10.0-514.2.2.el7.x86_64

Comment 3 chih-hsien.chien 2016-12-21 16:46:24 UTC
Created attachment 1234459
Compute node PXE boot screenshot

Comment 4 chih-hsien.chien 2017-01-06 17:56:19 UTC
Error during the undercloud deployment:
2017-01-06 10:46:04 - Error: Execution of '/bin/rpm -e firewalld-0.4.3.2-8.el7.noarch' returned 1: error: Failed dependencies:
2017-01-06 10:46:04 -     firewalld >= 0.3.5-1 is needed by (installed) anaconda-core-21.48.22.93-1.el7.x86_64
2017-01-06 10:46:04 -     firewalld = 0.4.3.2-8.el7 is needed by (installed) firewall-config-0.4.3.2-8.el7.noarch
2017-01-06 10:46:04 - Error: /Stage[main]/Main/Package[firewalld]/ensure: change from 0.4.3.2-8.el7 to absent failed: Execution of '/bin/rpm -e firewalld-0.4.3.2-8.el7.noarch' returned 1: error: Failed dependencies:
2017-01-06 10:46:04 -     firewalld >= 0.3.5-1 is needed by (installed) anaconda-core-21.48.22.93-1.el7.x86_64
2017-01-06 10:46:04 -     firewalld = 0.4.3.2-8.el7 is needed by (installed) firewall-config-0.4.3.2-8.el7.noarch

Comment 5 chih-hsien.chien 2017-01-13 13:12:44 UTC
----------------------------------------------
Problem solved. 

I found the root cause. The error was caused by CAPITAL LETTERS in the root disk serial number property.

In our case, introspection returns the following information about the disk drive:

NODE: 92244c85-2e04-47ef-a7ed-8f153249bcea
[
  {
    "size": 480103981056,
    "rotational": false,
    "vendor": "ATA",
    "name": "/dev/sda",
    "wwn_vendor_extension": null,
    "wwn_with_extension": "0x55cd2e404c70078b",
    "model": "INTEL SSDSC2BB48",
    "wwn": "0x55cd2e404c70078b",
    "serial": "PHWA60620327480FGN"
  }
]

The "serial" field has the value "PHWA60620327480FGN".
Unfortunately, providing this value to the undercloud database with the command:

openstack baremetal node set --property root_device='{"serial": "PHWA60620327480FGN"}' 92244c85-2e04-47ef-a7ed-8f153249bcea

eventually causes an overcloud deployment error:

[...]
CREATE_FAILED  ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
[...]

To fix the problem, you have to provide the serial number in lowercase letters only:

openstack baremetal node set --property root_device='{"serial": "phwa60620327480fgn"}' 92244c85-2e04-47ef-a7ed-8f153249bcea

In that case, the overcloud deployment finishes successfully.
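The workaround above can be scripted. The sketch below is a minimal illustration for Python 3, assuming the introspection disk list has already been loaded as JSON in the shape shown earlier; the helper names `root_device_hint` and `node_set_command` are hypothetical and not part of any OpenStack client. It looks up the serial for a given device name, lowercases it, and prints the same `openstack baremetal node set` command used in this comment.

```python
import json
import shlex


def root_device_hint(disks, device_name):
    """Return a root_device hint dict for device_name with a lowercased serial.

    `disks` is a list of dicts shaped like the introspection output above,
    e.g. [{"name": "/dev/sda", "serial": "PHWA60620327480FGN", ...}].
    """
    for disk in disks:
        if disk.get("name") == device_name:
            serial = disk.get("serial")
            if not serial:
                raise ValueError("%s has no serial number" % device_name)
            # Workaround from this bug: pass the serial in lowercase only.
            return {"serial": serial.lower()}
    raise LookupError("no disk named %s" % device_name)


def node_set_command(node_uuid, hint):
    """Build the `openstack baremetal node set` command line for a hint."""
    value = "root_device=" + json.dumps(hint)
    # shlex.quote wraps the JSON in single quotes so the shell keeps it intact.
    return " ".join(["openstack", "baremetal", "node", "set",
                     "--property", shlex.quote(value), node_uuid])


if __name__ == "__main__":
    disks = [{"name": "/dev/sda", "serial": "PHWA60620327480FGN"}]
    hint = root_device_hint(disks, "/dev/sda")
    print(node_set_command("92244c85-2e04-47ef-a7ed-8f153249bcea", hint))
```

The generated command quotes the whole `root_device=...` argument rather than only the JSON value as in the manual command; both forms pass the same string to the client.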


@RedHat engineers - please fix this problem or provide suitable information in the OSP10 deployment guide.
---------------------------------------------------

Comment 6 chih-hsien.chien 2017-01-13 13:42:40 UTC
The root cause can be found in Bugzilla ID 1398288.

Comment 7 Jeremy 2017-06-28 16:10:24 UTC
I also see this issue. Can we get an update on this bug, please?


###ironic conductor.log
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor [req-7a5d33c0-50f7-442f-bea4-8c1995eebea1 - - - - -] Asynchronous exception for node 34caa2a8-3018-415c-9801-09ca3424b0ed: Node failed to get image for deploy. Exception: Failed to deploy instance: Failed to start the iSCSI target to deploy the node 34caa2a8-3018-415c-9801-09ca3424b0ed. Error: {u'message': u"Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment using these hints {u'serial': u'PK2134P6J905GX'}", u'code': 404, u'type': u'DeviceNotFound', u'details': u"No suitable device was found for deployment using these hints {u'serial': u'PK2134P6J905GX'}"}
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor Traceback (most recent call last):
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/agent_base_vendor.py", line 482, in heartbeat
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor     self.continue_deploy(task)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor   File "/usr/lib/python2.7/site-packages/ironic_lib/metrics.py", line 61, in wrapped
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor     result = f(*args, **kwargs)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor   File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 138, in wrapper
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor     return f(*args, **kwargs)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 381, in continue_deploy
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor     uuid_dict_returned = do_agent_iscsi_deploy(task, self._client)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor   File "/usr/lib/python2.7/site-packages/ironic_lib/metrics.py", line 61, in wrapped
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor     result = f(*args, **kwargs)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 308, in do_agent_iscsi_deploy
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor     raise exception.InstanceDeployFailure(reason=msg)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor InstanceDeployFailure: Failed to deploy instance: Failed to start the iSCSI target to deploy the node 34caa2a8-3018-415c-9801-09ca3424b0ed. Error: {u'message': u"Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment using these hints {u'serial': u'PK2134P6J905GX'}", u'code': 404, u'type': u'DeviceNotFound', u'details': u"No suitable device was found for deployment using these hints {u'serial': u'PK2134P6J905GX'}"}
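The conductor error means that no disk reported on the node satisfied the `serial` hint, even though `/dev/sdb` in the introspection output of this comment has exactly that serial. The sketch below is a simplified model of the check, not Ironic's actual matching code. It assumes, hypothetically, that somewhere in the deploy path one side of the comparison is lowercased and the other is not; Comment 5's observation that a lowercase hint succeeds is consistent with this, but where the normalization happens is not established here. `DeviceNotFound` is a local stand-in named after the error type in the log.

```python
class DeviceNotFound(Exception):
    """Local stand-in named after the error type in the log (not Ironic's class)."""


def _matches(device_value, hint_value, case_sensitive):
    # A disk without the hinted attribute can never match.
    if device_value is None:
        return False
    if case_sensitive:
        return device_value == hint_value
    return device_value.lower() == hint_value.lower()


def find_root_device(disks, hints, case_sensitive=True):
    """Return the name of the first disk that satisfies every hint."""
    for disk in disks:
        if all(_matches(disk.get(key), value, case_sensitive)
               for key, value in hints.items()):
            return disk["name"]
    raise DeviceNotFound(
        "No suitable device was found for deployment using these hints %r"
        % (hints,))


# Disks from the introspection output in this comment (other fields omitted).
DISKS = [
    {"name": "/dev/sda", "serial": "S1DBNSAF640513M"},
    {"name": "/dev/sdb", "serial": "PK2134P6J905GX"},
    {"name": "/dev/sdc", "serial": "PK2134P6J8GKGX"},
]

# Hypothetical: model a deploy path that lowercases the device-side value.
REPORTED = [dict(d, serial=d["serial"].lower()) for d in DISKS]

if __name__ == "__main__":
    hint = {"serial": "PK2134P6J905GX"}
    try:
        find_root_device(REPORTED, hint)
    except DeviceNotFound as exc:
        print("exact match:", exc)
    print("lowercase hint:", find_root_device(REPORTED, {"serial": "pk2134p6j905gx"}))
    print("case-insensitive:", find_root_device(REPORTED, hint, case_sensitive=False))
```

Under this model, an exact comparison fails as soon as the two sides differ only in case, the lowercase-hint workaround succeeds, and a case-insensitive comparison succeeds regardless of how the hint is written.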



[stack@rhel73-osp10-dir swift-data]$ for node in $(ironic node-list | awk '!/UUID/ {print $2}'); do echo "NODE: $node" ; cat inspector_data-$node | jq '.inventory.disks' ; echo "-----" ; done
NODE: 34caa2a8-3018-415c-9801-09ca3424b0ed
[
  {
    "size": 250059350016,
    "rotational": false,
    "vendor": "ATA",
    "name": "/dev/sda",
    "wwn_vendor_extension": null,
    "wwn_with_extension": "0x50025388a04e0300",
    "model": "Samsung SSD 840",
    "wwn": "0x50025388a04e0300",
    "serial": "S1DBNSAF640513M"
  },
  {
    "size": 2000398934016,
    "rotational": true,
    "vendor": "ATA",
    "name": "/dev/sdb",
    "wwn_vendor_extension": null,
    "wwn_with_extension": "0x5000cca22de04702",
    "model": "HGST HUS724020AL",
    "wwn": "0x5000cca22de04702",
    "serial": "PK2134P6J905GX"
  },
  {
    "size": 2000398934016,
    "rotational": true,
    "vendor": "ATA",
    "name": "/dev/sdc",
    "wwn_vendor_extension": null,
    "wwn_with_extension": "0x5000cca22de008a5",
    "model": "HGST HUS724020AL",
    "wwn": "0x5000cca22de008a5",
    "serial": "PK2134P6J8GKGX"
  }
]
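The shell loop above depends on `ironic node-list`, `awk` and `jq`. A rough Python 3 equivalent is sketched below, assuming each `inspector_data-<uuid>` file is a JSON object with the disks under `inventory.disks` (as the jq filter implies) and that node UUIDs are passed on the command line instead of being queried from Ironic. The `uppercase_serials` helper is a hypothetical addition that flags disks whose serial contains uppercase letters, since Comment 5 reports that such values only worked as hints once lowercased.

```python
import json
import os
import sys


def disks_for_node(data_dir, node_uuid):
    """Return the `inventory.disks` list from inspector_data-<uuid> in data_dir."""
    path = os.path.join(data_dir, "inspector_data-%s" % node_uuid)
    with open(path) as f:
        data = json.load(f)
    return data.get("inventory", {}).get("disks", [])


def uppercase_serials(disks):
    """Names of disks whose serial contains uppercase letters."""
    return [d["name"] for d in disks
            if d.get("serial") and d["serial"] != d["serial"].lower()]


def print_disks(data_dir, node_uuids):
    """Print each node's disks like the jq loop, plus uppercase-serial warnings."""
    for uuid in node_uuids:
        disks = disks_for_node(data_dir, uuid)
        print("NODE: %s" % uuid)
        print(json.dumps(disks, indent=2))
        for name in uppercase_serials(disks):
            print("WARNING: %s serial contains uppercase letters" % name)
        print("-----")


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python show_disks.py <data_dir> <node_uuid> [<node_uuid> ...]
    print_disks(sys.argv[1], sys.argv[2:])
```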

Comment 8 Bob Fournier 2017-09-03 17:37:21 UTC
Changing summary to reflect actual issue.  Fixes for root device matching are planned for OSP-13.

Comment 9 Dmitry Tantsur 2017-10-02 12:05:27 UTC
*** Bug 1470405 has been marked as a duplicate of this bug. ***

Comment 13 Derek Higgins 2017-11-27 10:44:02 UTC
This was never fixed upstream in RHOS 10, but it has been fixed from RHOS 11 onwards. A patch is being prepared to fix it in RHOS 10 as part of another bug, so I'll close this as a duplicate.

*** This bug has been marked as a duplicate of bug 1452226 ***

