Bug 1406856
| Summary: | Ironic root_device hint using serial number does not match case | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | chih-hsien.chien |
| Component: | rhosp-director | Assignee: | Derek Higgins <derekh> |
| Status: | CLOSED DUPLICATE | QA Contact: | Omri Hochman <ohochman> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 10.0 (Newton) | CC: | aschultz, bfournie, chih-hsien.chien, dbecker, derekh, jdonohue, jmelvin, joea, krzysztofx.malkowski, mburns, mlammon, morazi, racedoro, rhel-osp-director-maint, robert.w.love |
| Target Milestone: | Upstream M2 | Keywords: | Triaged |
| Target Release: | 13.0 (Queens) | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-11-27 10:44:02 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1335596, 1409892, 1473267 | | |
| Attachments: |
Description
chih-hsien.chien
2016-12-21 16:30:23 UTC
Created attachment 1234458 [details]
Log information
Kernel version: 3.10.0-514.2.2.el7.x86_64
Created attachment 1234459 [details]
Compute node PXE boot screenshot
Error during undercloud deployment:
2017-01-06 10:46:04 - Error: Execution of '/bin/rpm -e firewalld-0.4.3.2-8.el7.noarch' returned 1: error: Failed dependencies:
2017-01-06 10:46:04 - firewalld >= 0.3.5-1 is needed by (installed) anaconda-core-21.48.22.93-1.el7.x86_64
2017-01-06 10:46:04 - firewalld = 0.4.3.2-8.el7 is needed by (installed) firewall-config-0.4.3.2-8.el7.noarch
2017-01-06 10:46:04 - Error: /Stage[main]/Main/Package[firewalld]/ensure: change from 0.4.3.2-8.el7 to absent failed: Execution of '/bin/rpm -e firewalld-0.4.3.2-8.el7.noarch' returned 1: error: Failed dependencies:
2017-01-06 10:46:04 - firewalld >= 0.3.5-1 is needed by (installed) anaconda-core-21.48.22.93-1.el7.x86_64
2017-01-06 10:46:04 - firewalld = 0.4.3.2-8.el7 is needed by (installed) firewall-config-0.4.3.2-8.el7.noarch
----------------------------------------------
Problem solved.
I found the root cause: the error was caused by capital letters in the root disk serial number property.
In our case, introspection returns the following information about the disk drive:
NODE: 92244c85-2e04-47ef-a7ed-8f153249bcea
[
{
"size": 480103981056,
"rotational": false,
"vendor": "ATA",
"name": "/dev/sda",
"wwn_vendor_extension": null,
"wwn_with_extension": "0x55cd2e404c70078b",
"model": "INTEL SSDSC2BB48",
"wwn": "0x55cd2e404c70078b",
"serial": "PHWA60620327480FGN"
}
]
The "serial" parameter has the value "PHWA60620327480FGN".
Unfortunately, providing this value to the Undercloud database with the command:
openstack baremetal node set --property root_device='{"serial": "PHWA60620327480FGN"}' 92244c85-2e04-47ef-a7ed-8f153249bcea
eventually produces this Overcloud deployment error:
[...]
CREATE_FAILED ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
[...]
To fix the problem, provide the serial number using only lowercase letters:
openstack baremetal node set --property root_device='{"serial": "phwa60620327480fgn"}' 92244c85-2e04-47ef-a7ed-8f153249bcea
In that case the Overcloud deployment finishes successfully.
@RedHat engineers - please fix this problem or document it in the OSP10 deployment guide.
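The workaround above can be scripted. The following is a minimal Python sketch and is not part of the original report; the helper name `build_root_device_command` is hypothetical. It only builds the `openstack baremetal node set` command string shown above with the serial lowercased, and does not execute it.

```python
import json
import shlex


def build_root_device_command(node_uuid, serial):
    """Build the CLI command that sets a lowercased serial root_device hint.

    Hypothetical helper for illustration: it returns the command as a
    string and leaves running it to the operator.
    """
    # Lowercase the serial reported by introspection before using it as a hint.
    hint = json.dumps({"serial": serial.lower()})
    # shlex.quote wraps the JSON in single quotes so the shell passes it intact.
    return "openstack baremetal node set --property root_device={} {}".format(
        shlex.quote(hint), node_uuid
    )


if __name__ == "__main__":
    print(build_root_device_command(
        "92244c85-2e04-47ef-a7ed-8f153249bcea", "PHWA60620327480FGN"))
```

Printing the command instead of running it lets you review the hint before applying it to the node.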
---------------------------------------------------
Root cause can be found in this Bugzilla ID: 1398288
I also see this issue. Can we get an update on this Bugzilla, please?
###ironic conductor.log
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor [req-7a5d33c0-50f7-442f-bea4-8c1995eebea1 - - - - -] Asynchronous exception for node 34caa2a8-3018-415c-9801-09ca3424b0ed: Node failed to get image for deploy. Exception: Failed to deploy instance: Failed to start the iSCSI target to deploy the node 34caa2a8-3018-415c-9801-09ca3424b0ed. Error: {u'message': u"Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment using these hints {u'serial': u'PK2134P6J905GX'}", u'code': 404, u'type': u'DeviceNotFound', u'details': u"No suitable device was found for deployment using these hints {u'serial': u'PK2134P6J905GX'}"}
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor Traceback (most recent call last):
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/agent_base_vendor.py", line 482, in heartbeat
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor self.continue_deploy(task)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic_lib/metrics.py", line 61, in wrapped
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor result = f(*args, **kwargs)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 138, in wrapper
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor return f(*args, **kwargs)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 381, in continue_deploy
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor uuid_dict_returned = do_agent_iscsi_deploy(task, self._client)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic_lib/metrics.py", line 61, in wrapped
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor result = f(*args, **kwargs)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 308, in do_agent_iscsi_deploy
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor raise exception.InstanceDeployFailure(reason=msg)
2017-06-28 11:58:40.997 1490 ERROR ironic.drivers.modules.agent_base_vendor InstanceDeployFailure: Failed to deploy instance: Failed to start the iSCSI target to deploy the node 34caa2a8-3018-415c-9801-09ca3424b0ed. Error: {u'message': u"Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment using these hints {u'serial': u'PK2134P6J905GX'}", u'code': 404, u'type': u'DeviceNotFound', u'details': u"No suitable device was found for deployment using these hints {u'serial': u'PK2134P6J905GX'}"}
[stack@rhel73-osp10-dir swift-data]$ for node in $(ironic node-list | awk '!/UUID/ {print $2}'); do echo "NODE: $node" ; cat inspector_data-$node | jq '.inventory.disks' ; echo "-----" ; done
NODE: 34caa2a8-3018-415c-9801-09ca3424b0ed
[
{
"size": 250059350016,
"rotational": false,
"vendor": "ATA",
"name": "/dev/sda",
"wwn_vendor_extension": null,
"wwn_with_extension": "0x50025388a04e0300",
"model": "Samsung SSD 840",
"wwn": "0x50025388a04e0300",
"serial": "S1DBNSAF640513M"
},
{
"size": 2000398934016,
"rotational": true,
"vendor": "ATA",
"name": "/dev/sdb",
"wwn_vendor_extension": null,
"wwn_with_extension": "0x5000cca22de04702",
"model": "HGST HUS724020AL",
"wwn": "0x5000cca22de04702",
"serial": "PK2134P6J905GX"
},
{
"size": 2000398934016,
"rotational": true,
"vendor": "ATA",
"name": "/dev/sdc",
"wwn_vendor_extension": null,
"wwn_with_extension": "0x5000cca22de008a5",
"model": "HGST HUS724020AL",
"wwn": "0x5000cca22de008a5",
"serial": "PK2134P6J8GKGX"
}
]
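For reference, here is a minimal Python sketch of a case-insensitive serial comparison against the disk list above. It is an illustration only and is not Ironic's actual matching code; the helper name `find_disk_by_serial` is hypothetical, and the disk entries are trimmed to the `name` and `serial` fields from the introspection output.

```python
def find_disk_by_serial(disks, serial_hint):
    """Return the first disk whose serial equals the hint, ignoring case.

    Hypothetical helper: both sides are stripped and lowercased before
    comparison, so "PK2134P6J905GX" and "pk2134p6j905gx" are treated alike.
    """
    wanted = serial_hint.strip().lower()
    for disk in disks:
        serial = disk.get("serial")
        if serial is not None and serial.strip().lower() == wanted:
            return disk
    return None


# Trimmed copy of the disks reported for node 34caa2a8-3018-415c-9801-09ca3424b0ed.
disks = [
    {"name": "/dev/sda", "serial": "S1DBNSAF640513M"},
    {"name": "/dev/sdb", "serial": "PK2134P6J905GX"},
    {"name": "/dev/sdc", "serial": "PK2134P6J8GKGX"},
]

if __name__ == "__main__":
    match = find_disk_by_serial(disks, "pk2134p6j905gx")
    print(match["name"] if match else "no match")
```

A check like this can be run against the saved inspector data to confirm which device a given serial hint is expected to select before starting a deployment.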
Changing summary to reflect actual issue.
Fixes for root device matching are planned for OSP-13.
*** Bug 1470405 has been marked as a duplicate of this bug. ***
This was never fixed upstream in rhos-10 but has been fixed from RHOS 11 onwards. A patch is being prepared to fix it in RHOS 10 as part of another bug. So I'll close this as a duplicate.
*** This bug has been marked as a duplicate of bug 1452226 ***