| Summary: | Specifying root device hints for VMs leads to introspection/instance boot failure | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Sergii Nozhka <snozhka> | ||||||
| Component: | openstack-ironic-python-agent | Assignee: | Lucas Alvares Gomes <lmartins> | ||||||
| Status: | CLOSED NOTABUG | QA Contact: | Raviv Bar-Tal <rbartal> | ||||||
| Severity: | urgent | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | unspecified | CC: | augol, chih-hsien.chien, dtantsur, ftaylor, jraju, lmartins, mburns, pbandark, rhel-osp-director-maint, rmahroua, sacpatil, sasha, slinaber, snozhka, srevivo | ||||||
| Target Milestone: | --- | Keywords: | Reopened, ZStream | ||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2017-04-27 10:59:56 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Attachments: |
|
||||||||
Please specify the exact failure. (In reply to Dmitry Tantsur from comment #1) > Please specify the exact failure. 2016-11-24 06:57:29.772 6882 ERROR ironic.drivers.modules.inspector [req-4527dbe4-6bc0-41ec-a5d7-1dd650be4280 - - - - -] Inspection failed for node bfe5f0ee-96d6-46c1-bf0c-2a6f5d7a628c with error: No disks satisfied root device hints Please refer http://pastebin.test.redhat.com/433352. Please attach openstack-ironic-inspector logs and ramdisk logs (see http://tripleo.org/troubleshooting/troubleshooting.html#accessing-logs-from-the-ramdisk) Created attachment 1223852 [details]
ramdisk logs
Created attachment 1223853 [details]
ironic-inspector log
We did some investigation upstream on this problem. What is happening is that, setting attributes such as "serial" or "wwn" in the libvirt XML doesn't work with some distros, apparently, such attributes doesn't get propagated to the host OS and Ironic can't read it to use with the root device hints (it uses udev to read it, same with lsblk). When used with a fedora 24 deploy ramdisk or on a real baremetal, the code in Ironic works. So, after some investigation. Apparently we need the virtio_scsi driver to be loaded (which is by default on centOS7 that I'm testing at the moment) and the disk controller should be "scsi" (instead of virtio or ide). Then the attributes from libvirt gets propagated to the OS and the deploy ramdisk can make use of them for the root device hints mechanism. There's another problem found while testing, apparently, specifying both WWN and Serial will cause libvirt to overwrite the serial number with the WWN, e.g: By setting: ... <serial>123456789</serial> <wwn>5001c60016ea71ad</wwn> ... The following appears on lsblk: [centos@testing ~]$ lsblk -okname,serial,wwn KNAME SERIAL WWN sda 5001c60016ea71ad 0x5001c60016ea71ad sda1 0x5001c60016ea71ad sda2 0x5001c60016ea71ad *** Bug 1404116 has been marked as a duplicate of this bug. *** I tried reproducing the issue on RHOS10 setup.
From /var/log/ironic/deploy/278c47dd-dbff-4fa4-a8be-1f215faab11b_5228609d-b22c-419a-afdc-107e93676537_2016-12-21-15:01:44 :
<snip>
Dec 21 10:01:43 host-192-0-2-15 ironic-python-agent[579]: 2016-12-21 10:01:43.625 579 DEBUG root [-] Root device hint serial=QM00005 does not match the device /dev/sda value of qm00005 match /usr/lib/python2.7/site-packages/ironic_python_agent/hardware.py:653
Dec 21 10:01:43 host-192-0-2-15 ironic-python-agent[579]: 2016-12-21 10:01:43.626 579 DEBUG root [-] Root device hint serial=QM00005 does not match the device /dev/sdb value of qm00007 match /usr/lib/python2.7/site-packages/ironic_python_agent/hardware.py:653
Dec 21 10:01:43 host-192-0-2-15 ironic-python-agent[579]: 2016-12-21 10:01:43.626 579 ERROR root [-] Unexpected error dispatching get_os_install_device to manager <ironic_python_agent.hardware.GenericHardwareManager object at 0x3848190>: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment using these hints {u'serial': u'QM00005'}
Dec 21 10:01:43 host-192-0-2-15 ironic-python-agent[579]: 2016-12-21 10:01:43.626 579 ERROR root Traceback (most recent call last):
Dec 21 10:01:43 host-192-0-2-15 ironic-python-agent[579]: 2016-12-21 10:01:43.626 579 ERROR root File "/usr/lib/python2.7/site-packages/ironic_python_agent/hardware.py", line 1062, in dispatch_to_managers
Dec 21 10:01:43 host-192-0-2-15 ironic-python-agent[579]: 2016-12-21 10:01:43.626 579 ERROR root return getattr(manager, method)(*args, **kwargs)
Dec 21 10:01:43 host-192-0-2-15 ironic-python-agent[579]: 2016-12-21 10:01:43.626 579 ERROR root File "/usr/lib/python2.7/site-packages/ironic_python_agent/hardware.py", line 688, in get_os_install_device
Dec 21 10:01:43 host-192-0-2-15 ironic-python-agent[579]: 2016-12-21 10:01:43.626 579 ERROR root "deployment using these hints %s" % root_device_hints)
Dec 21 10:01:43 host-192-0-2-15 ironic-python-agent[579]: 2016-12-21 10:01:43.626 579 ERROR root DeviceNotFound: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment using these hints {u'serial': u'QM00005'}
</snip>
Here we can see:
<snip>
Dec 21 10:01:43 host-192-0-2-15 ironic-python-agent[579]: 2016-12-21 10:01:43.625 579 DEBUG root [-] Root device hint serial=QM00005 does not match the device /dev/sda value of qm00005 match /usr/lib/python2.7/site-packages/ironic_python_agent/hardware.py:653 <======
</snip>
Serial device which I specified was "QM00005". But its expecting "qm00005"
After going through the code, we(sachin,jaison,me) identified that serial value is getting updated with lower case:
647 if hint_value != current_value:
648 LOG.debug("Root device hint %(hint)s=%(value)s does not "
649 "match the device %(device)s value of "
650 "%(current)s", {
651 'hint': hint,
652 'value': hint_value, 'device': device,
653 'current': current_value})
654 return False
655 return True
[...]
668 if isinstance(value, six.string_types):
669 value = utils.normalize(value)
670
__
def normalize(string):
"""Return a normalized string.
Take a urlencoded value from Ironic and urldecode it.
:param string: a urlencoded string
:returns: a normalized version of passed in string
"""
return parse.unquote(string).lower().strip()
________________________
- As a fix we made below changes in ironic-python-agent.initramfs:
<snip>
647 if hint_value.lower() != current_value:
648 LOG.debug("Root device hint %(hint)s=%(value)s does not "
649 "match the device %(device)s value of "
650 "%(current)s", {
651 'hint': hint,
652 'value': hint_value, 'device': device,
653 'current': current_value})
654 return False
655 return True
[...]
</snip>
- rebuild the initramfs file and uploaded it glance. Currently I am redeploying the overcloud setup. I will update the bz with the result by tomorrow.
The deployment was successful:
[root@overcloud-compute-1 heat-admin]# fdisk -l /dev/sd
sda sda1 sda2 sdb sdb1 sdb2
[root@overcloud-compute-1 heat-admin]# fdisk -l /dev/sda
Disk /dev/sda: 44.0 GB, 44023414784 bytes, 85983232 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000ed482
Device Boot Start End Blocks Id System
/dev/sda1 2048 4095 1024 83 Linux
/dev/sda2 * 4096 85979879 42987892 83 Linux
[root@overcloud-compute-1 heat-admin]# ceph -s
cluster 082ccc78-c7a4-11e6-a4c8-525400d086af
health HEALTH_OK
monmap e1: 1 mons at {overcloud-controller-0=192.168.125.30:6789/0}
election epoch 3, quorum 0 overcloud-controller-0
osdmap e15: 2 osds: 2 up, 2 in
flags sortbitwise
pgmap v1317: 224 pgs, 6 pools, 1711 kB data, 106 objects
78528 kB used, 30621 MB / 30697 MB avail
224 active+clean
[root@overcloud-compute-1 heat-admin]# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
30697M 30621M 78528k 0.25
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
rbd 0 0 0 15310M 0
metrics 1 1711k 0.01 15310M 102
images 2 0 0 15310M 0
backups 3 0 0 15310M 0
volumes 4 113 0 15310M 4
vms 5 0 0 15310M 0
____________________
With patch mentioned in comment #13, test case was failing. Hence, we can go with below patch:
diff --git a/ironic_python_agent/hardware.py b/ironic_python_agent/hardware.py
index aad4777125bd..54a8489c2471 100644
--- a/ironic_python_agent/hardware.py
+++ b/ironic_python_agent/hardware.py
@@ -643,6 +643,8 @@ class GenericHardwareManager(HardwareManager):
'Current value: "%(value)s"; and type: "%(type)s"',
{'value': hint_value, 'type': type(hint_value)})
return False
+ elif hint == 'serial':
+ hint_value = utils.normalize(hint_value)
if hint_value != current_value:
LOG.debug("Root device hint %(hint)s=%(value)s does not "
For patch and test case, refer: https://gist.githubusercontent.com/psachin/0410aed410881a09c95701f469e4d325/raw/b30177a9bb484617cb1202890222adc933782c47/ironic_python_agent.diff
Thanks to sachin++ for guidance.
added additional test case to handle special characters for "serial": https://gist.github.com/pbandark/a4c1c3c480718e627aac40b9e2e878ae I haven't test the disk, but here are my findings:
- VirtIO disks are not compatible with overcloud deployments when the VM has more than one disk.
- The serial number, when added along the scsi device type allows me to use the --property root_device='{"serial":"'serial'"}' $node_id feature, thus forcing the installation on the disk
- When size is added as a property (root_device ='{"size": ">=20"}'), this is not taken into account, and Heat installs the OS randomly (vdb in my case)
- Using --root-device=vda doesn't work either, Heat ends up using a random disk.
Upon close look of the data being retrieved, I noticed that my primary disk (vda) shows as vdd in the metadata.
This confirms Jaison Raju's finding regarding how Ironic handles the disk.
I'm going to try the fix suggested in comment 14.
tested the fix* (In reply to Razique Mahroua from comment #17) > I haven't test the disk, but here are my findings: > - VirtIO disks are not compatible with overcloud deployments when the VM > has more than one disk. > - The serial number, when added along the scsi device type allows me to > use the --property root_device='{"serial":"'serial'"}' $node_id feature, > thus forcing the installation on the disk > - When size is added as a property (root_device ='{"size": ">=20"}'), this > is not taken into account, and Heat installs the OS randomly (vdb in my case) > - Using --root-device=vda doesn't work either, Heat ends up using a random > disk. > > Upon close look of the data being retrieved, I noticed that my primary disk > (vda) shows as vdd in the metadata. > > This confirms Jaison Raju's finding regarding how Ironic handles the disk. > > I'm going to try the fix suggested in comment 14. Pratik, Chen, could you please move your case issues to a new bug? This bug is about VMs specifically, and I'd like to close it, because it seems to be about VM configuration, not bug in Ironic. Please provide your hardware vendor and model, and output of lsblk. Thanks! Dmitry, The bug is certainly in Ironic. See comment#18, Razique tested the fix, I don't understand why the bug was closed? > Please provide your hardware vendor and model, and output of lsblk. See comment#9 The initial bug was about libvirtd not exposting serial/wwn correctly, and introspection failing due to that. This is NOT your case. Please open a separate bug with the serial number case (in)sensitivity explanation. Please refrain from reopening this bug, unless you want libvirtd behaviour somehow fixed (which is not something our team can do). In this new bug, please do provide lsblk output for *your* case with the following command line: sudo lsblk -Pbdi -oKNAME,MODEL,SIZE,ROTE,TYPE Please also attach introspection data for the nodes. Ideally, please fetch deployment logs, they should end up in /var/log/ironic/deploy. Thanks! |
Description of problem: Adding serial root_device property to ironic node with several disks leads to introspection/instance boot failure. Version-Release number of selected component (if applicable): python-ironicclient-1.7.0-1.el7ost.noarch openstack-ironic-inspector-4.2.0-3.el7ost.noarch openstack-ironic-conductor-6.2.1-5.el7ost.noarch python-ironic-lib-2.1.1-2.el7ost.noarch python-ironic-inspector-client-1.9.0-1.el7ost.noarch puppet-ironic-9.4.1-1.el7ost.noarch openstack-ironic-common-6.2.1-5.el7ost.noarch openstack-ironic-api-6.2.1-5.el7ost.noarch How reproducible: Steps to reproduce: 1. Create VM with at least 2 disks and assign serial to both: virt-install --ram 8192 --vcpus 4 --os-variant rhel7 --disk path=/var/lib/libvirt/images/root-hints-disk1-1.qcow2,device=disk,bus=virtio,format=qcow2,serial=4000cca77fc4dba1 --disk path=/var/lib/libvirt/images/root-hints-disk1-2.qcow2,device=disk,bus=virtio,format=qcow2,serial=5001c60016ea71ad --import --noautoconsole --vnc --network network:data --network network:management --network network:external --name root-hints-1 2. Add created VM into ironic list 3. Set root_device property of added node to one of desired serial numbers: ironic node-update root-hints-1 add properties/root_device='{"serial": "5001c60016ea71ad"}' 4. Try to introspect node. Actual result: Introspection failed. Expected result: Introspection passed. Additional info: The ironic-inspector log contains the following: 'root_disk': { 'rotational': True, 'vendor': '0x1af4', 'name': '/dev/vdb', 'wwn_vendor_extension': None, 'wwn_with_extension': None, 'model': '', 'wwn': None, 'serial': None, 'size': 21474836480 }