Bug 1716358
Summary: | RHOS 15's default machine type, "q35", doesn't support IDE buses, but config drives are attached to IDE | | |
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Michele Baldessari <michele> |
Component: | openstack-nova | Assignee: | Lee Yarwood <lyarwood> |
Status: | CLOSED ERRATA | QA Contact: | OSP DFG:Compute <osp-dfg-compute> |
Severity: | high | Priority: | high |
Version: | 15.0 (Stein) | Target Release: | 15.0 (Stein) |
Target Milestone: | rc | Keywords: | Patch, Triaged |
Hardware: | Unspecified | OS: | Unspecified |
Fixed In Version: | openstack-nova-19.0.2-0.20190616040418.acd2daa.el8ost | Doc Type: | No Doc Update |
Last Closed: | 2019-09-21 11:22:59 UTC | Type: | Bug |
CC: | amodi, cfontain, dasmith, egallen, eglynn, jhakimra, kchamart, lyarwood, mbooth, sbauza, sgordon, supadhya, vkhitrin, vromanso | | |
: | 1761862 1761863 | | |
Bug Blocks: | 1761862, 1761863, 1782659 | | |
Description
Michele Baldessari
2019-06-03 10:20:24 UTC
First some notes for myself (using upstream code for easier linking):

When we start _get_guest_xml in the libvirt driver, the following disk_info is passed to us:

    disk_info={
      'disk_bus': 'virtio',
      'cdrom_bus': 'ide',
      'mapping': {
        'root': {
          'bus': 'virtio',
          'type': 'disk',
          'dev': 'vda',
          'boot_index': '1'},
        'disk': {
          'bus': 'virtio',
          'type': 'disk',
          'dev': 'vda',
          'boot_index': '1'},
        'disk.config': {
          'bus': 'ide',
          'dev': 'hda',
          'type': 'cdrom'
        }
      }
    }

The 'ide' bits are the culprit here. The default cdrom_bus is 'ide', so the config drive's bus ends up being 'ide' as well.

disk_info comes to us [1] by calling blockinfo.get_disk_info in spawn() [2]. That in turn calls get_disk_bus_for_device_type() for 'disk' and 'cdrom', with the default for 'cdrom' being 'ide' [3] (assuming the 'kvm' hypervisor, which we obviously are). Note that this can be overridden by the hw_cdrom_bus image property [4].

The 'mapping' element in disk_info is obtained by calling get_disk_mapping() in blockinfo, which just calls get_disk_bus_for_device_type() again [5].

So that's the problem. We have a default machine type downstream that doesn't accept IDE buses, but the default bus for config drive CDROMs is IDE, and currently the only way to change that is with an image property.

As a quick aside, upstream uses just '<type>hvm</type>' as a machine type, so while they still have IDE CDROMs, they've never hit this problem.

I would expect all CI tests involving config drives to fail. If that's not the case and we have a passing config drive test somewhere, I'd like to see it to understand what I've missed. Otherwise, I'm afraid I don't have a solution for now. We can't reasonably expect all our clients to add the hw_cdrom_bus property to all their images as part of the OSP15 upgrade. I'll have a think and talk to the rest of the compute DFG, and we'll try to come up with something.
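To make the lookup described in these notes concrete, here is a simplified Python sketch. It is not Nova's actual code: `pick_bus` and `build_disk_info` are hypothetical names, the defaults table only covers the 'kvm' case discussed here, and `hw_disk_bus` is assumed to behave like the `hw_cdrom_bus` image property mentioned in the notes.

```python
# Simplified illustration only; this is NOT the code in nova/virt/libvirt/blockinfo.py.

# Defaults for the 'kvm' hypervisor as described in the notes.
KVM_DEFAULT_BUS = {"disk": "virtio", "cdrom": "ide"}

# Device name prefix per bus (hda for IDE, vda for virtio, sda for SATA/SCSI).
DEV_PREFIX = {"ide": "hd", "virtio": "vd", "sata": "sd", "scsi": "sd"}


def pick_bus(device_type, image_props):
    """Hypothetical helper: the image property wins, else the default applies."""
    override = image_props.get(f"hw_{device_type}_bus")
    return override or KVM_DEFAULT_BUS[device_type]


def build_disk_info(image_props, config_drive=True):
    """Hypothetical helper that mirrors the shape of disk_info shown in the notes."""
    disk_bus = pick_bus("disk", image_props)
    cdrom_bus = pick_bus("cdrom", image_props)
    root = {
        "bus": disk_bus,
        "type": "disk",
        "dev": DEV_PREFIX[disk_bus] + "a",
        "boot_index": "1",
    }
    mapping = {"root": root, "disk": dict(root)}
    if config_drive:
        # The config drive is attached as a CD-ROM, so it inherits cdrom_bus.
        mapping["disk.config"] = {
            "bus": cdrom_bus,
            "dev": DEV_PREFIX[cdrom_bus] + "a",
            "type": "cdrom",
        }
    return {"disk_bus": disk_bus, "cdrom_bus": cdrom_bus, "mapping": mapping}


default_info = build_disk_info({})
overridden_info = build_disk_info({"hw_cdrom_bus": "sata"})
print(default_info["mapping"]["disk.config"])
print(overridden_info["mapping"]["disk.config"])
```

With no image properties the config drive lands on 'ide', which is the failing case on q35; setting `hw_cdrom_bus` on the image is the only knob that changes it in this sketch, matching the situation described in the notes.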
[1] https://github.com/openstack/nova/blob/7feadd492f19a37b015d4ce62893cf27a0716033/nova/virt/libvirt/driver.py#L3165
[2] https://github.com/openstack/nova/blob/7feadd492f19a37b015d4ce62893cf27a0716033/nova/virt/libvirt/driver.py#L3144
[3] https://github.com/openstack/nova/blob/7feadd492f19a37b015d4ce62893cf27a0716033/nova/virt/libvirt/blockinfo.py#L272
[4] https://github.com/openstack/nova/blob/7feadd492f19a37b015d4ce62893cf27a0716033/nova/virt/libvirt/blockinfo.py#L239
[5] https://github.com/openstack/nova/blob/7feadd492f19a37b015d4ce62893cf27a0716033/nova/virt/libvirt/blockinfo.py#L530

(In reply to Artom Lifshitz from comment #1)
> So that's the problem. We have a default machine type downstream that
> doesn't accept IDE buses, but the default bus for config drive CDROMs is
> IDE, and currently the only way to change that is with an image property.

Nice work Artom! I wonder if we could extract _get_machine_type [6] somewhere and call that from within get_disk_bus_for_device_type [5] so we can decide if the bus should be ide or scsi?

[6] https://github.com/openstack/nova/blob/7feadd492f19a37b015d4ce62893cf27a0716033/nova/virt/libvirt/driver.py#L4252

I've proposed a DNM (do not merge) patch to reproduce this upstream. If that confirms my theory, we can think about how to fix this.

Confirmed upstream [1].

[1] http://logs.openstack.org/87/662887/1/check/tempest-full-py3/e2012b4/controller/logs/screen-n-cpu.txt.gz?level=ERROR#_Jun_04_01_38_40_007694

*** Bug 1716221 has been marked as a duplicate of this bug.
***

(In reply to Artom Lifshitz from comment #1)

[Sorry for the previous empty comment; accidentally hit send too soon.]

A few notes based on earlier comments from Artom and Lee.

tl;dr: To address this, we need to make Nova use the "sata" disk bus for CD-ROMs when using the Q35 machine type.

Long
----

(*) The libvirtError here is expected (it's a no-op), because the Q35 machine type does not support IDE, only SATA or SCSI. (QEMU's emulated SCSI is not recommended; 'virtio-scsi' is the most trustworthy, but it needs drivers.)

(*) Given the above, for Nova, when using Q35, we should change the default bus for CD-ROM to unconditionally use "sata". (I've double-checked it with the QEMU folks, too; they recommend the same, since Q35 has on-board SATA.)

(*) I think you saw this in the upstream Nova guest XML:

    ...
    <os>
      <type>hvm</type>
      ...
    </os>
    ...

And said "upstream uses just '<type>hvm</type>' as a machine type". Here, "hvm" [Hardware Virtual Machine] is not a machine type; it means "the Operating System is designed to run on bare metal, so requires full virtualization [using CPU hardware extensions]".

(*) When you don't specify a machine type (as in the case of upstream CI), the default is whatever the QEMU binary on your system reports as such: `qemu-system-x86 -machine help | grep default`. So upstream Nova CI doesn't hit this problem because it uses QEMU's default machine type, which is "pc" (that has on-board IDE).

*** Bug 1719938 has been marked as a duplicate of this bug. ***

*** Bug 1719939 has been marked as a duplicate of this bug. ***

    [stack@undercloud-0 tempest]$ tempest run --regex test_server_basic_ops
    {0} tempest.scenario.test_server_basic_ops.TestServerBasicOps.test_server_basic_ops [59.730387s] ... ok

    ======
    Totals
    ======
    Ran: 1 tests in 59.7304 sec.
     - Passed: 1
     - Skipped: 0
     - Expected Fail: 0
     - Unexpected Success: 0
     - Failed: 0
    Sum of execute time for each test: 59.7304 sec.
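For readers following the thread, the approach proposed in the comments (default the CD-ROM bus to "sata" when the machine type is Q35, keep "ide" otherwise, and still honor the hw_cdrom_bus image property) might look roughly like the Python sketch below. This is an illustration under assumptions, not the patch that shipped in the erratum: `default_cdrom_bus` and `cdrom_bus_for_guest` are hypothetical names, and the machine type strings are example values.

```python
# Illustrative sketch only; function names are hypothetical and this is not
# the code that was merged into Nova.

def default_cdrom_bus(machine_type):
    """Pick a CD-ROM bus that the given machine type actually provides.

    Q35 has on-board SATA and no IDE; "pc" (QEMU's default) has on-board IDE.
    A machine type of None means "not configured", i.e. QEMU's default.
    """
    if machine_type and "q35" in machine_type:
        return "sata"
    return "ide"


def cdrom_bus_for_guest(image_props, machine_type=None):
    """An explicit hw_cdrom_bus image property still overrides the default."""
    return image_props.get("hw_cdrom_bus") or default_cdrom_bus(machine_type)


print(cdrom_bus_for_guest({}, "q35"))                      # Q35 guest, no override
print(cdrom_bus_for_guest({}, None))                       # QEMU default machine type
print(cdrom_bus_for_guest({"hw_cdrom_bus": "scsi"}, "q35"))  # image property wins
```

The substring match on "q35" is a deliberate simplification so that versioned machine type names would also be caught; a real implementation would need to agree with however the driver resolves the machine type for the guest.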
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811