Bug 1924972
| Summary: | Guest whose os is installed multiple disks but boot partition is installed on single disk can't boot into OS on RHEL 8 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Xiaodai Wang <xiaodwan> |
| Component: | seabios | Assignee: | Gerd Hoffmann <kraxel> |
| Status: | CLOSED ERRATA | QA Contact: | Xueqiang Wei <xuwei> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 8.3 | CC: | ahadas, coli, hunter86_bg, jinzhao, juzhou, kkiwi, kraxel, lersek, lmen, lmiksik, meili, mxie, mzhan, phrdina, rjones, tgolembi, tyan, tzheng, virt-maint, xuwei, xuzhang, ymankad |
| Target Milestone: | rc | Keywords: | Automation, Regression, Reopened, Triaged |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | seabios-1.15.0-2.module+el8.6.0+14757+c25ee005 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1990808 2073012 (view as bug list) | Environment: | |
| Last Closed: | 2022-05-10 13:18:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1990808, 2073012 | | |
| Attachments: | | | |
Description
Xiaodai Wang
2021-02-04 02:53:37 UTC
Hi Richard and Tomas,

I can reproduce the bug when converting the guest to OSP 16.2 and to KVM managed by RHEL 8.4 libvirt, so I think the bug will happen whenever the target is a RHEL 8 KVM hypervisor (I can't reproduce it on RHV 4.3, which I believe is because RHV 4.3 is based on the RHEL 7 KVM hypervisor). The bug therefore needs to be fixed not only on RHV 4.4 but also for OpenStack and libvirt. I'm not sure which component causes the bug; I just want to confirm whether new bugs need to be opened against OpenStack and libvirt. Thanks!

(In reply to mxie from comment #1)

I'm not sure I have any idea why this bug happens, but it's significant if it only happens when going to a RHEL 8 hypervisor versus RHEL 7. If you can reproduce this using a conversion direct to libvirt on a RHEL 8 host (without RHV or OpenStack involved), can you please post the full virt-v2v log of the failure case? And also the exact boot failure - preferably the full boot log captured from the virtual machine serial port.

(In reply to Richard W.M. Jones from comment #2)

How to install a Linux guest that reproduces the bug:

1. Add four new disks to the guest, then install RHEL 7 or RHEL 8 on it.
2. Partition the disks manually during installation: put the root and swap partitions on all four disks (select sda, sdb, sdc and sdd), but put the boot partition on a single disk, e.g. select only sdd for /boot. Please refer to the screenshot 'how-to-partition-disks.png'.
3. Finish the OS installation.

Package versions:

virt-v2v-1.42.0-9.module+el8.4.0+9561+069bb9c1.x86_64
libguestfs-1.44.0-1.module+el8.4.0+9398+f376ac33.x86_64
libvirt-libs-7.0.0-3.module+el8.4.0+9709+a99efd61.x86_64
qemu-kvm-5.2.0-5.module+el8.4.0+9775+0937c167.x86_64
nbdkit-1.24.0-1.module+el8.4.0+9341+96cf2672.x86_64

Steps to reproduce the bug:

1. Use virt-v2v to convert the guest (OS installed across multiple disks, boot partition on one of them) from VMware to KVM managed by libvirt:

# virt-v2v -ic vpx://root.198.169/data/10.73.199.217/?no_verify=1 -it vddk -io vddk-libdir=/home/vddk7.0 -io vddk-thumbprint=B5:52:1F:B4:21:09:45:24:51:32:56:F6:63:6A:93:5D:54:08:2D:78 Auto-esx7.0-rhel8.3-os-multiple-disk-boot-partition-on-last-disk -ip /home/passwd
[ 0.0] Opening the source -i libvirt -ic vpx://root.198.169/data/10.73.199.217/?no_verify=1 Auto-esx7.0-rhel8.3-os-multiple-disk-boot-partition-on-last-disk -it vddk -io vddk-libdir=/home/vddk7.0 -io vddk-thumbprint=B5:52:1F:B4:21:09:45:24:51:32:56:F6:63:6A:93:5D:54:08:2D:78
[ 4.7] Creating an overlay to protect the source from being modified
[ 8.1] Opening the overlay
[ 14.7] Inspecting the overlay
[ 26.2] Checking for sufficient free disk space in the guest
[ 26.2] Estimating space required on target for each disk
[ 26.2] Converting Red Hat Enterprise Linux 8.3 (Ootpa) to run on KVM
virt-v2v: This guest has virtio drivers installed.
[ 65.9] Mapping filesystem data to avoid copying unused and blank areas
[ 66.9] Closing the overlay
[ 67.2] Assigning disks to buses
[ 67.2] Checking if the guest needs BIOS or UEFI to boot
[ 67.2] Initializing the target -o libvirt -os default
[ 67.2] Copying disk 1/4 to /var/lib/libvirt/images/Auto-esx7.0-rhel8.3-os-multiple-disk-boot-partition-on-last-disk-sda (raw) (100.00/100%)
[ 188.3] Copying disk 2/4 to /var/lib/libvirt/images/Auto-esx7.0-rhel8.3-os-multiple-disk-boot-partition-on-last-disk-sdb (raw) (100.00/100%)
[ 332.0] Copying disk 3/4 to /var/lib/libvirt/images/Auto-esx7.0-rhel8.3-os-multiple-disk-boot-partition-on-last-disk-sdc (raw) (100.00/100%)
[ 500.9] Copying disk 4/4 to /var/lib/libvirt/images/Auto-esx7.0-rhel8.3-os-multiple-disk-boot-partition-on-last-disk-sdd (raw) (100.00/100%)
[ 692.4] Creating output metadata
[ 692.4] Finishing off

2. Power on the guest in virt-manager; the guest can't log into the OS. Please refer to the screenshot "os-multiple-disks-boot-on-single-disk-on-virt-manager.png".

Hi Richard, I don't know how to get the boot log via the guest's serial port at the moment, so please check the v2v debug log first.

Created attachment 1755895 [details]
v2v-os-install-multiple-disks-but-boot-partition-install-single-disk.log
Created attachment 1755896 [details]
os-multiple-disks-boot-on-single-disk-on-virt-manager.png
Created attachment 1755898 [details]
how-to-partition-disks.png
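As an aside on the serial-port boot log requested above: one common way to capture a guest's serial output with libvirt is to log the serial device to a file. This is only a sketch using standard libvirt domain XML elements; the log path is illustrative, and the guest also needs console=ttyS0 on its kernel command line for boot messages to show up there.

```
<!-- Added under <devices> in the guest's domain XML (virsh edit <domain>) -->
<serial type='file'>
  <source path='/var/log/libvirt/qemu/guest-serial.log'/>
  <target port='0'/>
</serial>
```

Alternatively, `virsh console <domain>` attaches to the guest's serial console interactively, which is enough to watch the boot failure live.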
I don't understand this bug at all, but I can see from the log in comment 4 that we do not generate any <boot/> element in the libvirt XML at all, neither the "old" nor the "new" format. It's a bit surprising that this worked in the past - did something change in libvirt?

There are a couple of things which would be useful to capture:

(1) The libvirt XML of the guest after libvirt has loaded it, i.e.:

virsh dumpxml Auto-esx7.0-rhel8.3-os-multiple-disk-boot-partition-on-last-disk

(2) The full qemu command line, probably in /var/log/libvirt/qemu/Auto-esx7.0-rhel8.3-os-multiple-disk-boot-partition-on-last-disk.log.

(In reply to Richard W.M. Jones from comment #7)

Please check the log 'guest-libvirtxml-qemu-log-on-libvirt'. As the guest can boot into the OS successfully on RHV 4.3 but can't on RHV 4.4, I have also attached the libvirt XML of the guest when running on the RHV 4.3 and RHV 4.4 nodes for your reference.

Created attachment 1755920 [details]
guest-libvirtxml-qemu-log-on-libvirt
Created attachment 1755921 [details]
guest-libvirtxml-on-rhv4.4-node
Created attachment 1755922 [details]
guest-libvirtxml-on-rhv4.3-node
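For context on the "old" and "new" <boot> formats mentioned in comment 7, here is a minimal sketch of the two libvirt styles; the disk path and target name are illustrative, not taken from the attached XML.

```
<!-- "Old" format: one machine-wide boot order in <os> -->
<os>
  <boot dev='hd'/>
</os>

<!-- "New" format: a per-device boot order on each disk the firmware must initialize -->
<disk type='file' device='disk'>
  <source file='/var/lib/libvirt/images/guest-sda.img'/>
  <target dev='vda' bus='virtio'/>
  <boot order='1'/>
</disk>
```

As far as I know, libvirt does not allow mixing the two: per-device <boot order> elements are only accepted when <os> contains no <boot dev> element.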
Libvirt adds <os> <boot dev='hd'/>. There are no other <boot> elements. There's a mysterious <source index> attribute. While it might not matter, it's odd because the indexes count down, not up. This all gets translated into the following qemu options:

-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/Auto-esx7.0-rhel8.3-os-multiple-disk-boot-partition-on-last-disk-sda","node-name":"libvirt-4-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-4-format","read-only":false,"driver":"raw","file":"libvirt-4-storage"}' \
-device virtio-blk-pci,bus=pci.0,addr=0x4,drive=libvirt-4-format,id=virtio-disk0,bootindex=1 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/Auto-esx7.0-rhel8.3-os-multiple-disk-boot-partition-on-last-disk-sdb","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-3-format","read-only":false,"driver":"raw","file":"libvirt-3-storage"}' \
-device virtio-blk-pci,bus=pci.0,addr=0x5,drive=libvirt-3-format,id=virtio-disk1 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/Auto-esx7.0-rhel8.3-os-multiple-disk-boot-partition-on-last-disk-sdc","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"raw","file":"libvirt-2-storage"}' \
-device virtio-blk-pci,bus=pci.0,addr=0x6,drive=libvirt-2-format,id=virtio-disk2 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/Auto-esx7.0-rhel8.3-os-multiple-disk-boot-partition-on-last-disk-sdd","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' \
-device virtio-blk-pci,bus=pci.0,addr=0x7,drive=libvirt-1-format,id=virtio-disk3 \

All the disks are at least being added to the guest, including sdd, so I don't understand why grub cannot see the /boot UUID. Could you please try the following:

# virt-filesystems -a /var/lib/libvirt/images/Auto-esx7.0-rhel8.3-os-multiple-disk-boot-partition-on-last-disk-sdd --all --long -h

I notice that ,bootindex=1 is added to the first disk. However we know that grub itself is being loaded successfully, so I can't see how that can be a problem. The real problem is that the /boot fs UUID is missing (or grub cannot find it).

(In reply to Richard W.M. Jones from comment #13)

# virt-filesystems -a /var/lib/libvirt/images/Auto-esx7.0-rhel8.3-os-multiple-disk-boot-partition-on-last-disk-sdd --all --long -h
libguestfs: error: internal_parse_mountable: internal_parse_mountable_stub: /dev/rhel_vm-198-173/root: No such file or directory

I also notice that RHEL 8 libvirt adds 'index=number' to the disks in the guest libvirt XML after powering on the guest, but RHEL 7 libvirt does not add this.

(In reply to mxie from comment #8)

As far as I can tell there was no change in how the boot sequence is generated by RHV between 4.3 and 4.4, so I'd also suspect something has changed in libvirt.

(In reply to mxie from comment #14)

Sorry, the command was wrong because I forgot that the LVs span multiple disks. Here's a different command to try:

# guestfish --ro -a /var/lib/libvirt/images/Auto-esx7.0-rhel8.3-os-multiple-disk-boot-partition-on-last-disk-sdd
><fs> run
><fs> vfs-uuid /dev/sda1
><fs> findfs-uuid 5b5580e6-f442-4768-b52d-c2991c973787
><fs> exit

(In reply to Richard W.M. Jones from comment #16)

# guestfish --ro -a /var/lib/libvirt/images/Auto-esx7.0-rhel8.3-os-multiple-disk-boot-partition-on-last-disk-sdd

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: ‘help’ for help on commands
      ‘man’ to read the manual
      ‘quit’ to quit the shell

><fs> run
><fs> vfs-uuid /dev/sda1
5b5580e6-f442-4768-b52d-c2991c973787
><fs> findfs-uuid 5b5580e6-f442-4768-b52d-c2991c973787
/dev/sda1

Here is a reproducer not involving virt-v2v. I tried it on CentOS 8 rather than RHEL, but that should not matter much. Attached is a kickstart file to install a VM with /boot on a separate disk. The virt-install command used to build the VM is the following:

virt-install --debug \
  --connect="qemu:///system" \
  --machine pc-q35-rhel8.2.0 \
  --network="network=default" \
  --initrd-inject="boot-issue.ks" \
  --extra-args="ks=file:/boot-issue.ks console=tty0 console=ttyS0,115200" \
  --name="x-boot-issue" \
  --disk="disk1.qcow2,size=4,bus=virtio,format=qcow2" \
  --disk="disk2.qcow2,size=1,bus=virtio,format=qcow2" \
  --ram="2048" \
  --cpu host \
  --vcpus="1" \
  --check-cpu \
  --accelerate \
  --location="https://ftp.fi.muni.cz/pub/linux/centos/8/BaseOS/x86_64/os/" \
  --nographics \
  --rng /dev/urandom

With the packages from the base repo the VM works fine:

libvirt-daemon-6.0.0-28.module_el8.3.0+555+a55c8938.x86_64
qemu-kvm-core-4.2.0-34.module_el8.3.0+613+9ec9f184.1.x86_64

When the packages are updated to the -AV packages the VM no longer boots:

libvirt-daemon-6.6.0-7.3.el8.x86_64
qemu-kvm-core-5.1.0-14.el8.1.x86_64

The domain XML looks the same; the only difference seems to be:

- <controller type='usb' index='0' model='qemu-xhci'>
+ <controller type='usb' index='0' model='qemu-xhci' ports='15'>

... which seems irrelevant. When looking at the generated qemu command line there is a difference in the -cpu option, which contains a new attribute:

-cpu ...,amd-stibp=on,...

... which is probably also irrelevant. But there is also a removed attribute in the disk -device arguments which may be the source of the problem:

-device ...,scsi=off,...

This "scsi=off" is there with libvirt 6.0 but is missing in libvirt 6.6.

Now when I remove the <os><boot> element, replace it with <devices><disk><boot order>, and add it to both disks, the change in the qemu command line is that the second disk's -device now also has "...,bootindex=NNN", and the VM becomes bootable again.

I cannot tell whether this is a libvirt issue or a qemu issue, but it does not have anything to do with oVirt or libguestfs. Xiaodai, do you want to keep this bug on RHV and clone it to libvirt, or can we move the bug to libvirt?

Created attachment 1756304 [details]
kickstart for creating a test VM
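For readers without access to the attachment, the partitioning section of such a kickstart roughly looks like the sketch below. This is illustrative only, not the attached boot-issue.ks; the sizes, filesystem types and volume group name are assumptions.

```
# /boot confined to the second (1 GB) disk; root and swap on LVM on the first disk.
part /boot --fstype=xfs --size=512 --ondisk=vdb
part pv.01 --size=1 --grow --ondisk=vda
volgroup vg_root pv.01
logvol /    --vgname=vg_root --name=root --size=1 --grow
logvol swap --vgname=vg_root --name=swap --size=512
```

With a layout like this the bootloader and root filesystem live on vda, while grub has to read /boot from vdb, which is exactly the disk that ends up without a bootindex.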
Created attachment 1756306 [details]
generated domain XML for test VM
Created attachment 1756307 [details]
qemu command from libvirt 6.0 log
Created attachment 1756308 [details]
qemu command from libvirt 6.6 log

Based on comment 18, moving to libvirt.

From all the logs it's clear that libvirt is starting QEMU in the same way. The "scsi=off" is not relevant either; see BZ 1829550 for more details.

I checked what changed in QEMU and managed to figure out the exact commit that broke this configuration:

commit 563b9d0d8d944d358921a774f82a0e60527e7823
Author: Gerd Hoffmann <kraxel>
Date:   Thu Jul 2 15:45:13 2020 +0200

    seabios: update binaries

This updated the seabios binaries to a pre-1.14 master snapshot of seabios. There is another commit in QEMU that updated the seabios git submodule:

commit de15df5ead400b7c3d0cf21c8164a7686dc81933
Author: Gerd Hoffmann <kraxel>
Date:   Thu Jul 2 15:28:54 2020 +0200

    seabios: update submodule to pre-1.14 master snapshot

The commit message contains a list of all the changes, but my guess is that these are the changes affecting the boot:

virtio: Do not init non-bootable devices
virtio-scsi: skip initializing non-bootable devices
nvme: skip initializing non-bootable devices

That explains why marking all disks as bootable makes the VM boot successfully. Moving to QEMU.
Based on comment 18, moving to libvirt From all the logs it's clear that libvirt is starting QEMU in the same way. The "scsi=off" is not relevant as well, see BZ 1829550 for more details. I check what changed in QEMU and managed to figure out the exact commit that broke this configuration: commit 563b9d0d8d944d358921a774f82a0e60527e7823 Author: Gerd Hoffmann <kraxel> Date: Thu Jul 2 15:45:13 2020 +0200 seabios: update binaries This updated the seabios binaries to pre-1.14 master snapshot of seabios. There is another commit in QEMU that updated the seabios git submodule: commit de15df5ead400b7c3d0cf21c8164a7686dc81933 Author: Gerd Hoffmann <kraxel> Date: Thu Jul 2 15:28:54 2020 +0200 seabios: update submodule to pre-1.14 master snapshot The commit message contains a list of all changes but in my guess these are the changes affecting the boot: virtio: Do not init non-bootable devices virtio-scsi: skip initializing non-bootable devices nvme: skip initializing non-bootable devices That explains why marking all disks as bootable makes the VM boot successfully. Moving to QEMU. > virtio: Do not init non-bootable devices
> virtio-scsi: skip initializing non-bootable devices
> nvme: skip initializing non-bootable devices
Yes, that's it.
seabios now only initializes bootable disks. This saves memory, speeds up the boot, avoids cluttering the boot menu with non-bootable devices and generally makes seabios manage systems with lots of disks better because it doesn't run out of memory that quickly any more.
That effectively means any device without a bootindex attached (<boot order='...'> in libvirt) is not initialized in case qemu is started with the "-boot strict=on" option.
So there are two ways to deal with this:
* Either add a <boot order ...> entry to every disk which grub needs access to,
* Or start qemu with "-boot strict=off".
Not sure whether libvirt has a config option for the latter. Pavel?
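To make the first option concrete, here is a hedged sketch at the libvirt level: every disk that grub needs to read gets its own <boot order> element (file paths and target names are made up).

```
<disk type='file' device='disk'>
  <source file='/var/lib/libvirt/images/guest-sda.img'/>
  <target dev='vda' bus='virtio'/>
  <boot order='1'/>
</disk>
<disk type='file' device='disk'>
  <!-- the disk that actually holds /boot -->
  <source file='/var/lib/libvirt/images/guest-sdd.img'/>
  <target dev='vdd' bus='virtio'/>
  <boot order='2'/>
</disk>
```

libvirt turns each <boot order='N'/> into a bootindex=N property on the corresponding -device, which is what makes SeaBIOS initialize that disk again. The second option corresponds to dropping "-boot strict=on" from the qemu command line (or passing strict=off), which, as discussed in the following comments, libvirt does not currently expose as a user-visible setting.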
That makes sense, and I guess by default we want to have that optimization in place. Unfortunately libvirt doesn't have any way for users to control the "-boot strict=off" option. We add it automatically if QEMU supports it; the reason is BZ 888635. Technically we could add a config option to let users control it, but I'm not sure it's desirable to start a VM with strict=off if the mentioned BZ would be triggered in that case. A simple solution for layered products would be to add the boot order as you've suggested. Rich, would it be an acceptable solution for virt-v2v to figure out which disks need to be set as bootable, or to set all of them as bootable?

Tricky. We would probably prefer either a way to control the -boot strict=off option via libvirt, or else we'd just mark every disk as bootable in the XML. I'm also concerned about the implications for layered products, both in the general case and in the v2v case. I guess that when this change makes it to RHEL, people will find that previously bootable guests will no longer boot.

Should also note in this case that the bootloader is on the first disk, but the /boot partition is on the 4th disk. The bootloader works, but it cannot find the /boot filesystem by UUID. IMHO it's unexpected that the 4th disk would also have to be marked as bootable.

Tested this case on RHEL 8.4.0; the test results are as follows:

1. Start the VM without 'bootindex': the VM starts successfully.

-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/rhel840-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-blockdev node-name=file_data,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/data1.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_data,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_data \
-device scsi-hd,id=data,drive=drive_data,write-cache=on \

2. Start the VM with 'bootindex': the VM cannot start.

-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/rhel840-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-blockdev node-name=file_data,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/data1.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_data,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_data \
-device scsi-hd,id=data,drive=drive_data,write-cache=on,bootindex=1 \

Btw, I investigated some documentation (https://qemu.weilnetz.de/doc/4.2/qemu-doc.html) and found that the default value of the 'strict' option in qemu is 'off'.

(In reply to Richard W.M. Jones from comment #30)

So, the big question is how to go forward with this one? A plain revert is highly unlikely to happen in upstream seabios. Adding a compile-time option to seabios should be possible. In any case this is tricky.
The underlying problem is that seabios operates in a very memory-constrained environment, so there are limits on how many disks seabios is able to manage before it runs out of memory. And there isn't much we can do about it; some data structures simply have to stay in real-mode address space (below 1M). So in any case we trade one problem for another.

We could revert to the old behavior, which would fix this regression. But it would also make seabios need more memory again, and QE regularly had problems due to seabios running out of memory in configurations with many disks. I'm not sure how much of a real-world problem (i.e. customers hitting it) this actually is, though.

The best way to avoid those issues is to use ovmf instead of seabios, but I suspect for the v2v use case that isn't an option, as converting guests from BIOS to UEFI is quite tricky and might not work at all if the guest is old enough that it doesn't ship with UEFI support.

Virt-v2v doesn't try (and really can't) convert guests from BIOS to UEFI. We will probably work with libvirt to add a way to toggle the -boot strict flag, or else make all the disks bootable, but it will need some complicated changes to both libvirt and virt-v2v.

> We will probably work with libvirt to add a way to toggle the
> -boot strict flag, or else make all the disks bootable, but it will need
> some complicated changes to both libvirt and virt-v2v.
So, what to do with this bug now? Reassign to v2v?
This isn't a bug in virt-v2v, as you can see from the reproducer in comment 18. However this change will certainly affect all layered products and virt-v2v, and require a bunch of work to fix throughout the stack. It would be nice if qemu/seabios had a way to turn this off while we work on fixing everything (apparently -boot strict=off will not do what we want). Using `-boot strict=off` will "fix" the issue because in that case SeaBIOS will initialize all disks. All of that doesn't change a fact that this is a regression for previously working configuration and will most likely affect everything using libvirt + QEMU where `-boot strict=on` is the default because of the BZ 888635 I already mentioned. If SeaBIOS will not revert the change or rewrite it to have the optimization disabled by default this will need fixing all layered products which doesn't sound like the right way. Which version will this bug be fixed in? Tested on rhel8.5, also hit this issue. The test results are the same with Comment 31. Versions: kernel-4.18.0-325.el8.x86_64 qemu-kvm-6.0.0-27.module+el8.5.0+12121+c40c8708 seabios-bin-1.14.0-1.module+el8.4.0+8855+a9e237a9 1. create two images for guest # qemu-img create -f qcow2 /home/kvm_autotest_root/images/rhel850-64-virtio-scsi-.qcow2 30G # qemu-img create -f qcow2 /home/kvm_autotest_root/images/data1.qcow2 20G 2. install rhel8.5 guest, manual partitioning disks during install OS, root and swap partition are installed on two disks(select sda and sdb), but boot partition is installed on single disk, such as just select sdb to install boot. qemu command lines: /usr/libexec/qemu-kvm \ -S \ -name 'avocado-vt-vm1' \ -sandbox on \ -machine q35,memory-backend=mem-machine_mem \ -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \ -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \ -nodefaults \ -device VGA,bus=pcie.0,addr=0x2 \ -m 15360 \ -object memory-backend-ram,size=15360M,id=mem-machine_mem \ -smp 12,maxcpus=12,cores=6,threads=1,dies=1,sockets=2 \ -cpu 'Opteron_G5',+kvm_pv_unhalt \ -chardev socket,id=qmp_id_qmpmonitor1,server=on,wait=off,path=/tmp/avocado_qrgu9qi7/monitor-qmpmonitor1-20210809-121558-wBNvFjXB \ -mon chardev=qmp_id_qmpmonitor1,mode=control \ -chardev socket,id=qmp_id_catch_monitor,server=on,wait=off,path=/tmp/avocado_qrgu9qi7/monitor-catch_monitor-20210809-121558-wBNvFjXB \ -mon chardev=qmp_id_catch_monitor,mode=control \ -device pvpanic,ioport=0x505,id=idUo2RS7 \ -chardev socket,id=chardev_serial0,server=on,wait=off,path=/tmp/avocado_qrgu9qi7/serial-serial0-20210809-121558-wBNvFjXB \ -device isa-serial,id=serial0,chardev=chardev_serial0 \ -chardev socket,id=seabioslog_id_20210809-121558-wBNvFjXB,path=/tmp/avocado_qrgu9qi7/seabios-20210809-121558-wBNvFjXB,server=on,wait=off \ -device isa-debugcon,chardev=seabioslog_id_20210809-121558-wBNvFjXB,iobase=0x402 \ -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \ -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \ -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \ -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \ -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \ -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel850-64-virtio-scsi-.qcow2,cache.direct=on,cache.no-flush=off \ -blockdev 
node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \ -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \ -blockdev node-name=file_data,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/data1.qcow2,cache.direct=on,cache.no-flush=off \ -blockdev node-name=drive_data,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_data \ -device scsi-hd,id=data,drive=drive_data,write-cache=on \ -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \ -device virtio-net-pci,mac=9a:50:b5:dd:b4:3e,id=idf1ZeRk,netdev=idbwJAZM,bus=pcie-root-port-3,addr=0x0 \ -netdev tap,id=idbwJAZM,vhost=on \ -blockdev node-name=file_cd1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/iso/linux/RHEL-8.5.0-20210730.n.0-x86_64-dvd1.iso,cache.direct=on,cache.no-flush=off \ -blockdev node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \ -device scsi-cd,id=cd1,drive=drive_cd1,write-cache=on \ -vnc :0 \ -rtc base=utc,clock=host,driftfix=slew \ -boot menu=off,order=cdn,once=d,strict=off \ -no-shutdown \ -enable-kvm \ -monitor stdio \ 3.Then finish OS installation for guest 4. Start the guest without 'bootindex' 5. Start the guest with 'bootindex' add bootindex=1 to image data1.qcow2, e.g. "-device scsi-hd,id=data,drive=drive_data,write-cache=on,bootindex=1 \" After step 3, guest installs successfully. After step 4, guest can start successfully. After step 5, guest can not start. Also hit it on rhel9, tracked by Bug 1990808 - Guest whose os is installed multiple disks but boot partition is installed on single disk can't boot into OS on RHEL 8 Hi Gerd, Could you please help check which version will this bug be fixed in? Thanks. And the stale date is Agu 11, if it is closed as won't fix, I will reopen it, do you agree? Many thanks. Tested with edk2-ovmf-20210527gite1999b264f1f-3.el8.noarch, also hit this issue. So I think it's not seabios issue, can move it to qemu? Versions: kernel-4.18.0-325.el8.x86_64 qemu-kvm-6.0.0-27.module+el8.5.0+12121+c40c8708 edk2-ovmf-20210527gite1999b264f1f-3.el8.noarch 1. create two images for guest # qemu-img create -f qcow2 /home/kvm_autotest_root/images/rhel850-64-virtio-scsi-.qcow2 30G # qemu-img create -f qcow2 /home/kvm_autotest_root/images/data1.qcow2 20G 2. install rhel8.5 guest, manual partitioning disks during install OS, root and swap partition are installed on two disks(select sda and sdb), but /boot/efi partition is installed on single disk, such as just select sdb to install boot. 
# cp /usr/share/OVMF/OVMF_VARS.fd /home/kvm_autotest_root/images/avocado-vt-vm1_rhel850-64-virtio-scsi.qcow2_VARS.fd qemu command lines: /usr/libexec/qemu-kvm \ -S \ -name 'avocado-vt-vm1' \ -sandbox on \ -blockdev node-name=file_ovmf_code,driver=file,filename=/usr/share/OVMF/OVMF_CODE.secboot.fd,auto-read-only=on,discard=unmap \ -blockdev node-name=drive_ovmf_code,driver=raw,read-only=on,file=file_ovmf_code \ -blockdev node-name=file_ovmf_vars,driver=file,filename=/home/kvm_autotest_root/images/avocado-vt-vm1_rhel850-64-virtio-scsi.qcow2_VARS.fd,auto-read-only=on,discard=unmap \ -blockdev node-name=drive_ovmf_vars,driver=raw,read-only=off,file=file_ovmf_vars \ -machine q35,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \ -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \ -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \ -nodefaults \ -device VGA,bus=pcie.0,addr=0x2 \ -m 15360 \ -object memory-backend-ram,size=15360M,id=mem-machine_mem \ -smp 12,maxcpus=12,cores=6,threads=1,dies=1,sockets=2 \ -cpu 'Opteron_G5',+kvm_pv_unhalt \ -chardev socket,id=qmp_id_qmpmonitor1,server=on,wait=off,path=/tmp/avocado_qrgu9qi7/monitor-qmpmonitor1-20210809-121558-wBNvFjXB \ -mon chardev=qmp_id_qmpmonitor1,mode=control \ -chardev socket,id=qmp_id_catch_monitor,server=on,wait=off,path=/tmp/avocado_qrgu9qi7/monitor-catch_monitor-20210809-121558-wBNvFjXB \ -mon chardev=qmp_id_catch_monitor,mode=control \ -device pvpanic,ioport=0x505,id=idUo2RS7 \ -chardev socket,id=chardev_serial0,server=on,wait=off,path=/tmp/avocado_qrgu9qi7/serial-serial0-20210809-121558-wBNvFjXB \ -device isa-serial,id=serial0,chardev=chardev_serial0 \ -chardev socket,id=seabioslog_id_20210809-121558-wBNvFjXB,path=/tmp/avocado_qrgu9qi7/seabios-20210809-121558-wBNvFjXB,server=on,wait=off \ -device isa-debugcon,chardev=seabioslog_id_20210809-121558-wBNvFjXB,iobase=0x402 \ -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \ -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \ -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \ -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \ -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \ -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel850-64-virtio-scsi-.qcow2,cache.direct=on,cache.no-flush=off \ -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \ -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \ -blockdev node-name=file_data,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/data1.qcow2,cache.direct=on,cache.no-flush=off \ -blockdev node-name=drive_data,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_data \ -device scsi-hd,id=data,drive=drive_data,write-cache=on \ -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \ -device virtio-net-pci,mac=9a:50:b5:dd:b4:3e,id=idf1ZeRk,netdev=idbwJAZM,bus=pcie-root-port-3,addr=0x0 \ -netdev tap,id=idbwJAZM,vhost=on \ -blockdev node-name=file_cd1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/iso/linux/RHEL-8.5.0-20210810.n.0-x86_64-dvd1.iso,cache.direct=on,cache.no-flush=off \ -blockdev 
node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \
-device scsi-cd,id=cd1,drive=drive_cd1,write-cache=on \
-vnc :0 \
-rtc base=utc,clock=host,driftfix=slew \
-boot menu=off,order=cdn,once=d,strict=off \
-no-shutdown \
-enable-kvm \
-monitor stdio \

3. Finish the OS installation for the guest.
4. Start the guest without 'bootindex'.
5. Start the guest with 'bootindex': add bootindex=1 to the image data1.qcow2, e.g. "-device scsi-hd,id=data,drive=drive_data,write-cache=on,bootindex=1 \"

After step 3, the guest installs successfully. After step 4, the guest starts successfully. After step 5, the guest cannot start.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Another example of the stale bug process being totally inappropriate.

Hi Xiaodai, could you try it with bootindex? Is Gerd's suggestion (change the test case to use bootindex) acceptable for you? Thanks. For the details, please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1990808#c11 and https://bugzilla.redhat.com/show_bug.cgi?id=1990808#c12.

(In reply to Xueqiang Wei from comment #45)

This is not the expected solution for v2v based on the comments in this bug. Let's wait and see what the developers say.

We haven't made any changes in virt-v2v to make all the disks bootable, nor has libvirt provided a way to toggle the -boot strict flag, so there's been no fix as far as I can tell. We need to revert this change in seabios and look again at the issue in RHEL 9.

Bulk update: Move RHEL-AV bugs to RHEL8

(In reply to Xueqiang Wei from comment #42)
> Tested with edk2-ovmf-20210527gite1999b264f1f-3.el8.noarch, also hit this
> issue. So I think it's not seabios issue, can move it to qemu?

No.

> 5. Start the guest with 'bootindex'
> add bootindex=1 to image data1.qcow2, e.g. "-device
> scsi-hd,id=data,drive=drive_data,write-cache=on,bootindex=1 \"
> After step 5, guest can not start.

bootindex must point to the disk with the ESP.

*** Bug 2041083 has been marked as a duplicate of this bug. ***

Let's get rid of the stupid stale bug flag anyway.

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=44373666
http://brew-task-repos.usersys.redhat.com/repos/scratch/ghoffman/seabios/1.15.0/1.el8.bz1924972.3/
https://gitlab.com/redhat/rhel/src/seabios/-/merge_requests/5

QE (pre-verify): set 'Verified:Tested,SanityOnly' as the gating/tier1 tests pass.

According to Comment 82 and Comment 83, setting status to VERIFIED. Thanks, Xiaodai.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1759

(I've landed here from <https://gitlab.com/redhat/rhel/src/seabios/-/merge_requests/6>.)
(In reply to Xueqiang Wei from comment #42)
> Tested with edk2-ovmf-20210527gite1999b264f1f-3.el8.noarch, also hit this
> issue.

edk2 implements the same optimization that SeaBIOS does. "Boot fast with 200+ virtio-scsi disks and 200+ NICs" and "initialize disks that are secretly needed for OS boot" are conflicting goals. QE kept reporting memory usage and initialization time problems regarding the former use case (init time is relevant for both SeaBIOS and edk2), so both edk2 and SeaBIOS have been optimized for that. Which means that now you need to tell the firmware about the exact set of devices you need for booting. You can still say "all of them" (by marking each device individually as bootable); it just pushes more work to a higher level in the virt stack.

I think the concern is that it broke existing customer guests. The change should only have been done in a major RHEL version.

(In reply to Laszlo Ersek from comment #89)

Mistakenly, I got the impression from one of the comments here (or a related BZ) that this concerns only SeaBIOS. Do we have a similar BZ for ovmf as well?

(In reply to Richard W.M. Jones from comment #90)
> I think the concern is that it broke existing customer guests. The
> change should have only been done in a major RHEL version.

That's why this is RHEL-8 only: the optimization is reverted to unbreak existing guests. On RHEL-9 we'll keep the optimization instead.

> Mistakenly I got the impression from one of the comments here (or related
> BZ) that this concerns only SeaBIOS. Do we have a similar BZ also for ovmf?

I'm not aware of any bugzilla or any report of guests stopping working.