Created attachment 1893165 [details] HostedEngine-console.log Description of problem: Some time ago, oVirt's automated testing [1] started failing, timing out on a VM's start. At some point we added a console log to a file of the VM's console, which revealed that it was booted, started the grub menu, selected an entry, then got stuck. Later we tried manually running linux/initrd commands from the console, and saw that some times they are very slow. Today I spent some time on this, pushed a patch [2] to add a progress indicator (using 'insmod progress') to grub, and see that it's indeed slow - often much slower than expected, around 100KiB/s - but not always - sometimes it gets faster, up to around 10MiB/s (which is, I guess, expected/normal). I then tried manually to eliminate regressions, by downgrading relevant components to older versions - qemu-kvm, grub, which didn't help, and then seabios-bin, which did. Thus opening the bug on this component. We are using CentOS Stream 8, and the current/broken version is seabios-bin-1.16.0-1.module_el8.7.0+1140+ff0772f9.noarch . With seabios-bin-1.15.0-1.module_el8.6.0+1087+b42c8331.noarch , in all my tests I got better throughput - minimum was ~ 400KiB/s, often 1-2MiB/s, sometimes more than 20MiB/s. [1] https://github.com/oVirt/ovirt-system-tests/ [2] https://github.com/oVirt/ovirt-system-tests/pull/203 Version-Release number of selected component (if applicable): seabios-bin-1.16.0-1.module_el8.7.0+1140+ff0772f9.noarch How reproducible: Often, but not sure - not always Steps to Reproduce: 1. Boot a VM with a large (the default) kernel+initrd 2. 3. Actual results: Sometimes it's very slow Expected results: Should always be as fast as the platform allows Additional info: Attaching a console log which is a result of [2] above. You can view it comfortably with: $ cat HostedEngine-console.log | tr '\r' '\n' | less -R +/'\[ vml|\[ ini'
What exactly is the boot device? Can you capture a firmware log? <qemu:commandline> <qemu:arg value='-chardev'/> <qemu:arg value='file,id=firmware,path=/tmp/edk2-devel-q35.log'/> <qemu:arg value='-device'/> <qemu:arg value='isa-debugcon,iobase=0x402,chardev=firmware'/> </qemu:commandline>
(In reply to Gerd Hoffmann from comment #1) > What exactly is the boot device? Attaching the libvirt xml (as copied from a typical successful run of our CI, should be mostly identical to current case). > Can you capture a firmware log? > > <qemu:commandline> > <qemu:arg value='-chardev'/> > <qemu:arg value='file,id=firmware,path=/tmp/edk2-devel-q35.log'/> > <qemu:arg value='-device'/> > <qemu:arg value='isa-debugcon,iobase=0x402,chardev=firmware'/> > </qemu:commandline> I'll try.
three changes for virtio-blk in 1.15 ... 1.16: a05af290bac5 virtio-blk: split large IO according to size_max 815d7498655b virtio-blk: abstract a function named virtio_blk_op_one_segment to handle r/w request 27b573d4f5a5 virtio-blk: add feature VIRTIO_BLK_F_SIZE_MAX and VIRTIO_BLK_F_SEG_MAX They make seabios query the host for the maximum request size and if io requests are larger than that seabios will split them into smaller ones. Should essentially be a no-op, but maybe there is a bug in there that seabios splits requests even though it is not needed, thereby slowing down the boot. The firmware log (grep for lines with virtio-blk) should help figuring this.
Attached some console and firmware logs. Files named *-03 are from a plain unattended boot. Files named *-04* are from a run where I entered the grub prompt and ran several 'initrd' commands to load the default initramfs image (~ 67 MB). During some of them, I imposed cpu load on the host (plain 'while true; do :; done &'). Tried running more than one endless loop in parallel but this caused no difference - apparently we (vdsm/libvirt, no idea) somehow allocate a cpu to the qemu process for itself (the host (actually a nested-kvm VM) has 2 cpus). While cpu was loaded, throughput was ~ 100KB/s. When not, it was faster, but changing significantly. Files named *-05* are from a similar flow as 04, but after downgrading seabios-bin to 1.15. Throughput during cpu load is ~ 500 KB/s, without cpu load it's more than 1MB/s (in previous cases it also went much higher, not sure what changed. Perhaps the various tests cause memory fragmentation or whatever that makes it rather slow even unloaded, just a wild guess).
(In reply to Yedidyah Bar David from comment #8) > Attached some console and firmware logs. Seems to be corrupted or incomplete. I get a decompression error + only HostedEngine-console.log-02 unpacked, the other files are missing.
https://www.mail-archive.com/seabios@seabios.org/msg12940.html
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.
According to Comment 21 and Comment 25, set status to VERIFIED. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Low: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7472