2101787 – [rhel.8.7] Loading a kernel/initrd is sometimes very slow

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 2101787 - [rhel.8.7] Loading a kernel/initrd is sometimes very slow

Summary: [rhel.8.7] Loading a kernel/initrd is sometimes very slow

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	seabios
Sub Component:
Version:	8.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Gerd Hoffmann
QA Contact:	Xueqiang Wei
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2108555
TreeView+	depends on / blocked

Reported:	2022-06-28 11:14 UTC by Yedidyah Bar David
Modified:	2023-07-27 02:18 UTC (History)
CC List:	10 users (show)
Fixed In Version:	seabios-1.16.0-3.module+el8.7.0+16134+e5908aa2
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	2108555 (view as bug list)
Environment:
Last Closed:	2022-11-08 09:20:10 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
HostedEngine-console.log (195.91 KB, application/octet-stream) 2022-06-28 11:14 UTC, Yedidyah Bar David	no flags	Details
View All

Links
System	ID	Priority	Status	Summary	Last Updated
Gitlab	redhat/rhel/src seabios merge_requests 8	None	None	None	2022-07-19 11:16:28 UTC
Red Hat Issue Tracker	RHELPLAN-126477	None	None	None	2022-06-28 11:37:22 UTC
Red Hat Product Errata	RHSA-2022:7472	None	None	None	2022-11-08 09:21:14 UTC

Description Yedidyah Bar David 2022-06-28 11:14:19 UTC

Created attachment 1893165 [details]
HostedEngine-console.log

Description of problem:

Some time ago, oVirt's automated testing [1] started failing, timing out on a VM's start. At some point we added a console log to a file of the VM's console, which revealed that it was booted, started the grub menu, selected an entry, then got stuck. Later we tried manually running linux/initrd commands from the console, and saw that some times they are very slow.

Today I spent some time on this, pushed a patch [2] to add a progress indicator (using 'insmod progress') to grub, and see that it's indeed slow - often much slower than expected, around 100KiB/s - but not always - sometimes it gets faster, up to around 10MiB/s (which is, I guess, expected/normal).

I then tried manually to eliminate regressions, by downgrading relevant components to older versions - qemu-kvm, grub, which didn't help, and then seabios-bin, which did. Thus opening the bug on this component.

We are using CentOS Stream 8, and the current/broken version is seabios-bin-1.16.0-1.module_el8.7.0+1140+ff0772f9.noarch .

With seabios-bin-1.15.0-1.module_el8.6.0+1087+b42c8331.noarch , in all my tests I got better throughput - minimum was ~ 400KiB/s, often 1-2MiB/s, sometimes more than 20MiB/s.

[1] https://github.com/oVirt/ovirt-system-tests/

[2] https://github.com/oVirt/ovirt-system-tests/pull/203

Version-Release number of selected component (if applicable):
seabios-bin-1.16.0-1.module_el8.7.0+1140+ff0772f9.noarch

How reproducible:
Often, but not sure - not always

Steps to Reproduce:
1. Boot a VM with a large (the default) kernel+initrd
2.
3.

Actual results:
Sometimes it's very slow

Expected results:
Should always be as fast as the platform allows

Additional info:

Attaching a console log which is a result of [2] above. You can view it comfortably with:

$ cat HostedEngine-console.log | tr '\r' '\n' | less -R +/'\[ vml|\[ ini'

Comment 1 Gerd Hoffmann 2022-06-29 09:04:21 UTC

What exactly is the boot device?
Can you capture a firmware log?

  <qemu:commandline>
    <qemu:arg value='-chardev'/>
    <qemu:arg value='file,id=firmware,path=/tmp/edk2-devel-q35.log'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='isa-debugcon,iobase=0x402,chardev=firmware'/>
  </qemu:commandline>

Comment 2 Yedidyah Bar David 2022-06-29 10:04:27 UTC

(In reply to Gerd Hoffmann from comment #1)
> What exactly is the boot device?

Attaching the libvirt xml (as copied from a typical successful run of our CI, should be mostly identical to current case).

> Can you capture a firmware log?
> 
>   <qemu:commandline>
>     <qemu:arg value='-chardev'/>
>     <qemu:arg value='file,id=firmware,path=/tmp/edk2-devel-q35.log'/>
>     <qemu:arg value='-device'/>
>     <qemu:arg value='isa-debugcon,iobase=0x402,chardev=firmware'/>
>   </qemu:commandline>

I'll try.

Comment 5 Gerd Hoffmann 2022-06-29 10:25:14 UTC

three changes for virtio-blk in 1.15 ... 1.16:

a05af290bac5 virtio-blk: split large IO according to size_max
815d7498655b virtio-blk: abstract a function named virtio_blk_op_one_segment to handle r/w request
27b573d4f5a5 virtio-blk: add feature VIRTIO_BLK_F_SIZE_MAX and VIRTIO_BLK_F_SEG_MAX

They make seabios query the host for the maximum request size and if io requests
are larger than that seabios will split them into smaller ones. Should essentially
be a no-op, but maybe there is a bug in there that seabios splits requests even
though it is not needed, thereby slowing down the boot.

The firmware log (grep for lines with virtio-blk) should help figuring this.

Comment 8 Yedidyah Bar David 2022-06-30 06:12:18 UTC

Attached some console and firmware logs.

Files named *-03 are from a plain unattended boot.

Files named *-04* are from a run where I entered the grub prompt and ran several 'initrd' commands to load the default initramfs image (~ 67 MB). During some of them, I imposed cpu load on the host (plain 'while true; do :; done &'). Tried running more than one endless loop in parallel but this caused no difference - apparently we (vdsm/libvirt, no idea) somehow allocate a cpu to the qemu process for itself (the host (actually a nested-kvm VM) has 2 cpus). While cpu was loaded, throughput was ~ 100KB/s. When not, it was faster, but changing significantly.

Files named *-05* are from a similar flow as 04, but after downgrading seabios-bin to 1.15. Throughput during cpu load is ~ 500 KB/s, without cpu load it's more than 1MB/s (in previous cases it also went much higher, not sure what changed. Perhaps the various tests cause memory fragmentation or whatever that makes it rather slow even unloaded, just a wild guess).

Comment 9 Gerd Hoffmann 2022-06-30 10:59:03 UTC

(In reply to Yedidyah Bar David from comment #8)
> Attached some console and firmware logs.

Seems to be corrupted or incomplete.  I get a decompression error + only HostedEngine-console.log-02 unpacked, the other files are missing.

Comment 15 Gerd Hoffmann 2022-07-07 10:08:35 UTC

https://www.mail-archive.com/seabios@seabios.org/msg12940.html

Comment 20 Yanan Fu 2022-07-29 01:51:20 UTC

QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 28 Xueqiang Wei 2022-08-09 08:09:28 UTC

According to Comment 21 and Comment 25, set status to VERIFIED. Thanks.

Comment 30 errata-xmlrpc 2022-11-08 09:20:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7472

Note You need to log in before you can comment on or make changes to this bug.