Bug 2213346 - CONFIG_PRINTK_TIME causes occasional hangs when booting on qemu
Keywords:
Status: POST
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL: https://lkml.org/lkml/2023/6/13/733
Whiteboard:
Depends On:
Blocks: TRACKER-bugs-affecting-libguestfs
Reported: 2023-06-07 21:30 UTC by Richard W.M. Jones
Modified: 2023-06-27 07:53 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:


Attachments
log file (16.85 KB, text/plain)
2023-06-07 21:33 UTC, Richard W.M. Jones
guestfs-m5oblnwjlg44snt8.log (qemu command) (4.23 KB, text/plain)
2023-06-07 21:35 UTC, Richard W.M. Jones
another hang log (17.10 KB, text/plain)
2023-06-08 14:52 UTC, Eric Blake


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Gitlab | qemu-project qemu issues 1696 | 0 | None | opened | Linux kernel hangs rarely when booting on the latest qemu | 2023-06-08 15:38:04 UTC

Description Richard W.M. Jones 2023-06-07 21:30:58 UTC
Run this command:

$ while guestfish -a /dev/null -v run >& /tmp/log; do echo -n . ; done

Occasionally (less than 1 in 100 times) it will hang.  See the attached log, but the last lines of output are:

[    0.070120] x86/cpu: User Mode Instruction Prevention (UMIP) activated
[    0.070120] Last level iTLB entries: 4KB 512, 2MB 255, 4MB 127
[    0.070120] Last level dTLB entries: 4KB 512, 2MB 255, 4MB 127, 1GB 0
[    0.070120] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[    0.070120] Spectre V2 : Mitigation: Retpolines
[    0.070120] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[    0.070120] Spectre V2 : Spectre v2 / SpectreRSB : Filling RSB on VMEXIT
[    0.070120] Spectre V2 : Enabling Speculation Barrier for firmware calls
[    0.070120] RETBleed: Mitigation: untrained return thunk
[    0.070120] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[    0.070120] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl
[    0.070120] Freeing SMP alternatives memory: 48K

Normally we would expect to see this line next:

[    0.070794] smpboot: CPU0: AMD Ryzen 9 3900X 12-Core Processor (family: 0x17, model: 0x71, stepping: 0x0)
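
The pattern above (the log stops after "Freeing SMP alternatives memory" and the "smpboot: CPU0:" line never appears) can be checked mechanically. The following is only a sketch: `log_shows_hang` is a hypothetical helper name, not part of guestfish or libguestfs, and it assumes the two strings quoted in this report are enough to recognise the hang.

```shell
#!/bin/sh
# Hypothetical helper (not part of guestfish): returns 0 when the log
# matches the hang described in this report, that is, the
# "Freeing SMP alternatives memory" line was printed but the following
# "smpboot: CPU0:" line never appeared. Returns 1 otherwise.
log_shows_hang() {
    log=$1
    # The boot must have reached the last line seen before the hang.
    grep -q 'Freeing SMP alternatives memory' "$log" || return 1
    # If the next expected line is present, the boot got past this point.
    if grep -q 'smpboot: CPU0:' "$log"; then
        return 1
    fi
    return 0
}

# Example (assumes /tmp/log was written by the reproducer):
#   log_shows_hang /tmp/log && echo "hang signature found"
```

A log that stops earlier or later than this point is reported as "no match", so this only recognises the specific hang location shown here.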


Reproducible: Sometimes

Steps to Reproduce:
1. See description above
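
Because a hang makes the reproducer loop stall until someone notices, it may help to bound each run. The sketch below wraps the command in `timeout` from GNU coreutils, which exits with status 124 when the time limit is hit. The function name `repeat_until_hang` and the 60 second limit and 1000 iterations in the example are assumptions chosen for illustration, not values from this report.

```shell
#!/bin/sh
# Hypothetical wrapper around the reproducer loop.
# Usage: repeat_until_hang LIMIT_SECONDS MAX_ITERATIONS LOGFILE COMMAND...
# Returns 0 if a run timed out (suspected hang, log is kept),
#         1 if no run timed out within MAX_ITERATIONS,
#         2 if the command failed for some other reason.
repeat_until_hang() {
    limit=$1
    max=$2
    log=$3
    shift 3
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        status=0
        # GNU timeout sends SIGTERM after $limit seconds and exits 124.
        timeout "$limit" "$@" > "$log" 2>&1 || status=$?
        if [ "$status" -eq 124 ]; then
            echo "iteration $i: timed out after ${limit}s, log kept in $log"
            return 0
        elif [ "$status" -ne 0 ]; then
            echo "iteration $i: command failed with status $status"
            return 2
        fi
    done
    echo "no hang in $max iterations"
    return 1
}

# Example with the command from the description (limits are assumptions):
#   repeat_until_hang 60 1000 /tmp/log guestfish -a /dev/null -v run
```

After a timeout it is worth checking that no qemu process was left behind, since this sketch does not clean up beyond what `timeout` itself does.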

Comment 1 Richard W.M. Jones 2023-06-07 21:32:34 UTC
Host kernel 6.4.0-0.rc2.23.fc39.x86_64
Guest kernel 6.4.0-0.rc5.41.fc39.x86_64
qemu-system-x86-8.0.0-4.fc39.x86_64
glibc-2.37.9000-10.fc39.x86_64

Comment 2 Richard W.M. Jones 2023-06-07 21:33:07 UTC
Created attachment 1969619
log file

Comment 3 Richard W.M. Jones 2023-06-07 21:35:01 UTC
Created attachment 1969620
guestfs-m5oblnwjlg44snt8.log (qemu command)

Note that "qemu-kvm: terminating on signal 15" in this log is from
me killing qemu after the hang, it's not related to the hang itself.

Comment 4 Richard W.M. Jones 2023-06-08 14:46:37 UTC
Still happens with kernel 6.4.0-0.rc5.41.fc39.x86_64 and
qemu-system-x86-8.0.0-4.fc39.x86_64 which are the latest in
Rawhide at time of writing.

Comment 5 Eric Blake 2023-06-08 14:52:33 UTC
Created attachment 1969789
another hang log

I reproduced the hang after about 280 iterations on Fedora 37 on Intel hardware:
host kernel-6.2.15-200.fc37.x86_64
qemu-system-x86-7.0.0-15.fc37.x86_64
glibc-2.36-9.fc37.x86_64
$ grep 'model name' /proc/cpuinfo | head -n1
model name	: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz

Comment 6 Richard W.M. Jones 2023-06-08 15:37:51 UTC
Upstream bug:
https://gitlab.com/qemu-project/qemu/-/issues/1696

Comment 7 Richard W.M. Jones 2023-06-13 14:30:56 UTC
It seems to be a kernel bug:
https://lkml.org/lkml/2023/6/13/733

Comment 9 Richard W.M. Jones 2023-06-22 14:02:13 UTC
Fixed in kernel commit 13bb06f8dd42071cb9a49f6e21099eea05d4b856

Comment 10 Dusty Mabe 2023-06-26 02:51:09 UTC
Any chance we could get the fix backported? This causes problems not only in our CI, but also in our build system (which uses guestfish quite a lot to manipulate disk images).

Comment 11 Richard W.M. Jones 2023-06-26 09:44:41 UTC
We absolutely need to fix this everywhere the original bug ended up (including
RHEL if it's there) because this is a serious issue.

Comment 12 Richard W.M. Jones 2023-06-27 07:53:10 UTC
FWIW I've seen stable backports posted by Greg K-H to the following branches:

6.3
6.1
5.15
5.10
5.4

So all of those versions of Linux are potentially affected if they've been
following the upstream stable branch.
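
As a rough aid, the list above can be turned into a check of whether a given kernel release string belongs to one of those series. This is only a sketch: the function names are hypothetical, it compares the major.minor prefix only, and it says nothing about whether a particular release already contains the backport. A series missing from the list is not necessarily unaffected (the 6.4 release candidate in this report is an example).

```shell
#!/bin/sh
# Stable branches named in the comment above.
LISTED_BRANCHES="6.3 6.1 5.15 5.10 5.4"

# Hypothetical helper: prints the major.minor series of a kernel release
# string such as "6.2.15-200.fc37.x86_64" (prints "6.2").
kernel_series() {
    echo "$1" | cut -d. -f1,2
}

# Hypothetical helper: returns 0 if the release belongs to a listed
# series, 1 otherwise.
on_listed_branch() {
    series=$(kernel_series "$1")
    for b in $LISTED_BRANCHES; do
        [ "$b" = "$series" ] && return 0
    done
    return 1
}

# Example:
#   on_listed_branch "$(uname -r)" && echo "series received a stable backport"
```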

