Bug 2107827 - Hard freeze on kernel 5.18.11-200.fc36 when Libvirt/KVM/Qemu Windows 7 is started
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 36
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-16 12:48 UTC by Zdeněk Rampas
Modified: 2022-07-31 08:14 UTC (History)
CC List: 21 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-31 08:14:22 UTC
Type: Bug
Embargoed:


Attachments
journalctl --no-hostname -k > dmesg_5.18.11.txt (91.48 KB, text/plain)
2022-07-16 12:48 UTC, Zdeněk Rampas
debug info from terminal (312.28 KB, image/jpeg)
2022-07-18 16:08 UTC, Zdeněk Rampas
debug info from terminal 2 (350.58 KB, image/jpeg)
2022-07-18 16:09 UTC, Zdeněk Rampas
debug info from terminal 3 (320.35 KB, image/jpeg)
2022-07-18 16:09 UTC, Zdeněk Rampas
Netconsole kernel panic after W7 is started (5.67 KB, text/plain)
2022-07-19 17:55 UTC, Zdeněk Rampas

Description Zdeněk Rampas 2022-07-16 12:48:55 UTC
Created attachment 1897658 [details]
journalctl --no-hostname -k > dmesg_5.18.11.txt

Hard freeze on kernel 5.18.11-200.fc36 when Libvirt/KVM/Qemu Windows 7 VM is started

1. Please describe the problem:

After upgrading to kernel 5.18.11-200.fc36, my Lenovo ThinkPad L490 (20Q6S4JH01) laptop started to freeze hard. I could not find out what causes the freeze: the journalctl output is cut off cleanly, with no warnings or kernel errors. The local console is also stuck (the cursor stops blinking), the SSH connection freezes, and the computer does not respond to ping.

I found that the freeze occurs shortly after starting a virtual machine running Windows 7. A VM running Windows 10 works without issue. I tried disabling all mitigations, and also the retbleed mitigation specifically, but this did not solve the problem. A clean install of a Windows 7 VM also triggers the freeze (during installation).

I am filing this bug in the Fedora Bugzilla because the problem does not show up on my desktop running a vanilla 5.18.11 kernel (Debian 11).

2. What is the Version-Release number of the kernel:

5.18.11-200.fc36

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes, on 5.18.10-200.fc36 Windows 7 VMs run without problem

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
   
Boot 5.18.11-200.fc36 kernel

1. Install libvirt, virt-manager, qemu, qemu-kvm...
2. Run virt-manager
3. Create a new VM
4. Select local install media (ISO image or cd-rom)
5. Insert the ISO (in my case cs_windows_7_professional_x64_dvd_x15-65799.iso)
6. Microsoft Windows 7 should be selected as the template. Forward
7. Forward (default CPUs and RAM)
8. Default storage. Forward
9. Finish (no customization)
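
For repeated testing, the virt-manager steps above can be approximated from a script. The following is a minimal sketch, not the procedure actually used in this report: it only builds a `virt-install` command line, and the VM name, memory size, vCPU count and disk size are hypothetical placeholders, since the wizard defaults are not recorded here.

```python
import shlex


def build_virt_install_argv(iso_path, name="win7-repro", memory_mib=4096,
                            vcpus=2, disk_gib=40):
    """Return an argv list for virt-install that mirrors the wizard steps.

    All default values are assumptions for illustration; the report only
    says that the virt-manager defaults were accepted.
    """
    return [
        "virt-install",
        "--name", name,
        "--memory", str(memory_mib),
        "--vcpus", str(vcpus),
        "--cdrom", iso_path,           # local install media (step 4)
        "--disk", f"size={disk_gib}",  # new disk in the default storage pool
        "--os-variant", "win7",        # Windows 7 template (step 6)
    ]


if __name__ == "__main__":
    argv = build_virt_install_argv(
        "cs_windows_7_professional_x64_dvd_x15-65799.iso")
    # Print instead of executing so the command can be reviewed first.
    print(shlex.join(argv))
```

Building the command as a list keeps quoting out of the picture; on an affected kernel, actually running the printed command would be expected to freeze the host in the same way as the GUI procedure.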

The VM starts, and within moments the host freezes: the SSH connection stops responding, ping gets no reply, and the cursor in the local terminal freezes.

An already installed Windows 7 VM also freezes the host after startup.

This procedure works fine on kernel 5.18.10-200.fc36.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
   
5.19.0-0.rc6.20220714git4a57a8400075.49.fc37 behaves the same way as 5.18.11-200.fc36.


6. Are you running any modules that are not shipped directly with Fedora's kernel?:

The output of
dmesg | grep -i tainted
is empty.
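
The same check can be made against the kernel's numeric taint value. This is a minimal sketch, assuming the standard `/proc/sys/kernel/tainted` file is readable; the helper names are illustrative, and only the module-related taint bits are decoded.

```python
from pathlib import Path

# Module-related taint bits (a subset of the kernel's documented taint flags).
MODULE_TAINT_BITS = {
    0: "proprietary module loaded (P)",
    1: "module force-loaded (F)",
    12: "out-of-tree module loaded (O)",
    13: "unsigned module loaded (E)",
}


def decode_module_taint(value):
    """Return descriptions of the module-related bits set in `value`."""
    return [desc for bit, desc in sorted(MODULE_TAINT_BITS.items())
            if value & (1 << bit)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint value; 0 means the kernel is not tainted."""
    return int(Path(path).read_text().strip())
```

If the `dmesg` check above is accurate, `decode_module_taint(read_taint())` would be expected to return an empty list on the affected host.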

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Attached as dmesg_5.18.11.txt.

Comment 1 Zdeněk Rampas 2022-07-18 16:08:24 UTC
Created attachment 1897953 [details]
debug info from terminal

I bisected the tree at https://gitlab.com/cki-project/kernel-ark.git, marking the endpoints with:
git bisect good kernel-5.18.10-0
git bisect bad kernel-5.18.11-0

Bisect log:

git bisect start
# bad: [974e3e09ea6fdef8a1b1b68b209ce849f05e0063] Turn on configs for retbleed
git bisect bad 974e3e09ea6fdef8a1b1b68b209ce849f05e0063
# good: [934aec1d7c14549447ba3b53a65f0eb948fc023b] [redhat] kernel-5.18.10-0
git bisect good 934aec1d7c14549447ba3b53a65f0eb948fc023b
# good: [2783414e6ef725bac946dc5d4d9288e34b6f5a13] ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported
git bisect good 2783414e6ef725bac946dc5d4d9288e34b6f5a13
# good: [5aca0c5b86a52e2487c4d846ac08f20d5fb9ce11] x86/vsyscall_emu/64: Don't use RET in vsyscall emulation
git bisect good 5aca0c5b86a52e2487c4d846ac08f20d5fb9ce11
# bad: [9072ecef88a18bba73dd59c78d202c9966574aab] x86/cpu/amd: Add Spectral Chicken
git bisect bad 9072ecef88a18bba73dd59c78d202c9966574aab
# bad: [b6755754d19816815235c8fca8979856763afbc9] x86/bugs: Optimize SPEC_CTRL MSR writes
git bisect bad b6755754d19816815235c8fca8979856763afbc9
# bad: [98db9034780970f94cf0fd66f6c3371ce5bd1da0] x86: Add magic AMD return-thunk
git bisect bad 98db9034780970f94cf0fd66f6c3371ce5bd1da0
# bad: [b881f755be2f276dca2ff2563d5cc4ae38561c51] x86: Use return-thunk in asm code
git bisect bad b881f755be2f276dca2ff2563d5cc4ae38561c51
# good: [9b9b256ca2665c776a56acd643e8e90d7c8ad1b4] x86/sev: Avoid using __x86_return_thunk
git bisect good 9b9b256ca2665c776a56acd643e8e90d7c8ad1b4
# first bad commit: [b881f755be2f276dca2ff2563d5cc4ae38561c51] x86: Use return-thunk in asm code


So according to bisect, this commit is causing the crash:
b881f755be2f276dca2ff2563d5cc4ae38561c51
x86: Use return-thunk in asm code

At least on my laptop with an i5-8365U CPU.
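
The bisect log above can be summarised programmatically. This is a minimal sketch, assuming the `# good:` / `# bad:` / `# first bad commit:` comment format seen in the log; the function name is illustrative.

```python
import re

# Matches comment lines such as "# bad: [<40-hex sha>] <subject>".
_LINE = re.compile(
    r"^# (good|bad|first bad commit): \[([0-9a-f]{40})\] (.*)$")


def summarise_bisect_log(text):
    """Return (good_count, bad_count, first_bad) from a bisect log.

    first_bad is a (sha, subject) tuple, or None if bisection did not finish.
    """
    good = bad = 0
    first_bad = None
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if not m:
            continue  # skip the "git bisect ..." command lines
        kind, sha, subject = m.groups()
        if kind == "good":
            good += 1
        elif kind == "bad":
            bad += 1
        else:
            first_bad = (sha, subject)
    return good, bad, first_bad
```

Applied to the log in this comment, it should count 4 good and 5 bad revisions and report b881f755be2f276dca2ff2563d5cc4ae38561c51 as the first bad commit.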

I compiled the kernel with make binrpm-pkg

This commit cannot simply be reverted with git revert.

During the tests I caught the kernel panic on camera. The call trace shows:
fastop
x86_emulate_insn
x86_emulate_instruction
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
__seccomp_filter
__x64_sys_ioctl
__x64_sys_ioctl
syscall_exit_to_user_mode
do_syscall_64

More in the attachments.

So this looks like a general kernel bug rather than a Fedora-specific one. I should probably file it in the kernel.org Bugzilla.

Thank you

Comment 2 Zdeněk Rampas 2022-07-18 16:09:28 UTC
Created attachment 1897954 [details]
debug info from terminal 2

Comment 3 Zdeněk Rampas 2022-07-18 16:09:53 UTC
Created attachment 1897955 [details]
debug info from terminal 3

Comment 4 Zdeněk Rampas 2022-07-19 17:55:39 UTC
Created attachment 1898168 [details]
Netconsole kernel panic after W7 is started

I compiled a vanilla 5.18.11 kernel using the configuration from Fedora 36 5.18.11-200 (the resulting .config is quite different from Fedora's, and the retbleed mitigations are probably not enabled). It did not trigger the kernel panic. However, I managed to capture the whole kernel panic via netconsole (attached).
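
For reference, netconsole sends kernel messages as plain UDP datagrams, so the receiving side can be very small. This is a minimal sketch of such a receiver, assuming the sender targets UDP port 6666 (netconsole's default remote port); the function names are illustrative, and this is not necessarily how the attached log was captured.

```python
import socket


def open_listener(port=6666, host="0.0.0.0"):
    """Bind a UDP socket on which netconsole datagrams can be received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_messages(sock, count, timeout=None):
    """Receive up to `count` datagrams and return them as decoded strings.

    Stops early if `timeout` seconds pass without a datagram arriving.
    """
    sock.settimeout(timeout)
    messages = []
    try:
        for _ in range(count):
            data, _addr = sock.recvfrom(65535)
            messages.append(data.decode("utf-8", errors="replace"))
    except socket.timeout:
        pass  # no more data within the timeout; return what was collected
    return messages
```

Running the receiver on a second machine matters here because the frozen host cannot flush its own journal, which would explain why the local log is cut off without any error.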

Comment 5 Sam Varshavchik 2022-07-19 21:44:04 UTC
I'm seeing this too. In my case, kernel-5.18.11-200.fc36.x86_64 reliably borks on one particular piece of hardware. On different hardware it starts fine, so there's a hardware component involved here.

Comment 6 Stephen Sheldon 2022-07-21 20:39:57 UTC
I have also experienced a hard lockup with kernel-5.18.11-200.fc36.x86_64 after starting a VM with VMM. This is on a 15-year-old system with an Intel Core 2 Duo processor, an Nvidia GeForce GT 630 video card using the nouveau driver, and 8 GB of memory. I also have a 2-year-old Acer Aspire system with an Intel Core i5 processor and integrated Intel graphics; the lockup does not occur there.

Comment 7 Vinicius 2022-07-22 21:09:06 UTC
kernel-5.18.13-200.fc36 fixed it for me.

Comment 8 Stephen Sheldon 2022-07-22 21:55:54 UTC
This issue went away for me with kernel-5.18.13-200.fc36, which I found in koji.

Comment 9 Zdeněk Rampas 2022-07-22 22:52:11 UTC
Yep, I can confirm 5.18.13-200.fc36 is OK for all my VMs. No hard freeze yet. Hope it stays this way :-)

Comment 10 Zdeněk Rampas 2022-07-31 08:14:22 UTC
5.18.15-200.fc36.x86_64 from koji works as well. Closing this bug, thank you

