Bug 2114858

Summary: Guest OS kernel crashes when isa-debugcon is enabled with OVMF
Product: [Fedora] Fedora
Component: edk2
Version: 36
Hardware: Unspecified
OS: Unspecified
Status: CLOSED EOL
Severity: unspecified
Priority: unspecified
Type: Bug
Reporter: Daniel Berrangé <berrange>
Assignee: Gerd Hoffmann <kraxel>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: berrange, crobinso, kraxel, pbonzini, philmd, vgoyal, virt-maint, xuwei
Last Closed: 2023-05-25 17:22:52 UTC
Bug Blocks: 2149251

Description Daniel Berrangé 2022-08-03 11:56:45 UTC
Description of problem:

Attempting to boot a Linux guest using OVMF with an isa-debugcon device attached results in the guest kernel panicking during boot.

This doesn't happen if the Fedora OVMF binaries are replaced with those from RHEL-9, or with those from qemu.git master.

This doesn't happen if isa-debugcon is removed.

NB, I've only tested this with -kernel boot, not guest BIOS boot, so not sure if that's a factor or not.


Version-Release number of selected component (if applicable):
qemu-system-x86-6.2.0-12.fc36.x86_64
edk2-ovmf-20220526git16779ede2d36-3.fc36.noarch


How reproducible:
Always

Steps to Reproduce:
 $ git clone https://gitlab.com/berrange/tiny-vm-tools
 $ cd tiny-vm-tools
 $ ./make-tiny-image.py --run date date
 tiny-initrd.img
 Copy lib /lib/ld-musl-x86_64.so.1 -> /tmp/make-tiny-imagebcuv8i_b/lib/ld-musl-x86_64.so.1
 Copy bin /usr/bin/date -> /tmp/make-tiny-imagebcuv8i_b/bin/date
 Copy lib /lib64/libc.so.6 -> /tmp/make-tiny-imagebcuv8i_b/lib64/libc.so.6
 Copy lib /lib64/ld-linux-x86-64.so.2 -> /tmp/make-tiny-imagebcuv8i_b/lib64/ld-linux-x86-64.so.2

 $ cp /usr/share/edk2/ovmf/OVMF_VARS.fd vars.fd

 $ qemu-system-x86_64 \
   -blockdev node-name=file_ovmf_code,driver=file,filename=/usr/share/edk2/ovmf/OVMF_CODE.fd,auto-read-only=on,discard=unmap \
   -blockdev node-name=drive_ovmf_code,driver=raw,read-only=on,file=file_ovmf_code \
   -blockdev node-name=file_ovmf_vars,driver=file,filename=vars.fd,auto-read-only=on,discard=unmap \
   -blockdev node-name=drive_ovmf_vars,driver=raw,read-only=off,file=file_ovmf_vars  \
   -machine pc,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \
   -kernel /boot/vmlinuz-5.18.5-200.fc36.x86_64 \
   -initrd tiny-initrd.img \
   -m 8000 \
   -display none \
   -nodefaults \
   -serial stdio \
   -accel kvm \
   -append 'console=ttyS0 quiet' \
   -chardev file,path=firmware.log,id=seabios \
   -device isa-debugcon,iobase=0x402,chardev=seabios


Actual results:
[    0.068331] BUG: unable to handle page fault for address: 000000000080b000
[    0.068776] #PF: supervisor read access in kernel mode
[    0.068776] #PF: error_code(0x0000) - not-present page
[    0.068776] PGD 1001e9063 P4D 1001e9063 PUD 1001ff063 PMD 100200063 PTE 0
[    0.068776] Oops: 0000 [#1] PREEMPT SMP PTI
[    0.068776] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.5-200.fc36.x86_64 #1
[    0.068776] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[    0.068776] RIP: 0010:0xfffffffeff8c3615
[    0.068776] Code: 89 44 24 20 e8 2c 96 00 00 eb 08 ba f8 0c 00 00 89 d8 ef 48 83 c4 38 89 d8 5b 5d c3 55 48 89 e5 57 48 89 cf 56 53 48 83 ec 38 <80> 3c 25 00 b0 80 00 02 75 5b 4c 89 c3 49 8d 34 10 48 39 f3 74 5c
[    0.068776] RSP: 0000:ffffffff94e03970 EFLAGS: 00010282
[    0.068776] RAX: 0000000000000060 RBX: 0000000000000000 RCX: 0000000000000402
[    0.068776] RDX: 0000000000000060 RSI: ffffffff94e03c00 RDI: 0000000000000402
[    0.068776] RBP: ffffffff94e039c0 R08: ffffffff94e03a00 R09: 0000000000000002
[    0.068776] R10: 0000000000000002 R11: 0000000000000000 R12: ffffffff94e03e90
[    0.068776] R13: ffffffff9420b1c0 R14: 0000000000000007 R15: 0000000000000000
[    0.068776] FS:  0000000000000000(0000) GS:ffff9eb16be00000(0000) knlGS:0000000000000000
[    0.068776] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.068776] CR2: 000000000080b000 CR3: 0000000100222000 CR4: 00000000000006b0
[    0.068776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.068776] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.068776] Call Trace:
[    0.068776]  <TASK>
[    0.068776]  ? post_alloc_hook+0xc8/0x150
[    0.068776]  ? post_alloc_hook+0xc8/0x150
[    0.068776]  ? get_page_from_freelist+0x55c/0x1520
[    0.068776]  ? get_page_from_freelist+0x55c/0x1520
[    0.068776]  ? __change_page_attr_set_clr+0xd65/0xf90
[    0.068776]  ? native_flush_tlb_global+0x79/0x90
[    0.068776]  ? __flush_tlb_all+0x30/0x40
[    0.068776]  ? kernel_map_pages_in_pgd+0xc2/0xce
[    0.068776]  ? __efi_call+0x25/0x30
[    0.068776]  ? virt_efi_set_variable_nonblocking+0xa5/0x120
[    0.068776]  ? efi_delete_dummy_variable+0x5c/0x80
[    0.068776]  ? efi_enter_virtual_mode+0x416/0x435
[    0.068776]  ? start_kernel+0x8cc/0x955
[    0.068776]  ? load_ucode_bsp+0x3a/0x100
[    0.068776]  ? secondary_startup_64_no_verify+0xd5/0xdb
[    0.068776]  </TASK>
[    0.068776] Modules linked in:
[    0.068776] CR2: 000000000080b000
[    0.068776] ---[ end trace 0000000000000000 ]---
[    0.068776] RIP: 0010:0xfffffffeff8c3615
[    0.068776] Code: 89 44 24 20 e8 2c 96 00 00 eb 08 ba f8 0c 00 00 89 d8 ef 48 83 c4 38 89 d8 5b 5d c3 55 48 89 e5 57 48 89 cf 56 53 48 83 ec 38 <80> 3c 25 00 b0 80 00 02 75 5b 4c 89 c3 49 8d 34 10 48 39 f3 74 5c
[    0.068776] RSP: 0000:ffffffff94e03970 EFLAGS: 00010282
[    0.068776] RAX: 0000000000000060 RBX: 0000000000000000 RCX: 0000000000000402
[    0.068776] RDX: 0000000000000060 RSI: ffffffff94e03c00 RDI: 0000000000000402
[    0.068776] RBP: ffffffff94e039c0 R08: ffffffff94e03a00 R09: 0000000000000002
[    0.068776] R10: 0000000000000002 R11: 0000000000000000 R12: ffffffff94e03e90
[    0.068776] R13: ffffffff9420b1c0 R14: 0000000000000007 R15: 0000000000000000
[    0.068776] FS:  0000000000000000(0000) GS:ffff9eb16be00000(0000) knlGS:0000000000000000
[    0.068776] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.068776] CR2: 000000000080b000 CR3: 0000000100222000 CR4: 00000000000006b0
[    0.068776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.068776] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.068776] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.068776] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
qemu-system-x86_64: terminating on signal 2


Expected results:
Wed Aug  3 11:55:46 UTC 2022
[    0.418779] reboot: Power down
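
When repeating the reproducer across several firmware builds, the serial output can be classified mechanically. The following is a minimal sketch, assuming the guest's serial output has been captured as text; the function name is hypothetical, and the marker strings are taken from the actual and expected results in this report.

```python
def classify_boot_log(text: str) -> str:
    """Classify captured serial output as 'panic', 'ok' or 'unknown'."""
    # Strings seen in the failing boot in this report.
    if "BUG: unable to handle page fault" in text:
        return "panic"
    if "Kernel panic - not syncing" in text:
        return "panic"
    # String seen at the end of the expected (working) boot.
    if "reboot: Power down" in text:
        return "ok"
    return "unknown"


bad = "[    0.068776] Kernel panic - not syncing: Attempted to kill the idle task!"
good = "Wed Aug  3 11:55:46 UTC 2022\n[    0.418779] reboot: Power down"
print(classify_boot_log(bad), classify_boot_log(good))
```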


Additional info:

Comment 1 Gerd Hoffmann 2022-08-17 10:36:44 UTC
Bisected.  The winner is "OvmfPkg: enable DEBUG_VERBOSE (RHEL only)" (aka patch12 in spec file).

The faulting address is 80b000 aka PcdOvmfWorkAreaBase.
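
The address can be cross-checked against the oops itself. The sketch below decodes only the faulting instruction from the "Code:" line (the byte marked with <..>) and compares its memory operand with CR2. It handles just this one encoding and is not a general disassembler; the helper names are made up for illustration.

```python
import re

# Two lines copied from the oops in this report.
OOPS = """\
[    0.068776] Code: 89 44 24 20 e8 2c 96 00 00 eb 08 ba f8 0c 00 00 89 d8 ef 48 83 c4 38 89 d8 5b 5d c3 55 48 89 e5 57 48 89 cf 56 53 48 83 ec 38 <80> 3c 25 00 b0 80 00 02 75 5b 4c 89 c3 49 8d 34 10 48 39 f3 74 5c
[    0.068776] CR2: 000000000080b000 CR3: 0000000100222000 CR4: 00000000000006b0
"""


def faulting_bytes(oops: str) -> bytes:
    """Return the code bytes starting at the <xx> marker (the faulting RIP)."""
    tokens = re.search(r"Code: (.*)", oops).group(1).split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_cmp_abs32(insn: bytes):
    """Decode `cmp byte ptr [disp32], imm8` (80 3c 25 <disp32> <imm8>) only."""
    # Opcode 0x80 with ModRM.reg = 7 is CMP r/m8, imm8. ModRM 0x3c plus
    # SIB 0x25 selects a bare 32-bit displacement, no base or index register.
    if insn[:3] != bytes([0x80, 0x3C, 0x25]):
        raise ValueError("not the instruction form this sketch understands")
    address = int.from_bytes(insn[3:7], "little")
    immediate = insn[7]
    return address, immediate


def cr2(oops: str) -> int:
    """Return the faulting address the CPU recorded in CR2."""
    return int(re.search(r"CR2: ([0-9a-f]+)", oops).group(1), 16)


address, immediate = decode_cmp_abs32(faulting_bytes(OOPS))
print(f"cmp byte ptr [{address:#x}], {immediate:#x}; CR2 = {cr2(OOPS):#x}")
```

The decoded operand matches CR2, so the fault is a one-byte read at a fixed absolute address. That is consistent with firmware code inspecting a field in the work area. What the compared value 2 means is not stated in this report.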

So, given that this needs isa-debugcon to trigger, I suspect the following is happening:

(1) linux kernel makes an efi runtime call.
(2) ovmf wants to log a debug message (enabled by the patch above).
(3) debugcon driver calls into iolib to send the characters.
(4) iolib checks WorkArea to figure out whether it can just use
    io instructions or must call some tdx/sev helper function for io.
(5) WorkArea has no runtime mapping -> BOOM.
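
The suspected sequence can be illustrated with a toy model. This is only a sketch of the hypothesis above, not edk2 code: every class and method name is hypothetical, and the assumption that the debug path is skipped entirely when no debugcon device is present is inferred from the observation that removing isa-debugcon avoids the crash.

```python
class PageFault(Exception):
    """Raised when the toy firmware touches an unmapped address."""


WORK_AREA_BASE = 0x80B000  # faulting address from the oops
DEBUGCON_PORT = 0x402      # iobase given to isa-debugcon


class ToyFirmware:
    """Hypothetical model; none of these names are real edk2 APIs."""

    def __init__(self, debug_verbose: bool, debugcon_present: bool):
        self.debug_verbose = debug_verbose
        self.debugcon_present = debugcon_present
        self.mapped = {WORK_AREA_BASE}  # boot time: work area is reachable
        self.port_log = []

    def enter_runtime(self):
        # Step (5): in this model the work area gets no runtime mapping.
        self.mapped.discard(WORK_AREA_BASE)

    def read_byte(self, addr: int) -> int:
        if addr not in self.mapped:
            raise PageFault(hex(addr))
        return 0  # stands for "plain guest, ordinary port I/O is fine"

    def io_write(self, port: int, data: bytes):
        # Step (4): consult the work area before choosing the I/O method.
        self.read_byte(WORK_AREA_BASE)
        self.port_log.append((port, data))

    def set_variable(self) -> str:
        # Steps (1)-(3): the runtime call logs only with verbose debug
        # enabled and (assumption) a debugcon device to write to.
        if self.debug_verbose and self.debugcon_present:
            self.io_write(DEBUGCON_PORT, b"SetVariable\n")
        return "EFI_SUCCESS"


def runtime_call(debug_verbose: bool, debugcon_present: bool) -> str:
    fw = ToyFirmware(debug_verbose, debugcon_present)
    fw.enter_runtime()
    try:
        return fw.set_variable()
    except PageFault as exc:
        return f"page fault at {exc}"


for verbose, debugcon in [(True, True), (False, True), (True, False)]:
    print(verbose, debugcon, runtime_call(verbose, debugcon))
```

In this model only the combination of verbose debug output and a debugcon device faults, which mirrors what the report and the bisect observed.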

Comment 4 Gerd Hoffmann 2022-12-01 09:04:09 UTC
This has been fixed upstream in the meantime. The rebase to 2022-11 should pick up the fix (and allow us to re-enable the patch flipping DEBUG_VERBOSE).

Comment 5 Ben Cotton 2023-04-25 17:42:29 UTC
This message is a reminder that Fedora Linux 36 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 36 on 2023-05-16.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '36'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 36 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 6 Ludek Smid 2023-05-25 17:22:52 UTC
Fedora Linux 36 entered end-of-life (EOL) status on 2023-05-16.

Fedora Linux 36 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.