Bug 2137803

Summary: latest f36 kernels fail to boot on macOS hosts
Product: [Fedora] Fedora Reporter: Christophe Fergeau <cfergeau>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: NEW --- QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhide CC: acaringi, adscvr, airlied, alciregi, bskeggs, hdegoede, hpa, jarodwilson, jglisse, jonathan, josef, kaihengfeng, kernel-maint, lgoncalv, linville, masami256, mchehab, prkumar, ptalbert, steved
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs for the amd64 crash none
logs for the arm64 crash none

Description Christophe Fergeau 2022-10-26 09:26:19 UTC
1. Please describe the problem:

I'm running f36 virtual machines on macOS using Apple's virtualization framework ( https://developer.apple.com/documentation/virtualization ).
Everything was working fine both on x86_64 and M1 macbooks until recently.
Starting with kernel 5.19 on aarch64 (M1), and kernel 6.0 on x86_64, my VMs fail to start very early in the boot process.

And unfortunately, this is about all the information I can provide in this bug report. With the virtualization framework, I only start getting logs fairly late in the boot process (around the time udev enumerates devices/loads modules), and the VM dies long before this. The only information I'm getting from Apple's APIs is "vz.VirtualizationError" with no additional logs.

I'll also open a ticket with Apple, but maybe there are recent kernel/build changes which could explain this sudden failure?

Fwiw, the same kernel boots fine using qemu + Apple's hypervisor framework (which is different - lower level - from Apple's virtualization framework).

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Reproducing the issue requires a MacBook. Then you can get a kernel/initrd from a Fedora VM, and use https://github.com/evansm7/vftool to try to start a VM using this kernel/initrd (preferably on x86_64; on an M1 you need an uncompressed kernel).


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

I tried vmlinuz-6.1.0-0.rc2.21.fc38.x86_64 and the failure also happens with this kernel version.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

I'm not using any additional modules

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

As explained above, I unfortunately could not get any logs :-/

Comment 1 Christophe Fergeau 2022-10-26 11:31:13 UTC
I managed to find some logs in macOS Console application.
The failure on Intel macbooks is:

Exception Type:                EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes:              0x0000000000000001, 0x0000000000000000
Exception Note:                EXC_CORPSE_NOTIFY

Termination Signal:        Illegal instruction: 4
Termination Reason:        Namespace SIGNAL, Code 0x4
Terminating Process:      exc handler [2265]

Thread 5 crashed with X86 Thread State (64-bit):
    rax: 0x0000000000000000    rbx: 0x000070000044a758    rcx: 0x0000000000000000    rdx: 0x0000000000000000
    rdi: 0x0000000000000000    rsi: 0x000000001f08000c    rbp: 0x000070000044a670    rsp: 0x000070000044a660
      r8: 0x0000000000000000      r9: 0x0000000000000000    r10: 0x0000000000000000    r11: 0x0000000000000000
    r12: 0x0000000000000002    r13: 0x00007f853480e000    r14: 0x00007f8533f06e70    r15: 0x000000000000ffff
    rip: 0x0000000107bf5fa7    rfl: 0x0000000000010206    cr2: 0x0000000109c3f000
    
Logical CPU:          2
Error Code:            0x00000000
Trap Number:          6

Thread 5 instruction stream:
    01 74 09 48 8b 7d f0 e8-a4 04 00 00 48 89 df e8    .t.H.}......H...
    40 03 00 00 0f 0b 66 2e-0f 1f 84 00 00 00 00 00    @.....f.........
    90 90 90 90 55 48 89 e5-41 56 53 48 89 fb 0f b6    ....UH..AVSH....
    17 4c 8d 77 01 f6 c2 01-74 0a 48 8b 73 10 48 8b    .L.w....t.H.s.H.
    53 08 eb 06 48 d1 ea 4c-89 f6 48 8b 3d 9f 80 01    S...H..L..H.=...
    00 e8 3a fe f5 ff f6 03-01 74 04 4c 8b 73 10 4c    ..:......t.L.s.L
    89 f7 e8 a9 04 00 00[0f]0b 55 48 89 e5 53 48 81    .........UH..SH.	<==
    ec 08 01 00 00 49 89 fa-84 c0 74 2c 0f 29 85 40    .....I....t,.).@
    ff ff ff 0f 29 8d 50 ff-ff ff 0f 29 95 60 ff ff    ....).P....).`..
    ff 0f 29 9d 70 ff ff ff-0f 29 65 80 0f 29 6d 90    ..).p....)e..)m.
    0f 29 75 a0 0f 29 7d b0-48 8d 85 10 ff ff ff 48    .)u..)}.H......H
    89 70 08 48 89 50 10 48-89 48 18 4c 89 40 20 4c    .p.H.P.H.H.L.@ L
    
Thread 5 last branch register state not available.

Comment 2 Christophe Fergeau 2022-10-26 11:33:07 UTC
The arm64 crash is slightly different:

Exception Type:        EXC_BREAKPOINT (SIGTRAP)
Exception Codes:       0x0000000000000001, 0x0000000100acfef8
Exception Note:        EXC_CORPSE_NOTIFY

Termination Reason:    Namespace SIGNAL, Code 5 Trace/BPT trap: 5
Terminating Process:   exc handler [4064]

Thread 9 crashed with ARM Thread State (64-bit):
    x0: 0x0000000000000000   x1: 0x0000000000000000   x2: 0x0000000000000000   x3: 0x0000000000000000
    x4: 0x0000000000000000   x5: 0x0000000000000000   x6: 0x0000000000000000   x7: 0x0000000000000000
    x8: 0xedfbad7054b9004c   x9: 0xedfbad7054b9004c  x10: 0xaf957ffe418fd437  x11: 0x0000000000400000
   x12: 0x0000000000400000  x13: 0x0000000000c00001  x14: 0x0000000028cd9729  x15: 0x0000000000f9ef58
   x16: 0xfffffffffffffff4  x17: 0x0000000218a17f10  x18: 0x0000000000000000  x19: 0x000000016f946bd8
   x20: 0x000000016f946c38  x21: 0x000000013df17f10  x22: 0x000000000000c025  x23: 0x0000000000000020
   x24: 0x000000000000e801  x25: 0x0000000000074009  x26: 0x000000013df17e98  x27: 0x000000016f946dd0
   x28: 0x00000000623a0049   fp: 0x000000016f946bc0   lr: 0xad27800100acfef8
    sp: 0x000000016f946ba0   pc: 0x0000000100acfef8 cpsr: 0x60001000
   far: 0x0000000100d50000  esr: 0xf2000001 (Breakpoint) brk 1

Still no kernel-side details :-/

Comment 3 Christophe Fergeau 2022-10-26 11:33:48 UTC
Created attachment 1920483 [details]
logs for the amd64 crash

Comment 4 Christophe Fergeau 2022-10-26 11:34:16 UTC
Created attachment 1920484 [details]
logs for the arm64 crash

Comment 5 Christophe Fergeau 2022-11-04 11:52:16 UTC
I bisected this problem to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6cd514e58f12b211d638dbf6f791fa18d854f09c , at least for x86_64/macOS11 machines. If I revert this patch and rebuild the latest fedora kernel, my virtual machine successfully boots.

Comment 6 Kai-Heng Feng 2022-11-07 02:02:05 UTC
Can you please attach `lspci -vv` without the offending commit?

Comment 7 Christophe Fergeau 2022-11-07 16:49:02 UTC
I built upstream commit 8f71a2b3f435 with "PCI: Clear PCI_STATUS when setting up device" reverted, and started a VM on my x86_64 macbook. lspci -vv is:

00:00.0 Host bridge: Apple Inc. Device f020
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:01.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
	Subsystem: Red Hat, Inc. Device 0041
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at c0000100 (32-bit, non-prefetchable) [size=64]
	Region 1: Memory at c00003c0 (32-bit, non-prefetchable) [size=32]
	Region 2: Memory at c00003e0 (32-bit, non-prefetchable) [size=16]
	Region 3: Memory at c00003f0 (32-bit, non-prefetchable) [size=16]
	Region 4: Memory at c0000140 (32-bit, non-prefetchable) [size=64]
	Capabilities: <access denied>
	Kernel driver in use: virtio-pci

00:05.0 Communication controller: Red Hat, Inc. Virtio console (rev 01)
	Subsystem: Red Hat, Inc. Device 0043
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at c0000180 (32-bit, non-prefetchable) [size=64]
	Region 1: Memory at c0000400 (32-bit, non-prefetchable) [size=16]
	Region 2: Memory at c0000410 (32-bit, non-prefetchable) [size=16]
	Region 3: Memory at c0000420 (32-bit, non-prefetchable) [size=16]
	Region 4: Memory at c0000000 (32-bit, non-prefetchable) [size=128]
	Capabilities: <access denied>
	Kernel driver in use: virtio-pci

00:06.0 Mass storage controller: Red Hat, Inc. Virtio block device (rev 01)
	Subsystem: Red Hat, Inc. Device 0042
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at c00001c0 (32-bit, non-prefetchable) [size=64]
	Region 1: Memory at c0000200 (32-bit, non-prefetchable) [size=64]
	Region 2: Memory at c0000430 (32-bit, non-prefetchable) [size=16]
	Region 3: Memory at c0000440 (32-bit, non-prefetchable) [size=16]
	Region 4: Memory at c0000240 (32-bit, non-prefetchable) [size=64]
	Capabilities: <access denied>
	Kernel driver in use: virtio-pci

00:07.0 Communication controller: Red Hat, Inc. Virtio socket (rev 01)
	Subsystem: Red Hat, Inc. Device 0053
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at c0000280 (32-bit, non-prefetchable) [size=64]
	Region 1: Memory at c0000450 (32-bit, non-prefetchable) [size=16]
	Region 2: Memory at c0000460 (32-bit, non-prefetchable) [size=16]
	Region 3: Memory at c0000470 (32-bit, non-prefetchable) [size=16]
	Region 4: Memory at c0000080 (32-bit, non-prefetchable) [size=128]
	Capabilities: <access denied>
	Kernel driver in use: virtio-pci

00:08.0 Network and computing encryption device: Red Hat, Inc. Virtio RNG (rev 01)
	Subsystem: Red Hat, Inc. Device 0044
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at c00002c0 (32-bit, non-prefetchable) [size=64]
	Region 1: Memory at c0000480 (32-bit, non-prefetchable) [size=16]
	Region 2: Memory at c0000490 (32-bit, non-prefetchable) [size=16]
	Region 3: Memory at c00004a0 (32-bit, non-prefetchable) [size=16]
	Region 4: Memory at c0000300 (32-bit, non-prefetchable) [size=64]
	Capabilities: <access denied>
	Kernel driver in use: virtio-pci

00:09.0 Memory controller: Red Hat, Inc. Virtio memory balloon (rev 01)
	Subsystem: Red Hat, Inc. Device 0045
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at c0000340 (32-bit, non-prefetchable) [size=64]
	Region 1: Memory at c00004b0 (32-bit, non-prefetchable) [size=16]
	Region 2: Memory at c00004c0 (32-bit, non-prefetchable) [size=16]
	Region 3: Memory at c00004d0 (32-bit, non-prefetchable) [size=16]
	Region 4: Memory at c0000380 (32-bit, non-prefetchable) [size=64]
	Capabilities: <access denied>
	Kernel driver in use: virtio-pci

00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller
	Subsystem: Intel Corporation Device 8086
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Kernel modules: lpc_ich

Comment 8 Kai-Heng Feng 2022-11-08 06:27:45 UTC
So "Cap" is flagged, can you please try to change the line to the following:
"pci_write_config_word(dev, PCI_STATUS, 0xffef);"

Comment 9 Christophe Fergeau 2022-11-08 13:09:00 UTC
The hypervisor still exits with an error with this change. I tried multiple variations of "pci_write_config_word(dev, PCI_STATUS, $CONST);" (0xff00, 0x00ff, 0x0007, 0x000c, ...), and they all failed. Only `pci_write_config_word(dev, PCI_STATUS, 0x0);` allows me to start a VM.

Comment 10 Kai-Heng Feng 2022-11-09 12:34:46 UTC
Hmm, it feels like it's a bug in the VM stack? Seems like writing anything to PCI_STATUS is prohibited? 

Does `sudo setpci -s 00:1f.0 STATUS=0xffff` crash the VM too? Also, please try all the other devices too.

Have you got any reply from Apple's bug tracker?

Comment 11 Christophe Fergeau 2022-11-09 15:09:29 UTC
> Does `sudo setpci -s 00:1f.0 STATUS=0xffff` crash the VM too?

Yes that also kills the VM :-/

> Also, please try all the other devices too.

`setpci STATUS=0xffff` worked fine on the other devices

> Have you got any reply from Apple's bug tracker?

Not yet; last time I did that, it took them a few weeks to answer. However, in the meantime I tested macOS 13, which was released a few weeks ago, and on this version I cannot reproduce this kernel issue, so they must have fixed something in their hypervisor.
I don't expect everyone will upgrade to macOS 13 right away, so it would still be nice to avoid this kernel regression for macOS 12 users.

Comment 12 Kai-Heng Feng 2022-11-09 15:42:17 UTC
OK, so seems like it's really a bug in their VM. Hopefully Apple can fix it in macOS 12 so the patch can be reinstated...