1. Please describe the problem:
I'm running f36 virtual machines on macOS using Apple's virtualization framework ( https://developer.apple.com/documentation/virtualization ). Everything was working fine on both x86_64 and M1 MacBooks until recently. Starting with kernel 5.19 on aarch64 (M1), and kernel 6.0 on x86_64, my VMs fail to start very early in the boot process. Unfortunately, this is about all the information I can provide in this bug report. With the virtualization framework, I only start getting logs fairly late in the boot process (around the time udev enumerates devices/loads modules), and the VM dies long before this. The only information I'm getting from Apple's APIs is "vz.VirtualizationError" with no additional logs.
I'll also open a ticket with Apple, but maybe there are recent kernel/build changes which could explain this sudden failure? Fwiw, the same kernel boots fine using qemu + Apple's hypervisor framework (which is different from, and lower level than, Apple's virtualization framework).

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:
Reproducing the issue requires a MacBook. You can then get a kernel/initrd from a Fedora VM, and use https://github.com/evansm7/vftool to try to start a VM using this kernel/initrd (preferably on x86_64; on an M1 you need an uncompressed kernel).

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:
I tried vmlinuz-6.1.0-0.rc2.21.fc38.x86_64 and it also happens with this kernel version.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
I'm not using any additional modules.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.
As explained above, I unfortunately could not get any logs :-/
I managed to find some logs in the macOS Console application. The failure on Intel MacBooks is:

Exception Type: EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes: 0x0000000000000001, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Termination Signal: Illegal instruction: 4
Termination Reason: Namespace SIGNAL, Code 0x4
Terminating Process: exc handler [2265]

Thread 5 crashed with X86 Thread State (64-bit):
rax: 0x0000000000000000 rbx: 0x000070000044a758 rcx: 0x0000000000000000 rdx: 0x0000000000000000
rdi: 0x0000000000000000 rsi: 0x000000001f08000c rbp: 0x000070000044a670 rsp: 0x000070000044a660
r8: 0x0000000000000000 r9: 0x0000000000000000 r10: 0x0000000000000000 r11: 0x0000000000000000
r12: 0x0000000000000002 r13: 0x00007f853480e000 r14: 0x00007f8533f06e70 r15: 0x000000000000ffff
rip: 0x0000000107bf5fa7 rfl: 0x0000000000010206 cr2: 0x0000000109c3f000

Logical CPU: 2
Error Code: 0x00000000
Trap Number: 6

Thread 5 instruction stream:
01 74 09 48 8b 7d f0 e8-a4 04 00 00 48 89 df e8 .t.H.}......H...
40 03 00 00 0f 0b 66 2e-0f 1f 84 00 00 00 00 00 @.....f.........
90 90 90 90 55 48 89 e5-41 56 53 48 89 fb 0f b6 ....UH..AVSH....
17 4c 8d 77 01 f6 c2 01-74 0a 48 8b 73 10 48 8b .L.w....t.H.s.H.
53 08 eb 06 48 d1 ea 4c-89 f6 48 8b 3d 9f 80 01 S...H..L..H.=...
00 e8 3a fe f5 ff f6 03-01 74 04 4c 8b 73 10 4c ..:......t.L.s.L
89 f7 e8 a9 04 00 00[0f]0b 55 48 89 e5 53 48 81 .........UH..SH. <==
ec 08 01 00 00 49 89 fa-84 c0 74 2c 0f 29 85 40 .....I....t,.).@
ff ff ff 0f 29 8d 50 ff-ff ff 0f 29 95 60 ff ff ....).P....).`..
ff 0f 29 9d 70 ff ff ff-0f 29 65 80 0f 29 6d 90 ..).p....)e..)m.
0f 29 75 a0 0f 29 7d b0-48 8d 85 10 ff ff ff 48 .)u..)}.H......H
89 70 08 48 89 50 10 48-89 48 18 4c 89 40 20 4c .p.H.P.H.H.L.@ L

Thread 5 last branch register state not available.
The arm64 crash is slightly different:

Exception Type: EXC_BREAKPOINT (SIGTRAP)
Exception Codes: 0x0000000000000001, 0x0000000100acfef8
Exception Note: EXC_CORPSE_NOTIFY
Termination Reason: Namespace SIGNAL, Code 5 Trace/BPT trap: 5
Terminating Process: exc handler [4064]

Thread 9 crashed with ARM Thread State (64-bit):
x0: 0x0000000000000000 x1: 0x0000000000000000 x2: 0x0000000000000000 x3: 0x0000000000000000
x4: 0x0000000000000000 x5: 0x0000000000000000 x6: 0x0000000000000000 x7: 0x0000000000000000
x8: 0xedfbad7054b9004c x9: 0xedfbad7054b9004c x10: 0xaf957ffe418fd437 x11: 0x0000000000400000
x12: 0x0000000000400000 x13: 0x0000000000c00001 x14: 0x0000000028cd9729 x15: 0x0000000000f9ef58
x16: 0xfffffffffffffff4 x17: 0x0000000218a17f10 x18: 0x0000000000000000 x19: 0x000000016f946bd8
x20: 0x000000016f946c38 x21: 0x000000013df17f10 x22: 0x000000000000c025 x23: 0x0000000000000020
x24: 0x000000000000e801 x25: 0x0000000000074009 x26: 0x000000013df17e98 x27: 0x000000016f946dd0
x28: 0x00000000623a0049 fp: 0x000000016f946bc0 lr: 0xad27800100acfef8
sp: 0x000000016f946ba0 pc: 0x0000000100acfef8 cpsr: 0x60001000
far: 0x0000000100d50000 esr: 0xf2000001 (Breakpoint) brk 1

Still no kernel-side details :-/
Created attachment 1920483: logs for the amd64 crash
Created attachment 1920484: logs for the arm64 crash
I bisected this problem to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6cd514e58f12b211d638dbf6f791fa18d854f09c , at least for x86_64/macOS11 machines. If I revert this patch and rebuild the latest fedora kernel, my virtual machine successfully boots.
Can you please attach `lspci -vv` without the offending commit?
I built upstream commit 8f71a2b3f435 with "PCI: Clear PCI_STATUS when setting up device" reverted, and started a VM on my x86_64 MacBook. The output of lspci -vv is:

00:00.0 Host bridge: Apple Inc. Device f020
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:01.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
	Subsystem: Red Hat, Inc. Device 0041
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at c0000100 (32-bit, non-prefetchable) [size=64]
	Region 1: Memory at c00003c0 (32-bit, non-prefetchable) [size=32]
	Region 2: Memory at c00003e0 (32-bit, non-prefetchable) [size=16]
	Region 3: Memory at c00003f0 (32-bit, non-prefetchable) [size=16]
	Region 4: Memory at c0000140 (32-bit, non-prefetchable) [size=64]
	Capabilities: <access denied>
	Kernel driver in use: virtio-pci

00:05.0 Communication controller: Red Hat, Inc. Virtio console (rev 01)
	Subsystem: Red Hat, Inc. Device 0043
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at c0000180 (32-bit, non-prefetchable) [size=64]
	Region 1: Memory at c0000400 (32-bit, non-prefetchable) [size=16]
	Region 2: Memory at c0000410 (32-bit, non-prefetchable) [size=16]
	Region 3: Memory at c0000420 (32-bit, non-prefetchable) [size=16]
	Region 4: Memory at c0000000 (32-bit, non-prefetchable) [size=128]
	Capabilities: <access denied>
	Kernel driver in use: virtio-pci

00:06.0 Mass storage controller: Red Hat, Inc. Virtio block device (rev 01)
	Subsystem: Red Hat, Inc. Device 0042
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at c00001c0 (32-bit, non-prefetchable) [size=64]
	Region 1: Memory at c0000200 (32-bit, non-prefetchable) [size=64]
	Region 2: Memory at c0000430 (32-bit, non-prefetchable) [size=16]
	Region 3: Memory at c0000440 (32-bit, non-prefetchable) [size=16]
	Region 4: Memory at c0000240 (32-bit, non-prefetchable) [size=64]
	Capabilities: <access denied>
	Kernel driver in use: virtio-pci

00:07.0 Communication controller: Red Hat, Inc. Virtio socket (rev 01)
	Subsystem: Red Hat, Inc. Device 0053
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at c0000280 (32-bit, non-prefetchable) [size=64]
	Region 1: Memory at c0000450 (32-bit, non-prefetchable) [size=16]
	Region 2: Memory at c0000460 (32-bit, non-prefetchable) [size=16]
	Region 3: Memory at c0000470 (32-bit, non-prefetchable) [size=16]
	Region 4: Memory at c0000080 (32-bit, non-prefetchable) [size=128]
	Capabilities: <access denied>
	Kernel driver in use: virtio-pci

00:08.0 Network and computing encryption device: Red Hat, Inc. Virtio RNG (rev 01)
	Subsystem: Red Hat, Inc. Device 0044
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at c00002c0 (32-bit, non-prefetchable) [size=64]
	Region 1: Memory at c0000480 (32-bit, non-prefetchable) [size=16]
	Region 2: Memory at c0000490 (32-bit, non-prefetchable) [size=16]
	Region 3: Memory at c00004a0 (32-bit, non-prefetchable) [size=16]
	Region 4: Memory at c0000300 (32-bit, non-prefetchable) [size=64]
	Capabilities: <access denied>
	Kernel driver in use: virtio-pci

00:09.0 Memory controller: Red Hat, Inc. Virtio memory balloon (rev 01)
	Subsystem: Red Hat, Inc. Device 0045
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at c0000340 (32-bit, non-prefetchable) [size=64]
	Region 1: Memory at c00004b0 (32-bit, non-prefetchable) [size=16]
	Region 2: Memory at c00004c0 (32-bit, non-prefetchable) [size=16]
	Region 3: Memory at c00004d0 (32-bit, non-prefetchable) [size=16]
	Region 4: Memory at c0000380 (32-bit, non-prefetchable) [size=64]
	Capabilities: <access denied>
	Kernel driver in use: virtio-pci

00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller
	Subsystem: Intel Corporation Device 8086
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Kernel modules: lpc_ich
So "Cap" is set on the virtio devices. Can you please try changing the line to the following: "pci_write_config_word(dev, PCI_STATUS, 0xffef);"
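For reference, 0xffef is 0xffff with only bit 4 left at zero; bit 4 is the capabilities-list flag that lspci prints as "Cap+". Below is a minimal C sketch of how that value is built. The two constants are copied by hand from what Linux defines in include/uapi/linux/pci_regs.h, and status_write_value_excluding() is a hypothetical helper name used only for illustration, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied by hand from include/uapi/linux/pci_regs.h. */
#define PCI_STATUS          0x06  /* offset of the 16-bit status register */
#define PCI_STATUS_CAP_LIST 0x10  /* capability list present ("Cap+" in lspci) */

/* Compile-time check that the capabilities-list flag is bit 4. */
static_assert(PCI_STATUS_CAP_LIST == (1u << 4), "Cap List should be bit 4");

/* Hypothetical helper: an all-ones write value with the given bits left at 0. */
static inline uint16_t status_write_value_excluding(uint16_t excluded_bits)
{
    return (uint16_t)(0xffffu & ~(unsigned)excluded_bits);
}
```

Calling status_write_value_excluding(PCI_STATUS_CAP_LIST) gives the 0xffef suggested above.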
The hypervisor still exits with an error with this change. I tried multiple variations of "pci_write_config_word(dev, PCI_STATUS, $CONST);" (0xff00, 0x00ff, 0x0007, 0x000c, ...), and they all caused the same failure. Only `pci_write_config_word(dev, PCI_STATUS, 0x0);` allows me to start a VM.
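To make the list of tried values easier to read: per the PCI specification, only the error bits of PCI_STATUS (bit 8 and bits 11-15, 0xf900 together) are write-1-to-clear. The remaining bits are read-only or reserved, so a conforming device should ignore writes to them. The sketch below splits a write value along that line. struct status_write and classify_status_write() are hypothetical names for illustration only, and nothing here is taken from Apple's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Write-1-to-clear error bits of PCI_STATUS: bit 8 and bits 11-15. */
#define STATUS_RW1C_BITS 0xf900u

/* Hypothetical result type: how a value written to PCI_STATUS splits up. */
struct status_write {
    uint16_t clears;   /* error bits the write would reset */
    uint16_t ignored;  /* read-only/reserved bits a conforming device ignores */
};

static inline struct status_write classify_status_write(uint16_t value)
{
    struct status_write w;

    w.clears  = (uint16_t)(value & STATUS_RW1C_BITS);
    w.ignored = (uint16_t)(value & ~STATUS_RW1C_BITS);
    /* Every bit of the value lands in exactly one of the two groups. */
    assert((uint16_t)(w.clears | w.ignored) == value);
    return w;
}
```

With this split, 0x00ff, 0x0007 and 0x000c touch no write-1-to-clear bit at all, yet they still kill the VM. That suggests the hypervisor rejects any non-zero write to the register rather than a particular status bit.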
Hmm, it feels like a bug in the VM stack? It seems like writing anything to PCI_STATUS is prohibited? Does `sudo setpci -s 00:1f.0 STATUS=0xffff` crash the VM too? Also, please try all the other devices too. Have you got any reply from Apple's bug tracker?
> Does `sudo setpci -s 00:1f.0 STATUS=0xffff` crash the VM too?
Yes, that also kills the VM :-/

> Also, please try all the other devices too.
`setpci STATUS=0xffff` worked fine on the other devices.

> Have you got any reply from Apple's bug tracker?
Not yet. Last time I did that, it took them a few weeks to answer. However, in the meantime I tested macOS 13, which was released a few weeks ago, and on this version I cannot reproduce this kernel issue, so they must have fixed something in their hypervisor. I don't expect everyone will upgrade to macOS 13 right away, so it would still be nice to avoid this kernel regression for macOS 12 users.
OK, so it seems like it's really a bug in their VM. Hopefully Apple can fix it in macOS 12 so the patch can be reinstated...