Bug 2189136
Summary: | windows 11 installation broken with edk2-20230301gitf80f052277c8-1.el9 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Guo, Zhiyi <zhguo> | ||||||||
Component: | edk2 | Assignee: | Gerd Hoffmann <kraxel> | ||||||||
Status: | CLOSED ERRATA | QA Contact: | Xueqiang Wei <xuwei> | ||||||||
Severity: | high | Docs Contact: | |||||||||
Priority: | high | ||||||||||
Version: | 9.3 | CC: | berrange, coli, demeng, echo, gveitmic, jinzhao, juzhang, kraxel, lersek, lkotek, mbrown, mdean, pbonzini, phou, qizhu, vgoyal, virt-maint, xuwei, yama, zhguo | ||||||||
Target Milestone: | rc | Keywords: | CustomerScenariosInitiative, Regression, Triaged | ||||||||
Target Release: | --- | ||||||||||
Hardware: | x86_64 | ||||||||||
OS: | Windows | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | edk2-20230301gitf80f052277c8-4.el9 | Doc Type: | If docs needed, set a value | ||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2023-11-07 08:24:29 UTC | Type: | --- | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Attachments: |
|
Description
Guo, Zhiyi
2023-04-24 10:48:18 UTC
Created attachment 1959509 [details]
ovmf log
ovmf log show below information:
...
AtaError: 0
Tcg2SubmitCommand ...
PtpCrbTpmCommand Send - 80 01 00 00 00 14 00 00 01 7E 00 00 00 01 00 0B 03 00 08 00
PtpCrbTpmCommand ReceiveHeader - 80 01 00 00 00 3E 00 00 00 00
PtpCrbTpmCommand Receive - 80 01 00 00 00 3E 00 00 00 00 00 00 00 34 00 00 00 01 00 0B 03 00 08 00 00 00 00 01 00 20 69 D4 B0 81 B3 89 C1 A1 08 DA 1C 63 42 EB 24 8A FC 8F A2 99 31 26 E4 91 98 E4 D2 AB 36 57 59 8B
Tcg2SubmitCommand - Success
Tcg2SubmitCommand ...
PtpCrbTpmCommand Send - 80 02 00 00 00 41 00 00 01 82 00 00 00 0C 00 00 00 09 40 00 00 09 00 00 00 00 00 00 00 00 01 00 0B 9A 03 E3 85 CE D8 DF 9C CF 7A 45 E0 14 D4 F7 D1 ED 9B CB 95 C1 D0 56 E0 0C 2C 55 57 E9 A1 82 00
PtpCrbTpmCommand ReceiveHeader - 80 02 00 00 00 13 00 00 00 00
PtpCrbTpmCommand Receive - 80 02 00 00 00 13 00 00 00 00 00 00 00 00 00 00 01 00 00
Tcg2SubmitCommand - Success
Tcg2SubmitCommand ...
PtpCrbTpmCommand Send - 80 02 00 00 00 41 00 00 01 82 00 00 00 0D 00 00 00 09 40 00 00 09 00 00 00 00 00 00 00 00 01 00 0B 01 E7 FB AF F4 19 89 9D 13 88 12 E8 0D 24 32 B9 CF 95 E6 35 25 66 AE C7 28 45 EE 53 23 05 00 DB
PtpCrbTpmCommand ReceiveHeader - 80 02 00 00 00 13 00 00 00 00
PtpCrbTpmCommand Receive - 80 02 00 00 00 13 00 00 00 00 00 00 00 00 00 00 01 00 00
Tcg2SubmitCommand - Success
Tcg2SubmitCommand ...
PtpCrbTpmCommand Send - 80 02 00 00 00 41 00 00 01 82 00 00 00 0E 00 00 00 09 40 00 00 09 00 00 00 00 00 00 00 00 01 00 0B 56 B5 E2 61 1E 00 F4 75 C7 F1 C5 F1 EB 2D F5 17 3E 01 C6 10 99 42 37 75 B9 65 0E 9A 66 6E A4 A8
PtpCrbTpmCommand ReceiveHeader - 80 02 00 00 00 13 00 00 00 00
PtpCrbTpmCommand Receive - 80 02 00 00 00 13 00 00 00 00 00 00 00 00 00 00 01 00 00
Tcg2SubmitCommand - Success
ASSERT /builddir/build/BUILD/edk2-f80f052277c8/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c(42): InterruptedTPL < 31
Tried to reproduce, no success (slightly older win11 iso). (virt-install --vnc --name win11-bz2189136 --boot uefi --vcpus=4 --os-variant=win11 --cdrom ~/iso/Win11_German_x64.iso) Boots fine to the installer screen. Does this happen only with latest win11? Or does this happen with older versions (say win10) too? Can you try remove the options from the virt-install command line one by one to the bare minimum and see if onee of them changes the situation for you? (In reply to Gerd Hoffmann from comment #9) > Tried to reproduce, no success (slightly older win11 iso). > > (virt-install --vnc --name win11-bz2189136 --boot uefi --vcpus=4 > --os-variant=win11 --cdrom ~/iso/Win11_German_x64.iso) > > Boots fine to the installer screen. > > Does this happen only with latest win11? > Or does this happen with older versions (say win10) too? It looks like the issue specific to the version of windows 11 used by QE. I have tried the same virt-install command in comment 0 to windows 10, windows server 2019 and windows server 2022, none of them can reproduce the issue. > > Can you try remove the options from the virt-install command line one by one > to the bare minimum and see if onee of them changes the situation for you? Even the simple virt-install command can hit this issue: virt-install --name win11 --cpu host-model --ram=8192 --vcpus=4,sockets=1,cores=4,threads=1 --video virtio --boot uefi --disk path=/home/win11.qcow2,format='qcow2',cache=none,size=80,bus='sata' --graphics vnc,listen=0.0.0.0 --noreboot --os-variant=win10 --cdrom windows.iso upstream discussion https://edk2.groups.io/g/devel/message/103770 unbalanced RaiseTPL in DiskIo2ReadWriteDisk (In reply to Gerd Hoffmann from comment #21) > unbalanced RaiseTPL in DiskIo2ReadWriteDisk That's a good and interesting find: OldTpl = gBS->RaiseTPL (TPL_CALLBACK); ... gBS->RaiseTPL (TPL_NOTIFY); ... gBS->RestoreTPL (OldTpl); Arguments can be made for both the validity and the invalidity of this construct. One could be tempted to call this invalid because the second Raise (to NOTIFY) is not matched with a Restore. However, there are several arguments why this can be considered valid: - all "rising edges" are good (from application to callback to notify) - the one restore we have does correspond to a raise that we performed earlier - we do exit the function with the original TPL - when we lower (restore the TPL), the spec provides well-defined semantics for *all* functions that have been queued in the interim (see EFI_BOOT_SERVICES.CreateEvent()); so when we jump from NOTIFY to APPLICATION, the highest prio (= NOTIFY) functions will execute in FIFO order, then the next prio (= CALLBACK) ones, also in FIFO order. If the code did: OldTpl = gBS->RaiseTPL (TPL_CALLBACK); Old2Tpl = gBS->RaiseTPL (TPL_NOTIFY); gBS->RestoreTPL (Old2Tpl); gBS->RestoreTPL (OldTpl); that would be required to have the exact same effect (the first restore would unblock NOTIFY funcs, the second restore would unblock CALLBACK ones; assuming anyway that we entered the function at OldTpl = APPLICATION). This problem reproduces with commit c37cbc030d96 ("OvmfPkg: Switch timer in build time for OvmfPkg", 2022-04-02) reverted, on top of 2c2cb2352896. In other words, using the LAPIC timer, vs. the 8254 timer + the 8259 interrupt controller, makes no difference. Found some interesting bits. First, I reverted commits a086f4a63bc0 ("OvmfPkg: Use NestedInterruptTplLib in nested interrupt handlers", 2022-12-23) and a24fbd606125 ("OvmfPkg: Add library to handle TPL from within nested interrupt handlers", 2022-12-23) on top of upstream 2c2cb2352896, *and* modified LocalApicTimerDxe like this: > diff --git a/OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c b/OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c > index 3e04b49d6bcf..4b5b4dad8196 100644 > --- a/OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c > +++ b/OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c > @@ -61,6 +61,9 @@ TimerInterruptHandler ( > EFI_TPL OriginalTPL; > > OriginalTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL); > + if (OriginalTPL == TPL_HIGH_LEVEL) { > + DEBUG ((DEBUG_WARN, "%a:%d\n", __func__, __LINE__)); > + } > > SendApicEoi (); > Sure enough, this fires; that is, even before/without the nested interrupt handler stuff from Michael Brown, the same issue is triggered or caused by "cdboot.efi": > Loading driver at 0x00010000000 EntryPoint=0x00010001090 cdboot.efi > InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 3DEA4698 > ProtectUefiImageCommon - 0x3CEB9040 > - 0x0000000010000000 - 0x000000000011C000 > TimerInterruptHandler:65 Second, right after this snippet, "cdboot.efi" calls SerialSetAttributes() [MdeModulePkg/Bus/Pci/PciSioSerialDxe/SerialIo.c]. From there we get a call to VerifyUartParameters() [same file], which logs: > ClockRate = 1843200 > Divisor = 1 > BaudRate/Actual (115200/115200) = 100% Not much later "cbdoot.efi" calls gBS->ExitBootServices(), which we learn from log messages such as: > VirtioScsiExitBoot: Context=0x3CFF6498 > SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0 Third, I copied "cboot.efi" out of the installer ISO, disassembled it with "objdump -d" (thankfully, on RHEL9, objdump directly supports EFI binaries!), and then looked for cli/sti instructions. Found two interesting spots. (a) This pair of spots is really small in itself: 100b8ef0: fb sti 100b8ef1: c3 retq and 100b8f00: fa cli 100b8f01: c3 retq What are interesting are the call sites! I'm not going to quote them in full, just the tight contexts: 1009beea: 44 8a f8 mov %al,%r15b 1009beed: 84 c0 test %al,%al 1009beef: 74 05 je 0x1009bef6 1009bef1: e8 0a d0 01 00 callq 0x100b8f00 1009bef6: 41 0f 20 c4 mov %cr0,%r12 This is effectively a conditional CLI. If the condition holds, we jump *just over* the subroutine call that would mean a CLI. In other words, "cdboot.efi" checks something, and then based on the result, it either executes CLI (in a tight subroutine) or not. 1009c133: 45 84 ff test %r15b,%r15b 1009c136: 74 05 je 0x1009c13d 1009c138: e8 b3 cd 01 00 callq 0x100b8ef0 1009c13d: 48 8b 6c 24 58 mov 0x58(%rsp),%rbp Exact same story, just with STI. The next one is so short I can quote in full: 1009cbfe: f6 42 10 02 testb $0x2,0x10(%rdx) 1009cc02: 74 05 je 0x1009cc09 -> jumps just over the CLI 1009cc04: e8 f7 c2 01 00 callq 0x100b8f00 -> CLI 1009cc09: 83 3b 01 cmpl $0x1,(%rbx) 1009cc0c: 75 16 jne 0x1009cc24 1009cc0e: 48 8b d7 mov %rdi,%rdx 1009cc11: 48 8b cb mov %rbx,%rcx 1009cc14: e8 3b 00 00 00 callq 0x1009cc54 1009cc19: 48 8d 4b 14 lea 0x14(%rbx),%rcx 1009cc1d: e8 2e c2 01 00 callq 0x100b8e50 1009cc22: eb 14 jmp 0x1009cc38 1009cc24: 48 8d 4b 14 lea 0x14(%rbx),%rcx 1009cc28: e8 23 c2 01 00 callq 0x100b8e50 1009cc2d: 48 8b d7 mov %rdi,%rdx 1009cc30: 48 8b cb mov %rbx,%rcx 1009cc33: e8 1c 00 00 00 callq 0x1009cc54 1009cc38: f6 43 10 02 testb $0x2,0x10(%rbx) 1009cc3c: 74 05 je 0x1009cc43 -> jumps just over the STI 1009cc3e: e8 ad c2 01 00 callq 0x100b8ef0 -> STI 1009cc43: 48 8b 5c 24 30 mov 0x30(%rsp),%rbx The same conditional patterns. (b) Regarding the other CLI+STI pair, the CLI is in a subroutine that has only one caller, and the STI is in a code snippet that is *not* called by the CALL instruction, but reached by other means (haven't figured that out for now): Entry to the subroutine with CLI (having one call site): 100b9020: 48 55 rex.W push %rbp 100b9022: 56 push %rsi 100b9023: 57 push %rdi 100b9024: 53 push %rbx 100b9025: 9c pushfq 100b9026: 83 3d 1b 30 04 00 00 cmpl $0x0,0x4301b(%rip) # 0x100fc048 100b902d: 74 01 je 0x100b9030 100b902f: fa cli 100b9030: 0f 20 c0 mov %cr0,%rax Note above the same pattern: a flag check (the flag is at 0x100fc048), and then the jump that *only* skips the CLI in case the flag equals zero. (BTW, the flag at 0x100fc048 is written-to in just one place in the disassembly; it's most probably a global variable. At least I think it is a "write" access; the GNU assembly style puts the destination operand second, right?) 100b9033: 50 push %rax 100b9034: 0f 20 e0 mov %cr4,%rax 100b9037: 50 push %rax 100b9038: 48 25 00 10 00 00 and $0x1000,%rax 100b903e: 48 c1 e8 0c shr $0xc,%rax 100b9042: 50 push %rax 100b9043: 0f 20 d8 mov %cr3,%rax 100b9046: 48 89 05 d3 2f 04 00 mov %rax,0x42fd3(%rip) # 0x100fc020 100b904d: 48 89 25 04 30 04 00 mov %rsp,0x43004(%rip) # 0x100fc058 100b9054: 48 8b 15 e5 2f 04 00 mov 0x42fe5(%rip),%rdx # 0x100fc040 100b905b: 0f 01 12 lgdt (%rdx) 100b905e: 0f 01 5a 0a lidt 0xa(%rdx) 100b9062: 0f 00 52 14 lldt 0x14(%rdx) So this is some really quirky code, reading CR4, CR3, and loading the global, local, and interrupt descriptor tables! 100b9066: 48 b8 79 90 0b 10 00 movabs $0x100b9079,%rax 100b906d: 00 00 00 Note this value -- 100b9079. 100b9070: 48 0f b7 4a 16 movzwq 0x16(%rdx),%rcx 100b9075: 51 push %rcx 100b9076: 50 push %rax 100b9077: 48 cb lretq So the subroutine with the (conditional) CLI in it ends here. It is followed immediately by some helper subroutine; listed below. Note its start address: 100b9079. 100b9079: 8e 5a 18 mov 0x18(%rdx),%ds 100b907c: 8e 42 1a mov 0x1a(%rdx),%es 100b907f: 8e 6a 1e mov 0x1e(%rdx),%gs 100b9082: 8e 62 1c mov 0x1c(%rdx),%fs 100b9085: 8e 52 20 mov 0x20(%rdx),%ss 100b9088: 48 8b 0d d9 2f 04 00 mov 0x42fd9(%rip),%rcx # 0x100fc068 100b908f: 48 8b 15 9a 2f 04 00 mov 0x42f9a(%rip),%rdx # 0x100fc030 100b9096: 48 8b 05 d3 2f 04 00 mov 0x42fd3(%rip),%rax # 0x100fc070 100b909d: 48 85 c0 test %rax,%rax 100b90a0: 74 04 je 0x100b90a6 100b90a2: ff d0 callq *%rax 100b90a4: eb 03 jmp 0x100b90a9 100b90a6: 0f 22 d9 mov %rcx,%cr3 100b90a9: 48 2b ed sub %rbp,%rbp 100b90ac: 48 8b 25 75 2f 04 00 mov 0x42f75(%rip),%rsp # 0x100fc028 100b90b3: 48 2b f6 sub %rsi,%rsi 100b90b6: 48 8b 0d 7b 2f 04 00 mov 0x42f7b(%rip),%rcx # 0x100fc038 100b90bd: 48 8b 05 8c 2f 04 00 mov 0x42f8c(%rip),%rax # 0x100fc050 100b90c4: ff d0 callq *%rax 100b90c6: 48 8b 25 8b 2f 04 00 mov 0x42f8b(%rip),%rsp # 0x100fc058 100b90cd: 48 8b 0d 4c 2f 04 00 mov 0x42f4c(%rip),%rcx # 0x100fc020 100b90d4: 5a pop %rdx 100b90d5: 48 8b 05 94 2f 04 00 mov 0x42f94(%rip),%rax # 0x100fc070 100b90dc: 48 85 c0 test %rax,%rax 100b90df: 74 04 je 0x100b90e5 100b90e1: ff d0 callq *%rax 100b90e3: eb 03 jmp 0x100b90e8 100b90e5: 0f 22 d9 mov %rcx,%cr3 100b90e8: 48 8b 15 71 2f 04 00 mov 0x42f71(%rip),%rdx # 0x100fc060 100b90ef: 0f 01 12 lgdt (%rdx) 100b90f2: 0f 01 5a 0a lidt 0xa(%rdx) 100b90f6: 0f 00 52 14 lldt 0x14(%rdx) Again the descriptor table loading stuff... 100b90fa: 48 b8 0d 91 0b 10 00 movabs $0x100b910d,%rax 100b9101: 00 00 00 Note this address: 100b910d. 100b9104: 48 0f b7 4a 16 movzwq 0x16(%rdx),%rcx 100b9109: 51 push %rcx 100b910a: 50 push %rax 100b910b: 48 cb lretq The helper subroutine ends here. And then right after follows the subroutine with STI in it -- note that it starts at 100b910d -- see the movabs above. 100b910d: 8e 5a 18 mov 0x18(%rdx),%ds 100b9110: 8e 42 1a mov 0x1a(%rdx),%es 100b9113: 8e 6a 1e mov 0x1e(%rdx),%gs 100b9116: 8e 62 1c mov 0x1c(%rdx),%fs 100b9119: 8e 52 20 mov 0x20(%rdx),%ss 100b911c: 58 pop %rax 100b911d: 0f 22 e0 mov %rax,%cr4 100b9120: 58 pop %rax 100b9121: 0f 22 c0 mov %rax,%cr0 100b9124: 83 3d 1d 2f 04 00 00 cmpl $0x0,0x42f1d(%rip) # 0x100fc048 100b912b: 74 01 je 0x100b912e 100b912d: fb sti 100b912e: 9d popfq Same pattern as in the subroutine with the CLI: if the flag at 0x100fc048 is zero, we jump *just over* the STI. Otherwise, if the flag is nonzero, we perform the STI. 100b912f: 5b pop %rbx 100b9130: 5f pop %rdi 100b9131: 5e pop %rsi 100b9132: 5d pop %rbp 100b9133: c3 retq End of the STI-containing subroutine. Here's my take: functionality (b) seems way too complex for running before ExitBootServces()! It manipulates control registers and descriptor tables. No sane UEFI application would do that before calling ExitBootServices(), I presume. Functionality (a) seems more likely, except I don't know in the slightest what it does. The interesting question is now to disassemble "cdboot.efi" from a previous (non-problematic) Windows installer ISO, and see whether that version contains similar CLI/STI pairs. Testing cdboot.efi from
"en_windows_10_enterprise_ltsc_2019_x64_dvd_be3c8ffb.iso" (more or less
randomly picked):
> Loading driver at 0x00010000000 EntryPoint=0x00010001090 cdboot.efi
> InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 3DEA4698
> ProtectUefiImageCommon - 0x3CEB8040
> - 0x0000000010000000 - 0x0000000000152000
> ClockRate = 1843200
> Divisor = 1
> BaudRate/Actual (115200/115200) = 100%
> VirtioScsiExitBoot: Context=0x3CFF5018
> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
Note the lack of "TimerInterruptHandler:65" just before "ClockRate".
However, in the disassembly, I find the exact same CLI/STI patterns.
Next idea: what if edk2 services are used for reenabling interrupts, while at TPL_HIGH_LEVEL? - modify the DXE core to reflect all changes to gEfiCurrentTpl to a predefined fixed physical address - modify EnableInterrupts() [MdePkg/Library/BaseLib/X64/GccInlinePriv.c] to check the current TPL; complain if it is already HIGH. (EnableInterrupts() is the BaseLib interface that is called either directly by low-level drivers or via CPU protocol interfaces via higher-level drivers.) > Entry to the subroutine with CLI (having one call site): > > 100b9020: 48 55 rex.W push %rbp > 100b9022: 56 push %rsi > 100b9023: 57 push %rdi > 100b9024: 53 push %rbx > 100b9025: 9c pushfq > 100b9026: 83 3d 1b 30 04 00 00 cmpl $0x0,0x4301b(%rip) # > 0x100fc048 > 100b902d: 74 01 je 0x100b9030 > 100b902f: fa cli > 100b9030: 0f 20 c0 mov %cr0,%rax When setting breakpoints at all four cli/sti instructions this is the only one which fires. > 100b9033: 50 push %rax > 100b9034: 0f 20 e0 mov %cr4,%rax > 100b9037: 50 push %rax > 100b9038: 48 25 00 10 00 00 and $0x1000,%rax > 100b903e: 48 c1 e8 0c shr $0xc,%rax > 100b9042: 50 push %rax > 100b9043: 0f 20 d8 mov %cr3,%rax > 100b9046: 48 89 05 d3 2f 04 00 mov %rax,0x42fd3(%rip) # > 0x100fc020 > 100b904d: 48 89 25 04 30 04 00 mov %rsp,0x43004(%rip) # > 0x100fc058 > 100b9054: 48 8b 15 e5 2f 04 00 mov 0x42fe5(%rip),%rdx # > 0x100fc040 > 100b905b: 0f 01 12 lgdt (%rdx) > 100b905e: 0f 01 5a 0a lidt 0xa(%rdx) > 100b9062: 0f 00 52 14 lldt 0x14(%rdx) > > So this is some really quirky code, reading CR4, CR3, and loading the > global, local, and interrupt descriptor tables! Really strange cdboot seems to do all this way before calling exitbootservices. > 100b90fa: 48 b8 0d 91 0b 10 00 movabs $0x100b910d,%rax > 100b9101: 00 00 00 > > Note this address: 100b910d. > > 100b9104: 48 0f b7 4a 16 movzwq 0x16(%rdx),%rcx > 100b9109: 51 push %rcx > 100b910a: 50 push %rax > 100b910b: 48 cb lretq > > The helper subroutine ends here. No, this is just a jmp $0x100b910d (push target to stack and ret). > > 100b90fa: 48 b8 0d 91 0b 10 00 movabs $0x100b910d,%rax
> > 100b9101: 00 00 00
> >
> > Note this address: 100b910d.
> >
> > 100b9104: 48 0f b7 4a 16 movzwq 0x16(%rdx),%rcx
> > 100b9109: 51 push %rcx
> > 100b910a: 50 push %rax
> > 100b910b: 48 cb lretq
> >
> > The helper subroutine ends here.
>
> No, this is just a jmp $0x100b910d (push target to stack and ret).
Well, almost, not clear whenever %rxc is zero at that point.
Maybe that is part of self-relocation.
(In reply to Laszlo Ersek from comment #27) > Next idea: what if edk2 services are used for reenabling interrupts, while > at TPL_HIGH_LEVEL? Also note that using sti/cli is not the only way to enable/disable interrupts, it's just a bit in the flags register, so a pushf + raisetpl(high) + popf could also lead to IRQs being enabled at TPL=HIGH. Thanks for the corrections! :) Didn't realize the jump-by-ret trick was in used here; I thought that was only necessary for CPU mode switches. Regarding popf... That sounds horrible. I'm unsure if we can ever catch that. There are probably too many popfs for us to set breakpoints on all of them. It's interesting though that from the explicit instructions, only one CLI is reached (repeatedly perhaps, but still). > > No, this is just a jmp $0x100b910d (push target to stack and ret).
>
> Well, almost, not clear whenever %rxc is zero at that point.
> Maybe that is part of self-relocation.
Self-relocation would also explain why that sti breakpoint never triggers.
(In reply to Gerd Hoffmann from comment #32) > > > No, this is just a jmp $0x100b910d (push target to stack and ret). > > > > Well, almost, not clear whenever %rxc is zero at that point. > > Maybe that is part of self-relocation. > > Self-relocation would also explain why that sti breakpoint never triggers. Hmm, trying to set a breakpoint at the lretq instruction doesn't work either. But doesn't matter much, that is most likely re-enabling interrupts after calling exitbootservices and setting up everything (specifically IRQ handlers) on the boot cpu. Printed some random junk to the (slow) debug log while IRQs are disabled to let some time pass. Had the effect that the timer IRQ was already pending when interrupts are enabled, so it instantly fired and ran into the ASSERT(). Now we can inspect where we are: (1) Walk up the stack to TimerInterruptHandler. (2) print /x *SystemContext.SystemContextX64 $2 = {ExceptionData = 0x0, FxSaveState = {Fcw = 0x37f, Fsw = 0x0, Ftw = 0x0, Opcode = 0x0, [ ... ] Rflags = 0x202, Ldtr = 0x0, Tr = 0x0, Gdtr = {0x7f7dc000, 0x47}, Idtr = {0x7ee93018, 0xfff}, Rip = 0x235579b, Gs = 0x30, Fs = 0x30, Es = 0x30, Ds = 0x30, Cs = 0x38, Ss = 0x30, Rdi = 0x2608000, Rsi = 0x243f90, Rbp = 0xa, Rsp = 0x243d20, Rbx = 0x2607fc0, Rdx = 0x2569e3e, Rcx = 0x2607fd4, Rax = 0x38, R8 = 0x7fe3f2ce, R9 = 0x0, R10 = 0x200, R11 = 0xffffffffff000, R12 = 0x24403c, R13 = 0x0, R14 = 0x1, R15 = 0x4} (3) x/10i 0x2355790 0x2355790: testb $0x2,0x10(%rbx) 0x2355794: je 0x235579b 0x2355796: call 0x2569ec0 0x235579b: mov 0x30(%rsp),%rbx <== RIP is here. 0x23557a0: mov 0x38(%rsp),%rsi (4) x/10i 0x2569ec0 0x2569ec0: sti 0x2569ec1: ret So apparently a tiny sti() helper function, simliar to the one found in cdboot.efi But it's not the location where cdboot.efi was loaded to. So either cdboot.efi relocated itself, or it loaded stuff from cdrom and jumped to it. The latter is IMHO more likely. Created attachment 1961973 [details] edk2 debug patch on top of commit 2c2cb2352896 (In reply to Laszlo Ersek from comment #27) > Next idea: what if edk2 services are used for reenabling interrupts, while > at TPL_HIGH_LEVEL? > > - modify the DXE core to reflect all changes to gEfiCurrentTpl to a > predefined fixed physical address > > - modify EnableInterrupts() [MdePkg/Library/BaseLib/X64/GccInlinePriv.c] to > check the current TPL; complain if it is already HIGH. (EnableInterrupts() > is the BaseLib interface that is called either directly by low-level drivers > or via CPU protocol interfaces via higher-level drivers.) The attached debug patch does this (more or less... more precisely, the debug patch exposes the DXE Core's own gEfiCurrentTpl variable to other modules). Unfortunately, this patch does *not* fire, even though booting Windows reproduces the symptom. (I did test the patch forcibly, by raising the TPL to HIGH manually, and then manually enabling interrupts. The patch surely fired then.) So in whatever way Windows reenables interrupts while the TPL is HIGH, Windows does not go through any edk2 service that terminates in BaseLib's EnableInterrupts() function. scratch build with workaround (still building atm): https://kojihub.stream.centos.org/koji/taskinfo?taskID=2176907 (In reply to Gerd Hoffmann from comment #34) > Printed some random junk to the (slow) debug log while IRQs are > disabled to let some time pass. Had the effect that the timer IRQ was > already pending when interrupts are enabled, so it instantly fired and > ran into the ASSERT(). Ah, beautiful -- it turns the STI into a synchronous interrupt, effectively. > > Now we can inspect where we are: > > (1) Walk up the stack to TimerInterruptHandler. > (2) print /x *SystemContext.SystemContextX64 > > $2 = {ExceptionData = 0x0, FxSaveState = {Fcw = 0x37f, Fsw = 0x0, Ftw = 0x0, > Opcode = 0x0, > [ ... ] > Rflags = 0x202, Ldtr = 0x0, Tr = 0x0, Gdtr = {0x7f7dc000, 0x47}, Idtr = > {0x7ee93018, 0xfff}, > Rip = 0x235579b, Gs = 0x30, Fs = 0x30, Es = 0x30, Ds = 0x30, Cs = 0x38, Ss > = 0x30, > Rdi = 0x2608000, Rsi = 0x243f90, Rbp = 0xa, Rsp = 0x243d20, Rbx = > 0x2607fc0, Rdx = 0x2569e3e, > Rcx = 0x2607fd4, Rax = 0x38, R8 = 0x7fe3f2ce, R9 = 0x0, R10 = 0x200, R11 = > 0xffffffffff000, > R12 = 0x24403c, R13 = 0x0, R14 = 0x1, R15 = 0x4} > > (3) x/10i 0x2355790 > > 0x2355790: testb $0x2,0x10(%rbx) > 0x2355794: je 0x235579b > 0x2355796: call 0x2569ec0 > 0x235579b: mov 0x30(%rsp),%rbx <== RIP is here. > 0x23557a0: mov 0x38(%rsp),%rsi > > (4) x/10i 0x2569ec0 > > 0x2569ec0: sti > 0x2569ec1: ret > > So apparently a tiny sti() helper function, simliar to the one found > in cdboot.efi But it's not the location where cdboot.efi was loaded > to. So either cdboot.efi relocated itself, or it loaded stuff from > cdrom and jumped to it. The latter is IMHO more likely. Both ideas are consistent with what I'm seeing. I've created another small debug patch: > diff --git a/MdeModulePkg/Core/Dxe/Misc/Stall.c b/MdeModulePkg/Core/Dxe/Misc/Stall.c > index 35e045d41a6c..fc371bac3769 100644 > --- a/MdeModulePkg/Core/Dxe/Misc/Stall.c > +++ b/MdeModulePkg/Core/Dxe/Misc/Stall.c > @@ -54,6 +54,16 @@ CoreStall ( > UINT32 Remainder; > UINTN Index; > > + if (gEfiCurrentTpl == TPL_HIGH_LEVEL) { > + DEBUG (( > + DEBUG_WARN, > + "%a:%d:%p\n", > + __func__, > + __LINE__, > + (VOID *)&Microseconds > + )); > + } > + > if (gMetronome == NULL) { > return EFI_NOT_AVAILABLE_YET; > } It basically logs all gBS->Stall() calls that are made at TPL_HIGH_LEVEL. (This debug patch is easy because the DXE Core both implements CoreStall() and defines gEfiCurrentTpl, so the former can access the latter.) The log message also includes the address of the gBS->Stall() parameter "Microseconds". The idea with that is to glean some minimal insight into the stack upon which gBS->Stall() is called. When I boot "en_windows_10_enterprise_ltsc_2019_x64_dvd_be3c8ffb.iso" (successfully to the first dialog of the Windows installer), there are *zero* such log messages. When I boot the problematic ISO, I get the following: > Loading driver at 0x00010000000 EntryPoint=0x00010001090 cdboot.efi > InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 3DEA1698 > ProtectUefiImageCommon - 0x3CEBC040 > - 0x0000000010000000 - 0x000000000011C000 > CoreStall:62:3EF13810 > CoreStall:62:3EF13810 > CoreStall:62:3EF13810 These are the first such messages in the log; that is, nothing other than cdboot.efi calls Stall() at TPL_HIGH_LEVEL (which is BTW permitted by the UEFI spec). These work just fine. Also note that the stack seems to be the "normal" edk2 stack. Yet further, let's record the fact that the new Windows ISO differs from the old ISO in that the new one calls Stall() at TPL_HIGH_LEVEL, while the old one doesn't. Again, calling Stall() like this is permitted by the spec, what I don't understand is how Windows (cdboot.efi) gets to run *any* code at TPL_HIGH_LEVEL, including a function call to gBS->Stall(). This suggests that "cdboot.efi" registers a notification function at TPL_HIGH_LEVEL, which on the other hand would be a violation of the spec. In fact it doesn't look possible; CoreCreateEventEx() [MdeModulePkg/Core/Dxe/Event/Event.c] seems to enforce valid TPLs. Anyway, these messages above are logged while "cdboot.efi" is displaying the growing string of dots, after the prompt ("press any key to boot from DVD" or some such). If I don't press a key and "cdboot.efi" times out, then we don't get other types of CoreStall messages. However, when I press Enter for example, the log continues as follows, with the stack address changed: > CoreStall:62:136D00 > CoreStall:62:136D00 > CoreStall:62:136D00 > CoreStall:62:241D80 > CoreStall:62:241D80 > TimerInterruptHandler:65 > CoreStall:62:241D80 > ClockRate = 1843200 > Divisor = 1 > BaudRate/Actual (115200/115200) = 100% > VirtioScsiExitBoot: Context=0x3CED0018 This indicates that the same loop somewhere in Windows continues, or that the same (impossible?) TPL_HIGH_LEVEL notification function continues to be queued and executed, but this time it keeps interrupting a very different call stack. And then even *that* stack changes quite significantly. And then we see the interrupt being delivered while the TPL is HIGH, there's *another* Stall() call, then we see SerialSetAttributes(), and finally ExitBootServices(). It's inexplicable *whence* we get those gBS->Stall() calls, at TPL_HIGH_LEVEL. Hm, I might be able to check into that with gdb, in fact; just replace the log with a CpuDeadLoop(), and get a backtrace? > Hm, I might be able to check into that with gdb, in fact; just replace
> the log with a CpuDeadLoop(), and get a backtrace?
My process: boot, let cdboot.efi timeout, wait for efi shell prompt,
attach gdb. All firmware modules are loaded at that point, and RIP
is somewhere in edk2, the edk2 gdb scripts manage to automatically
load the complete debug info in this state.
Now set breakpoints + continue. In efi shell start cdboot.efi again
and watch.
For monitoring specific conditions its useful to call a empty void
function, just for being able to easily set breakpoints, like this:
--- a/MdeModulePkg/Core/Dxe/Misc/Stall.c
+++ b/MdeModulePkg/Core/Dxe/Misc/Stall.c
@@ -44,6 +44,9 @@ CoreInternalWaitForTick (
@retval EFI_NOT_AVAILABLE_YET gMetronome is not available yet
**/
+
+void CoreStallBT(void) {}
+
EFI_STATUS
EFIAPI
CoreStall (
@@ -58,6 +61,10 @@ CoreStall (
return EFI_NOT_AVAILABLE_YET;
}
+ if (gEfiCurrentTpl == TPL_HIGH_LEVEL) {
+ CoreStallBT();
+ }
+
//
// Counter = Microseconds * 10 / gMetronome->TickPeriod
// 0x1999999999999999 = (2^64 - 1) / 10
to be combined with a gdb script:
break CoreStallBT
commands
bt
cont
end
First there are a few stacktraces like this:
Breakpoint 1, CoreStallBT () at /home/kraxel/projects/edk2/MdeModulePkg/Core/Dxe/Misc/Stall.c:48
48 void CoreStallBT(void) {}
#0 CoreStallBT () at /home/kraxel/projects/edk2/MdeModulePkg/Core/Dxe/Misc/Stall.c:48
#1 0x000000007fe063f9 in CoreStall (Microseconds=9000)
at /home/kraxel/projects/edk2/MdeModulePkg/Core/Dxe/Misc/Stall.c:65
#2 0x00000000016459dc in ?? ()
#3 0x0000000000002328 in ?? ()
#4 0x000000017e170020 in ?? ()
#5 0x0000000000137f50 in ?? ()
#6 0x0000000000138000 in ?? ()
#7 0x00000021cfd76d3c in ?? ()
#8 0x000000000177ab75 in ?? ()
#9 0x00000021d40c8998 in ?? ()
#10 0x00000000017b7fe2 in ?? ()
#11 0x000000007e4a5100 in mConIn ()
#12 0x0000000000137dc0 in ?? ()
#13 0x00000000001383c6 in ?? ()
#14 0x0000000000000001 in ?? ()
#15 0x0000000000138000 in ?? ()
#16 0x0000000001779ae4 in ?? ()
#17 0x0000000000000000 in ?? ()
This is probably cdboot.efi waiting for a keypress.
Finally once this (before hitting NestedInterruptRaiseTPL ASSERT())
Breakpoint 1, CoreStallBT () at /home/kraxel/projects/edk2/MdeModulePkg/Core/Dxe/Misc/Stall.c:48
48 void CoreStallBT(void) {}
#0 CoreStallBT () at /home/kraxel/projects/edk2/MdeModulePkg/Core/Dxe/Misc/Stall.c:48
#1 0x000000007fe063f9 in CoreStall (Microseconds=1)
at /home/kraxel/projects/edk2/MdeModulePkg/Core/Dxe/Misc/Stall.c:65
#2 0x000000000235824c in ?? ()
#3 0x0000000000000001 in ?? ()
#4 0x0000000000243f90 in ?? ()
#5 0x0000000000243f90 in ?? ()
#6 0x0000000000244000 in ?? ()
#7 0x0000000000244000 in ?? ()
#8 0x000000000249c779 in ?? ()
#9 0x0000000000000000 in ?? ()
Which again seems to be the code loaded by cdboot.efi running.
*** Bug 2183336 has been marked as a duplicate of this bug. *** Created attachment 1962534 [details] catch unannounced high TPL: debug patch series on top of git 94c802e108a0 Bingo. Windows bug. I said earlier that no Windows code should ever be running at TPL_HIGH_LEVEL; per UEFI spec, TPL_HIGH_LEVEL is restricted to platform firmware use. https://uefi.org/specs/UEFI/2.10/07_Services_Boot_Services.html#event-timer-and-task-priority-services > Tasks that execute at a higher priority level may interrupt tasks that > execute at a lower priority level. For example, tasks that run at the > TPL_NOTIFY level may interrupt tasks that run at the TPL_APPLICATION > or TPL_CALLBACK level. While TPL_NOTIFY is the highest level exposed > to the boot services applications, the firmware may have higher task > priority items it deals with. For example, the firmware may have to > deal with tasks of higher priority like timer ticks and internal > devices. Consequently, there is a fourth TPL, TPL_HIGH_LEVEL {link > needed}, designed for use exclusively by the firmware. This morning I got suspicious that "cdboot.efi" was *itself* raising the TPL to HIGH level. That would (i) violate the UEFI spec, and (ii) be consistent with the misbehavior we've been seeing. Gerd has shown before that "cdboot.efi" itself contained STI instructions, and that it also executed those. If you combine that with "cdboot.efi" itself raising the TPL to TPL_HIGH_LEVEL, we've got an explanation (and evidence that "cdboot.efi" violates the UEFI spec, therefore the undefined behavior is "cdboot.efi"'s bug). So, how do we prove that "cdboot.efi" directly raises the TPL to HIGH level? We can't make "cdboot.efi" do things for us, but we can make the rest of the firmware do things that "cdboot.efi" does not do. For example, before raising the TPL to HIGH level, *announce* the intent. And then in the RaiseTPL() boot service implementation (CoreRaiseTpl() [MdeModulePkg/Core/Dxe/Event/Tpl.c]) *enforce* the announcement. If the intent was announced, everything's fine; otherwise, halt. Furthermore, in every case, clear the announcement in RaiseTPL(). Refer to the attached debug patch series (applies on top of upstream git commit 94c802e108a0, "MdePkg/BasePeCoffLib: Deal with broken debug directories", 2023-05-05): 1 04adbfa5d04a Revert "OvmfPkg: Use NestedInterruptTplLib in nested interrupt handlers" 2 1aa2b0eca9a0 Revert "OvmfPkg: Add library to handle TPL from within nested interrupt handlers" 3 b7469ddb3329 LocalApicTimerDxe: warn about timer interrupt firing at TPL_HIGH_LEVEL Here's what these three patches do: the first two revert Michael's re-fixing of the Xen issue (= interrupt storm on Xen causing stack overflow due to nested interrupt handler invocations). The third patch adds a warning to the interrupt handler, namely when we determine that the interrupt was *delivered* with the TPL already being HIGH (this is what should never happen). This sub-series of patches only changes the *symptoms* of the misbehavior. It does not interfere with the misbehavior in any other way; it just makes the bug (wherever it is) more tractable for analysis. The sub-series does not change the invariants that (a) no 3rd party code ever should run at HIGH level, (b) no timer interrupt must be delivered at TPL_HIGH_LEVEL. The sub-series only returns to the naive/direct interrupt handler nesting that we had in place *before* commit 239b50a86370 ("OvmfPkg: End timer interrupt later to avoid stack overflow under load", 2020-06-18). That status was actually correct, it was only problematic when exposed to broken virtual hardware on overloaded Xen hosts. With this sub-series applied, a timer interrupt that is delivered at TPL_HIGH_LEVEL is simply logged. Other than that, we call the RaiseTPL and RestoreTPL boot services directly in TimerInterruptHandler(); we simply don't try to be smart about nesting. 4 1b54bbd0ab94 force agents raising the TPL to HIGH to announce themselves This is the core patch that adds the "infrastructure" for announcing the intent to raise the TPL to HIGH, and it also enforces (and clears) the announcement in RaiseTPL. For normal edk2 standards, this patch should be split into multiple parts (5 patches), but for debugging, it is good enough. Booting OVMF with the series applied up to this point will of course trigger the new ASSERT() in a whole lot of locations -- edk2 justifiedly raises the TPL to HIGH in a good number of places, but never "announces" the intent. Thus, the rest of the series: 5 aa55557a86ef DXE Core: CoreAcquireEventLock(): announce high TPL 6 10f666d7cb57 DXE Core: CoreAcquireDispatcherLock(): announce high TPL 7 5ffc2eec2362 DXE Core: CoreCurrentSystemTime(): announce high TPL 8 42d4708a027e OvmfPkg/LocalApicTimerDxe: TimerInterruptHandler(): announce high TPL 9 4f05f8422a1a MdeModulePkg/MonotonicCounterRuntimeDxe: announce high TPL in GetNextHigh 10 c2dc683e159e MdeModulePkg/MonotonicCounterRuntimeDxe: announce high TPL in GetNext 11 8d17c4ad439a MdeModulePkg/PciBusDxe: announce high TPL in BarExisted 12 56b300b4ae2a MdeModulePkg/PciBusDxe: announce high TPL in PciTestSupportedAttribute 13 15be7cf9fcff NetworkPkg/DpcDxe: announce high TPL in DpcDispatchDpc 14 18c3f576171a NetworkPkg/DpcDxe: announce high TPL in DpcQueueDpc 15 8b09460f0397 MdePkg/UefiLib: announce high TPL in EfiGetCurrentTpl 16 abd2e07fdc32 MdeModulePkg/UsbBusDxe: announce high TPL in UsbGetCurrentTpl instruments all those locations, with announcing the intent. I developed each patch in this sub-series as follows: (1) boot OVMF, passing "-s" to QEMU (= enable the gdb back-end), (2) hit the ASSERT from patch#4 (raising the TPL to HIGH without announcement), (3) run the following command from the edk2 project root: ( cd BaseTools/Scripts/ && gdb -ex "target remote localhost:1234" -ex "source efi_gdb.py" ) (4) get a backtrace, (5) locate the edk2 code raising the TPL, and modify it to announce the intent. I didn't always choose to extend the *immediate* caller of RaiseTPL; sometimes I went for the next outer frame. (See for example in patches #5 through #7.) Again for upstream edk2 standards, this high-level ordering of the series is inapproprite; after this last part of the series was developed, it should have been reordered with patch #4. That is, normally we should only "arm" the enforcement when the codebase already complies with the new requirement. However, this more naive / original ordering shows the development method better, in this case. With the entire series in place, the following happens: - If I boot "en_windows_10_enterprise_ltsc_2019_x64_dvd_be3c8ffb.iso", there is nothing out of the ordinary. The warning message from patch#3 is not logged (that is, no violation of the invariant (b)). The ASSERT() from patch#4 is not hit (that is, no unannounced TPL bump to HIGH level -- no violation of the invariant (a)). And the Windows installer starts OK. - If I boot "en-us_windows_11_business_editions_version_22h2_x64_dvd_17a08ce3.iso" on the other hand, then patch#4 *does catch* "cdboot.efi" at violating invariant (a), that is, raising the TPL to HIGH level. And this happens *before* patch#3 could complain about invariant (b) being violated (i.e., a timer interrupt being delivered at TPL_HIGH_LEVEL); cf. "TimerInterruptHandler:65" in comment#25. Here's the tail of the firmware log (wrapped manually for readability): > Loading driver at 0x00010000000 EntryPoint=0x00010001090 cdboot.efi > InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 3DEB5B18 > ProtectUefiImageCommon - 0x3CEC3040 > - 0x0000000010000000 - 0x000000000011C000 > ASSERT .../edk2/MdeModulePkg/Core/Dxe/Event/Tpl.c(78): > HighTplAnnounce == ((void *) 0) || *HighTplAnnounce Now, in this stage, the ASSERT() is of course spinning in CpuDeadLoop(), so we can pause the VM, attach gdb, and get a backtrace. (See steps (3) and (4) above.) > (gdb) where > #0 CpuPause () at .../edk2/MdePkg/Library/BaseLib/X64/GccInline.c:45 > #1 0x000000003ef37953 in CpuDeadLoop () at > .../edk2/MdePkg/Library/BaseLib/CpuDeadLoop.c:30 > #2 0x000000003ef3b600 in DebugAssert (FileName=0x3ef4cba0 > ".../edk2/MdeModulePkg/Core/Dxe/Event/Tpl.c", LineNumber=78, > Description=0x3ef4cbf8 > "HighTplAnnounce == ((void *) 0) || *HighTplAnnounce") at > .../edk2/OvmfPkg/Library/PlatformDebugLibIoPort/DebugLib.c:230 > #3 0x000000003ef31d27 in CoreRaiseTpl (NewTpl=TPL_HIGH_LEVEL) at > .../edk2/MdeModulePkg/Core/Dxe/Event/Tpl.c:78 > #4 0x0000000010041681 in ?? () > #5 0x000000000000001f in ?? () > #6 0x000000001006f6c2 in ?? () > #7 0x000000003d3d4000 in mConIn () > #8 0x000000003ef138e0 in ?? () > #9 0x0000000010111ffc in ?? () > #10 0x0000000000000001 in ?? () > #11 0x0000000010111fc0 in ?? () > #12 0x000000001003dd20 in ?? () > #13 0x0000000000000000 in ?? () Note the following bits: - Frame#3: RaiseTPL is being called with NewTpl=TPL_HIGH_LEVEL. - Frame#4: the return address on the stack is 0x0000000010041681. This address (range) definitely belongs to "cdboot.efi", per earlier investigation (see the prior BZ comments). OK, let's get a disassembly leading up to return address 0x0000000010041681: > (gdb) x/20i 0x10041660 > 0x10041660: mov (%rax),%ebx > 0x10041662: cmp $0x1,%ebx > 0x10041665: je 0x1004166f > 0x10041667: lea -0x9(%rbp),%ecx > 0x1004166a: call 0x1009b478 > 0x1004166f: mov 0xd091a(%rip),%rax # 0x10111f90 > 0x10041676: mov $0x1f,%ecx > 0x1004167b: mov 0x18(%rax),%rax > 0x1004167f: call *%rax > 0x10041681: nopl (%rax) <------ we're going to return here > 0x10041684: mov %rax,%r15 > 0x10041687: cmp $0x1,%ebx > 0x1004168a: je 0x10041693 > 0x1004168c: mov %ebx,%ecx > 0x1004168e: call 0x1009b478 > 0x10041693: mov $0x1,%ecx > 0x10041698: call 0x100046ac > 0x1004169d: rdtsc > 0x1004169f: shl $0x20,%rdx > 0x100416a3: mov $0x3e8,%ecx Consider the three instructions just before the return address: > 0x10041676: mov $0x1f,%ecx > 0x1004167b: mov 0x18(%rax),%rax > 0x1004167f: call *%rax > 0x10041681: nopl (%rax) <------ we're going to return here I find the GNU assembly syntax hard to read, so the same byte sequence disassembled with NASM: > B91F000000 mov ecx,0x1f > 488B4018 mov rax,[rax+0x18] > FFD0 call rax The first instruction moves 0x1F to ECX. Per EFIAPI (aka Microsoft) calling convention <https://en.wikipedia.org/wiki/X86_calling_conventions#Microsoft_x64_calling_convention>, this means that we're setting the first argument for the upcoming function call to 0x1F. And 0x1F happens to be the *published* value of TPL_HIGH_LEVEL <https://uefi.org/specs/UEFI/2.10/07_Services_Boot_Services.html#efi-boot-services-raisetpl>. The second instruction uses RAX as a pointer, offsets it by 0x18 bytes, and reads the qword from the pointed-to location back into RAX. The third instruction calls the function that is now pointed-to by RAX. Now consider the EFI_BOOT_SERVICES structure (quoting just the front): https://uefi.org/specs/UEFI/2.10/04_EFI_System_Table.html#efi-boot-services > typedef struct { > EFI_TABLE_HEADER Hdr; > > // > // Task Priority Services > // > EFI_RAISE_TPL RaiseTPL; // EFI 1.0+ also consider the EFI_TABLE_HEADER structure: https://uefi.org/specs/UEFI/2.10/04_EFI_System_Table.html#id4 > typedef struct { > UINT64 Signature; > UINT32 Revision; > UINT32 HeaderSize; > UINT32 CRC32; > UINT32 Reserved; > } EFI_TABLE_HEADER; The size of EFI_TABLE_HEADER is 24 bytes (decimal) -- in hex: 0x18. Therefore, the offset of the "RaiseTPL" field in EFI_BOOT_SERVICES is 0x18. In other words, in the quoted "cdboot.efi" disassembly, RAX originally points to EFI_BOOT_SERVICES; "cdboot.efi" then loads the "RaiseTPL" field (at offset 0x18) into RAX, then calls RaiseTPL() via RAX, with argument 0x1F (= TPL_HIGH_LEVEL) open-coded in ECX. ((EFI_BOOT_SERVICES *)RAX)->RaiseTPL (TPL_HIGH_LEVEL); This is the Windows bug. Per spec, a 3rd party UEFI application must not do this. Not that we need more evidence, but two further comments: (1) > - Frame#4: the return address on the stack is 0x0000000010041681. This > address (range) definitely belongs to "cdboot.efi", per earlier > investigation (see the prior BZ comments). This is actually evident from the firmware log already, note the load address of "cdboot.efi", plus the subsequent ProtectUefiImageCommon message: > Loading driver at 0x00010000000 EntryPoint=0x00010001090 cdboot.efi > InstallProtocolInterface: [EfiLoadedImageDevicePathProtocol] 3DEB5B18 > ProtectUefiImageCommon - 0x3CEC3040 > - 0x0000000010000000 - 0x000000000011C000 > ASSERT .../edk2/MdeModulePkg/Core/Dxe/Event/Tpl.c(78): > HighTplAnnounce == ((void *) 0) || *HighTplAnnounce ImageBase=0x10000000 ImageSize=0x11C000 Return address 0x10041681 is at offset 0x41681 from ImageBase, and that offset is less than ImageSize. (2) I've just realized that "EFI_BOOT_SERVICES.Hdr.Signature" is right at the start of the EFI_BOOT_SERVICES table; therefore, let's reconsider the context leading up to the invalid RaiseTPL call with *just one more* instruction: > 0x1004166f: mov 0xd091a(%rip),%rax # 0x10111f90 > 0x10041676: mov $0x1f,%ecx > 0x1004167b: mov 0x18(%rax),%rax > 0x1004167f: call *%rax > 0x10041681: nopl (%rax) <------ we're going to return here Here GDB even helpfully calculates (from the RIP-relative addressing) the address of the *global variable* that's being loaded into RAX, going in: 0x10111f90. We expect that this global variable is "gBS"; more precisely, some global variable that "cdboot.efi" uses for caching the address of the boot services table. Let's see what this supposed "gBS" global variable contains: > (gdb) x/g 0x10111f90 > 0x10111f90: 0x000000003ef53200 Then let's see the 8-byte signature at offset#0 from (*gBS) -- equivalently, the contents of gBS->Hdr.Signature. Both as a string, and as a UINT64: > (gdb) x/8c 0x000000003ef53200 > 0x3ef53200 <mBootServices>: 66 'B' 79 'O' 79 'O' 84 'T' 83 'S' 69 'E' 82 'R' 86 'V' > (gdb) x/g 0x000000003ef53200 > 0x3ef53200 <mBootServices>: 0x56524553544f4f42 GDB even resolves the 0x000000003ef53200 address for us. It is the address of the "mBootServices" variable -- the actual boot services table -- from "MdeModulePkg/Core/Dxe/DxeMain/DxeMain.c". And yes, the signature matches; it is "BOOTSERV" / 0x56524553544f4f42: https://uefi.org/specs/UEFI/2.10/04_EFI_System_Table.html#efi-boot-services > #define EFI_BOOT_SERVICES_SIGNATURE 0x56524553544f4f42 So yes, in the quoted assembly, "cdboot.efi" totally calls (&mBootServices)->RaiseTPL (TPL_HIGH_LEVEL); violating the UEFI spec. Workaround: turn on all hyperv enlightments (-cpu host,hv-passthrough=on) (In reply to Gerd Hoffmann from comment #51) > Workaround: turn on all hyperv enlightments (-cpu host,hv-passthrough=on) -cpu host,invtsc=on works too. Candidate patch at https://edk2.groups.io/g/devel/message/104290 (and identical pull request at https://github.com/tianocore/edk2/pull/4362) to work around the known Windows bug. (In reply to Gerd Hoffmann from comment #52) > (In reply to Gerd Hoffmann from comment #51) > > Workaround: turn on all hyperv enlightments (-cpu host,hv-passthrough=on) More specific: hv-frequencies=on enlighment is needed for this. > -cpu host,invtsc=on works too. Nope, doesn't, testing error. (In reply to Michael Brown from comment #53) > Candidate patch at https://edk2.groups.io/g/devel/message/104290 (and > identical pull request at https://github.com/tianocore/edk2/pull/4362) to > work around the known Windows bug. Thank you, Michael! The patch to relax the assertion to an error message has been merged via https://github.com/tianocore/edk2/pull/4371, so Windows 11 installation should now work on current OVMF despite the dubious behaviour of calling (&mBootServices)->RaiseTPL(TPL_HIGH_LEVEL) and then enabling interrupts. I think this issue can probably now be closed, though it would be good to sustain the separate conversation thread with Microsoft to get the root cause fixed. Thank you again for the patches, Michael! Since this is technically a downstream ticket, the expected BZ Status field changes are a bit less simple. Once we've backported your upstream patches (or inherited them via rebase), and a downstream development RPM has been built, the Status will move to MODIFIED. Then the QE team will verify the update, using that (or a later) development RPM (-> ON_QA, -> VERIFIED, ideally). The ticket is expected to be closed once an RPM version containing the verified updates is released (most likely with the next minor release of RHEL9). The process could get much more complicated in some cases (dependent on what previous minor releases of RHEL9 were affected), but this is the gist of it, AFAICT. Add a note from QE side. 1. Not hit this bug when testing with avocado(tp-qemu). The cpu flags "hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,kvm_pv_unhalt=on" will be added to qemu command lines in automation scripts, so I did not hit this bug. cpu command lines: -cpu 'Icelake-Server',ds=on,ss=on,dtes64=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,sha-ni=on,rdpid=on,fsrm=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,mpx=off,intel-pt=off,hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,kvm_pv_unhalt=on \ 2. Hit this bug when testing with virt-install. The cpu flags were added to qemu command lines were different from avocado automation scripts when testing with virt-install. cpu command lines: -cpu Icelake-Server,ds=on,ss=on,dtes64=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,sha-ni=on,rdpid=on,fsrm=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,mpx=off,intel-pt=off,hv-time=on,hv-relaxed=on,hv-vapic=on,hv-spinlocks=0x1fff \ Found the following message in edk2 debug log: ASSERT /builddir/build/BUILD/edk2-f80f052277c8/OvmfPkg/Library/NestedInterruptTplLib/Tpl.c(42): InterruptedTPL < 31 3. Because edk2 and qemu gating test loops were tested by avocado(tp-qemu) scripts, not cover it in these two test loops. vfio-gpu automation test scripts were implemented by virt-install, so cover this bug by vfio-gpu test loop. 4. Tested the above scenarios with the following versions. versions: kernel-5.14.0-306.el9.x86_64 qemu-kvm-8.0.0-2.el9 edk2-ovmf-20230301gitf80f052277c8-3.el9.noarch (In reply to Xueqiang Wei from comment #60) > Add a note from QE side. Right, your first use case contains "hv_frequencies=on", the second one doesn't. This is consistent with Gerd's findings (comment#55). QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass. According to Comment 60, run the following tests, all results were passed. So set status to VERIFIED. Versions: kernel-5.14.0-306.el9.x86_64 qemu-kvm-8.0.0-2.el9 edk2-ovmf-20230301gitf80f052277c8-4.el9.noarch 1. tested win11 installation with avocado (tp-qemu), the results were passed. (1/2) Host_RHEL.m9.u3.ovmf.qcow2.virtio_blk.up.virtio_net.Guest.Win11.x86_64.io-github-autotest-qemu.uefi_secureboot.q35: PASS (1161.46 s) (2/2) Host_RHEL.m9.u3.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win11.x86_64.io-github-autotest-qemu.uefi_secureboot.q35: PASS (1897.76 s) 2. win11 installation with virt-install, the results were passed. # cat win11_installation_with_virt-install.sh virt-install \ --name win11_T4 \ --cpu host-model \ --ram=8192 \ --vcpus=4,sockets=1,cores=4,threads=1 \ --disk path=/home/win11_T4.qcow2,format='qcow2',cache=none,size=80,bus='scsi' \ --graphics vnc,listen=0.0.0.0 \ --video bochs \ --boot uefi \ --network bridge=virbr0 \ --noreboot \ --os-variant=win10 \ --cdrom /home/kvm_autotest_root/iso/ISO/Win11/windows_11_x64_official_dvd.iso \ --disk path=/home/kvm_autotest_root/iso/windows/virtio-win-latest-signed-el9.iso,device='cdrom',readonly='on',bus='sata' \ --console file,path=/tmp/win11_T4.log \ --disk path=/home/kvm_autotest_root/images/win11-64/autounattend.iso,device='cdrom',readonly='on',bus='sata' \ # sh win11_installation_with_virt-install.sh Starting install... Creating domain... | 00:00:00 Domain is still running. Installation may be in progress. Waiting for the installation to complete. Domain has shutdown. Continuing. Domain creation completed. # virsh list --all Id Name State --------------------------- - win11_T4 shut off # virsh start win11_T4 Domain 'win11_T4' started # virsh list --all Id Name State -------------------------- 2 win11_T4 running # virsh destroy win11_T4 Domain 'win11_T4' destroyed 3. tested edk2 test loop, the results were passed. Job link: http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/logs/edk2_test_with_edk2-20230301gitf80f052277c8-4.el9/results.html 4. tested qemu_gating_test, the results were passed. Job link: http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/logs/qemu_gating_test_rhel9_with_edk2-20230301gitf80f052277c8-4.el9/results.html Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: edk2 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6330 |