Bug 2149629
| Summary: | Win 2022 fails to boot after virt-p2v conversion from physical host with nvme disk | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | tingting zheng <tzheng> |
| Component: | virt-v2v | Assignee: | Laszlo Ersek <lersek> |
| Status: | CLOSED ERRATA | QA Contact: | tingting zheng <tzheng> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 9.2 | CC: | chhu, hongzliu, juzhou, lersek, mxie, rjones, tyan, vwu, xiaodwan |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | virt-v2v-2.2.0-5.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-05-09 07:45:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
tingting zheng
2022-11-30 12:57:23 UTC
Created attachment 1928737 [details]
virt-v2v-conversion-log.txt
Looking at the conversion log, the guest has one disk of size 960197124096 (around 960GB). The guest is Windows Server 2022. Conversion and copying seem to go well. Why on earth is a grub shell being shown for a Windows guest...?
We copy in Windows 2016 drivers from virtio-win. Newer drivers are available, so that is a bug itself, but I'm not sure if it can cause this problem.
I too think this must be a mixup of some sort; the original (physical) Windows machine surely doesn't have grub installed, and virt-v2v never installs grub. IOW grub cannot "accidentally" appear in the conversion output. If grub is shown in the screenshot, then I must think the screenshot doesn't belong to the conversion output.
The only logic I can find in virt-v2v that's particular to grub.efi is the "Fixing UEFI bootloader" branch in "convert/convert_linux.ml". But that requires an existing grub.efi in the EFI system partition, plus it is only active for (some) Linux guests, never Windows.
Can we look at the output disk image? Thanks.
OK I think I know what's going on. Difficult problem.
This is the directory tree on the EFI system partition (excerpt):
EFI
EFI/BOOT
EFI/BOOT/BOOTX64.EFI
EFI/BOOT/fbx64.efi
EFI/Dell
EFI/Dell/BootOptionCache
EFI/Dell/BootOptionCache/BootOptionCache.dat
EFI/Microsoft
EFI/Microsoft/Boot
EFI/Microsoft/Boot/...
EFI/Microsoft/Boot/bootmgfw.efi
EFI/Microsoft/Boot/bootmgr.efi
EFI/Microsoft/Recovery
EFI/Microsoft/Recovery/...
EFI/redhat
EFI/redhat/BOOTX64.CSV
EFI/redhat/grub.cfg
EFI/redhat/grub.cfg.rpmsave
EFI/redhat/grubx64.efi
EFI/redhat/mmx64.efi
EFI/redhat/shim.efi
EFI/redhat/shimx64-redhat.efi
EFI/redhat/shimx64.efi
and one more relevant detail is:
-rwxr-xr-x 1 root root 946736 Jun 7 17:22 /sysroot/EFI/BOOT/BOOTX64.EFI
-rwxr-xr-x 1 root root 946736 Jun 7 17:22 /sysroot/EFI/redhat/shim.efi
Note: same size, same timestamp. (I didn't bother comparing checksums or
contents.)
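The "same size, same timestamp" observation could be confirmed by comparing checksums. A minimal sketch, assuming the ESP is mounted under /sysroot as in the listing above; the helper names `sha256_of` and `same_contents` are made up for illustration:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(a: Path, b: Path) -> bool:
    """True if both files have the same size and the same SHA-256 digest."""
    if a.stat().st_size != b.stat().st_size:
        return False  # cheap check first; different sizes cannot match
    return sha256_of(a) == sha256_of(b)
```

For example, `same_contents(Path("/sysroot/EFI/BOOT/BOOTX64.EFI"), Path("/sysroot/EFI/redhat/shim.efi"))` would tell whether the fallback loader really is a copy of shim.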
So here's what I suspect happened:
- The original physical machine had RHEL installed at some point.
- When the installation of RHEL completed, the /EFI/redhat directory was
populated with shim+grub. Additionally, shim was installed as
/EFI/BOOT/BOOTX64.EFI as well, for implementing the "fallback boot
path". This fallback path is taken by the UEFI boot manager of the
platform firmware when the UEFI boot options -- stored in non-volatile
UEFI global variables -- are lost, for any reason. On the fallback
boot path, the boot loader found at /EFI/BOOT/BOOTX64.EFI is supposed
to recreate (restore) the UEFI boot options such that the installed OS
becomes bootable again. The *data* for this kind of UEFI boot option
restoration is kept in "/EFI/redhat/BOOTX64.CSV", when the installed
OS is GNU/Linux, and the boot loaders are shim+grub.
- There is a directory called "/EFI/Dell/BootOptionCache", which is good
for ... I don't know what, but I certainly don't welcome it. It looks
like it could interfere with boot option processing somehow, and it
seems to be some quirk of the platform vendor (i.e., Dell, because the
original physical machine is from Dell). Either way, I think I can
explain the symptom while disregarding the "/EFI/Dell" subdir totally.
- Later, Windows got installed on the machine. This removed the RHEL OS
install (rootfs and so on), but it did not remove "/EFI/redhat/".
Furthermore, the Windows installer neither overwrote, nor removed, the
fallback path boot loader "/EFI/BOOT/BOOTX64.EFI" (= shim) which had
originally been put in place by Anaconda, when installing RHEL.
- This caused no immediate problems as the Windows installer did update
the UEFI boot options at once. In detail this means the creation of a
new Boot#### (such as Boot001C) non-volatile UEFI variable, pointing
at the Windows boot loader "/EFI/Microsoft/Boot/bootmgfw.efi", and
then updating the BootOrder non-volatile UEFI variable such that the
just-created Boot#### option be processed first, taking effect with
the first subsequent boot.
- However, the installation of Windows like this created a "time bomb".
As soon as the UEFI boot options were lost for any reason, the
fallback boot path in the platform firmware would kick in. That'd be
fine per se, *except* the fallback boot path on this machine would now
be in a confused state, with the fallback boot loader
"/EFI/BOOT/BOOTX64.EFI" still belonging to the *old* operating system,
and said OS having been erased.
- The conversion from physical to virtual carried over the disk
contents, but not the UEFI variables, thereby triggering the broken
fallback boot path.
I wonder why the Windows installer did not put its own (Windows)
fallback bootloader in place.
(When converting UEFI OSes with virt-v2v, we *never* copy UEFI
variables, including the UEFI boot options; therefore we *always* rely
on the fallback boot path at the first start of the converted VM --
regardless of whether the subject OS is Windows or Linux. The fallback
boot path Just Works (TM) with both OSes.)
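The failure mode described above can be illustrated with a toy model. This is a deliberately simplified sketch, not how any real firmware is implemented: it assumes the boot manager tries the Boot#### options in BootOrder and only then falls back to /EFI/BOOT/BOOTX64.EFI. The function name `pick_boot_loader` and the data layout are hypothetical.

```python
from typing import Dict, List, Optional, Set

FALLBACK = "/EFI/BOOT/BOOTX64.EFI"


def pick_boot_loader(boot_order: List[str],
                     boot_options: Dict[str, str],
                     esp_files: Set[str]) -> Optional[str]:
    """Toy model: return the path of the loader that would be started.

    boot_order   -- option numbers from the BootOrder variable
    boot_options -- maps an option number to the loader path it points at
    esp_files    -- the files present on the EFI system partition
    """
    for number in boot_order:
        path = boot_options.get(number)
        if path is not None and path in esp_files:
            return path
    # No usable boot option: take the fallback boot path, if present.
    if FALLBACK in esp_files:
        return FALLBACK
    return None
```

With the variables of the physical machine (a Boot001C option pointing at bootmgfw.efi), the model picks the Windows loader. With the empty variable store of a freshly converted VM, it picks BOOTX64.EFI, which on this ESP is the stale shim.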
There is one practical and one theoretical way to mitigate this.
In the individual (practical) case, the administrator of the converted
VM can interrupt the OVMF boot process, get into the OVMF setup TUI, and
use the following facilities:
  Boot Maintenance Manager
    Boot Options
      Add Boot Option
        locate "/EFI/Microsoft/Boot/bootmgfw.efi"
        add a boot option description
        Commit Changes and Exit
      Change Boot Order
        hoist the new boot option to the top, using the "+" key
        press F10 for "Save"
  Reset
In the general (theoretical) case, we'd call "efibootmgr" from virt-p2v,
twice. From the first invocation, we'd figure out what would be the
"first" UEFI boot option for the physical machine, hadn't we just booted
the box off of the virt-p2v ISO. From the second "efibootmgr"
invocation, we'd dump the details of said "first" UEFI boot option. Then
we'd pass this UEFI boot option information to virt-v2v, probably on the
virt-v2v command line.
In turn, virt-v2v would have to make these UEFI boot option settings
(one Boot#### variable, and then the BootOrder variable) "stick" for the
conversion output VM. That's not really doable from within the
libguestfs appliance at this point (e.g. with a new libguestfs API
calling "efibootmgr" internally), because for that, the appliance itself
would have to be UEFI-booted. Furthermore, the set of the conversion
output artifacts should now include the "UEFI varstore file" as well
(for each Output Module that virt-v2v provides).
An alternative to setting the UEFI variables from within the appliance
would be to call Gerd's new "virt-fw-vars" tool, from virt-v2v, which
can edit an OVMF varstore file from the outside (i.e., from the host
side). The set of the conversion output artifacts should still include
the "UEFI varstore file" (for each Output Module).
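The first half of the theoretical approach (finding the "first" boot option of the physical machine) might look roughly like the sketch below. It only parses text, and it assumes the usual layout of efibootmgr output: a "BootCurrent:" line, a "BootOrder:" line, and one "BootNNNN*" line per option. Skipping the BootCurrent entry is my reading of "hadn't we just booted the box off of the virt-p2v ISO". The function names are hypothetical, and nothing like this exists in virt-p2v as far as this report shows.

```python
import re
from typing import Dict, List, Optional, Tuple


def parse_efibootmgr(text: str) -> Tuple[Optional[str], List[str], Dict[str, str]]:
    """Extract BootCurrent, BootOrder and the Boot#### entries from the text."""
    current: Optional[str] = None
    order: List[str] = []
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("BootCurrent:"):
            current = line.split(":", 1)[1].strip().upper()
        elif line.startswith("BootOrder:"):
            numbers = line.split(":", 1)[1].split(",")
            order = [n.strip().upper() for n in numbers if n.strip()]
        else:
            # Matches both active ("Boot001C* ...") and inactive entries.
            m = re.match(r"Boot([0-9A-Fa-f]{4})\*?\s+(.*)", line)
            if m:
                entries[m.group(1).upper()] = m.group(2)
    return current, order, entries


def first_regular_option(text: str) -> Optional[Tuple[str, str]]:
    """Return (number, details) of the first BootOrder entry that is not
    the option we are currently booted from (assumed: the virt-p2v ISO)."""
    current, order, entries = parse_efibootmgr(text)
    for number in order:
        if number != current and number in entries:
            return number, entries[number]
    return None
```

The details string returned for the selected option is what would have to be handed over to virt-v2v; how that hand-over would work is left open here, as it is in the comment above.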
I'm sure I read somewhere a bug about old fallback files being "left around" from old installs, but I don't know where that was.
Anyhow - could we detect and remove EFI/BOOT/BOOTX64.EFI after conversion if it "looks wrong" (for some value of "looks wrong": suggest: if Windows guest, remove it?)
Yet another option might be to "fix up" the UEFI fallback boot path on the EFI system partition, in virt-v2v. We already do that for *some* linux guests; see commit 59f0c2795263 ("v2v: fix UEFI bootloader for linux guests", 2020-07-24).
We could try something like that for Windows as well, but it looks complicated and brittle. For starters, in the present case, we *do* have a fallback boot loader, it just does not match the Windows boot loader. Furthermore, the Linux logic relies on the firstboot machinery as well -- it registers a firstboot script for *undoing* the fallback fixup on the ESP, and for modifying the UEFI variables ultimately with efibootmgr. None of those look great for a Windows guest.
(In reply to Richard W.M. Jones from comment #7)
> I'm sure I read somewhere a bug about old fallback files being
> "left around" from old installs, but I don't know where that was.
>
> Anyhow - could we detect and remove EFI/BOOT/BOOTX64.EFI after
> conversion if it "looks wrong" (for some value of "looks wrong":
> suggest: if Windows guest, remove it?)
Removing is insufficient; if we remove /EFI/BOOT/BOOTX64.EFI, then Windows will still not boot. The only difference that removal makes is that we don't land at the (useless / broken) GRUB shell, but at the OVMF setup TUI. (When there is nothing to boot for the UEFI boot manager that's built into OVMF, the user is dropped to the Setup TUI.)
Now, from looking briefly at a correctly installed Windows VM, it *seems* that "/EFI/BOOT/BOOTX64.EFI" is just a copy of "/EFI/Microsoft/Boot/bootmgfw.efi" (I've compared the contents).
So, in case we're converting a UEFI Windows guest, we could check:
  "/EFI/Microsoft/Boot/bootmgfw.efi" exists &&
  ("/EFI/BOOT/BOOTX64.EFI" does not exist ||
   "/EFI/BOOT/BOOTX64.EFI" differs from "/EFI/Microsoft/Boot/bootmgfw.efi")
and if this condition holds, then:
  rm -rf /EFI/BOOT
  mkdir /EFI/BOOT
  cp "/EFI/Microsoft/Boot/bootmgfw.efi" "/EFI/BOOT/BOOTX64.EFI"
In fact I'm going to try this right now on Tingting's converted disk image, using guestfish, and then ask Tingting to boot the guest again.
BTW we already have some Windows ESP manipulation from commit c721d5bb2447 ("v2v: remove the 'graphicsmodedisabled' entry in ESP BCD", 2016-06-12).
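The check and fix-up proposed in this comment can be sketched in Python as follows. This is only an illustration of the condition, operating on an ESP that is assumed to be mounted as an ordinary directory; it is not the virt-v2v patch, and the function name `fix_windows_fallback` is hypothetical.

```python
import filecmp
import shutil
from pathlib import Path


def fix_windows_fallback(esp_root: Path) -> bool:
    """Make /EFI/BOOT/BOOTX64.EFI a copy of the Windows boot manager.

    Returns True if the ESP was modified, False if nothing needed doing.
    """
    bootmgfw = esp_root / "EFI" / "Microsoft" / "Boot" / "bootmgfw.efi"
    fallback_dir = esp_root / "EFI" / "BOOT"
    fallback = fallback_dir / "BOOTX64.EFI"

    # Without a Windows boot manager there is nothing to copy from.
    if not bootmgfw.is_file():
        return False
    # Fallback loader already identical (byte-for-byte): leave it alone.
    if fallback.is_file() and filecmp.cmp(bootmgfw, fallback, shallow=False):
        return False

    # Equivalent of: rm -rf /EFI/BOOT; mkdir /EFI/BOOT; cp bootmgfw BOOTX64.EFI
    shutil.rmtree(fallback_dir, ignore_errors=True)
    fallback_dir.mkdir(parents=True)
    shutil.copyfile(bootmgfw, fallback)
    return True
```

Note that removing the whole /EFI/BOOT directory also drops leftovers such as fbx64.efi, matching the "rm -rf" in the proposal. Calling the function a second time is a no-op, because the two files then compare equal.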
That's quite helpful -- the ESPs are already collected in the parent commit: fb9f78c32e40 ("v2v: fill the list of the EFI system partitions", 2016-06-12).
(In reply to Laszlo Ersek from comment #10)
> In fact I'm going to try this right now on Tingting's converted disk image,
> using guestfish, and then ask Tingting to boot the guest again.
Yes, this works. I needed to fix up the ESP as outlined above, *and* I needed to delete the UEFI varstore (so libvirtd would reinitialize it from the logically empty varstore template). The reason for the latter was that the broken fallback logic had already run at least once, and polluted the varstore file with a broken UEFI boot option (pointing at grub). This latter step should not be necessary in virt-v2v, as during conversion, no UEFI varstore exists at all, and we're before the very first boot of the converted VM.
... by "this works", I meant to say that after those changes, the fallback boot path in OVMF took effect, and the Windows VM ("dell-per750-29") booted fine.
I'll try to cook up a patch.
Tingting: please keep the physical machine intact, for testing the change with virt-p2v (only virt-v2v will need to be updated, on the conversion server).
Also, I think we'll need good regression testing of UEFI Windows guest conversions, as I have no evidence that /EFI/BOOT/BOOTX64.EFI must *always* be identical to /EFI/Microsoft/Boot/bootmgfw.efi, on all possible Windows releases. This change has the capacity to break fallback paths.
The regression test is thankfully simple. Because we *always* rely on the fallback path when booting a converted UEFI guest for the first time, any breakage should appear immediately.
(In reply to Laszlo Ersek from comment #15)
> I'll try to cook up a patch.
>
> Tingting: please keep the physical machine intact, for testing the change
> with virt-p2v (only virt-v2v will need to be updated, on the conversion
> server).
Ok, I will leave the physical machine with Win 2022 and will test with the new build.
>
> Also, I think we'll need good regression testing of UEFI Windows guest
> conversions, as I have no evidence that /EFI/BOOT/BOOTX64.EFI must *always*
> be identical to /EFI/Microsoft/Boot/bootmgfw.efi, on all possible Windows
> releases. This change has the capacity to break fallback paths.
>
> The regression test is thankfully simple. Because we *always* rely on the
> fallback path when booting a converted UEFI guest for the first time, any
> breakage should appear immediately.
QE will run regression testing based on the above comments, thanks.
[v2v PATCH 0/2] convert_windows: fix up the UEFI fallback boot loader if broken
Message-Id: <20221202124409.11741-1-lersek>
https://listman.redhat.com/archives/libguestfs/2022-December/030373.html
(In reply to Laszlo Ersek from comment #17)
> [v2v PATCH 0/2] convert_windows: fix up the UEFI fallback boot loader if broken
> Message-Id: <20221202124409.11741-1-lersek>
> https://listman.redhat.com/archives/libguestfs/2022-December/030373.html
Upstream commit range 7b49177e2b0c..9d4b58dcecc4.
Tested with:
livecd-p2v-202211091434.iso
libguestfs-1.48.4-4.el9.x86_64
virt-v2v-2.0.7-7.el9.x86_64
libvirt-8.10.0-1.el9.x86_64
qemu-kvm-7.1.0-6.el9.x86_64
Used virt-p2v to convert the same physical host with Win 2022 installed; the conversion completes successfully. After conversion, Win 2022 boots successfully, so I am adding "Tested" to the Verified field.
Tested with:
livecd-p2v-202211091434.iso
libguestfs-1.48.4-4.el9.x86_64
virt-v2v-2.0.7-7.el9.x86_64
libvirt-8.10.0-2.el9.x86_64
qemu-kvm-7.2.0-2.el9.x86_64
virtio-win-1.9.32-0.el9_1.noarch
1. Used virt-p2v to convert the same physical host with Win 2022 installed; the conversion completes successfully and the guest boots successfully.
2. Used virt-v2v to convert UEFI Win11/Win2016/Win2019/Win2022 guests; all were converted and booted successfully.
Based on the above results, moving the bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt-v2v bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2023:2313