Bug 1189284
Summary: | virt-resize should preserve GPT partition UUIDs, else EFI guests become unbootable
---|---
Product: | [Community] Virtualization Tools
Reporter: | Richard W.M. Jones <rjones>
Component: | libguestfs
Assignee: | Richard W.M. Jones <rjones>
Status: | CLOSED UPSTREAM
Severity: | unspecified
Priority: | unspecified
Version: | unspecified
CC: | lersek, ptoscano, rbalakri
Hardware: | Unspecified
OS: | Unspecified
Doc Type: | Bug Fix
: | 1224486
Last Closed: | 2015-02-06 10:31:28 UTC
Type: | Bug
Bug Blocks: | 1224486
Attachments: |
|
Created attachment 988302 [details]
libvirt XML of guest
Created attachment 988303 [details]
packstack-rhelsa_VARS.fd (gzip compressed)
It would be really useful if there was a tool for inspecting
these nvram files, eg. to list out the values of variables
inside. hexdump is not very helpful.
By a process of elimination I found out that this is actually caused by resizing the image (ie. virt-resize) while building. I've no idea why virt-resize would affect the EFI partition (it doesn't resize it), nor why EFI would then crash. In any case, reassigning the bug to libguestfs.

Partition layout before resizing (this guest boots OK):

Number  Start (sector)  End (sector)  Size       Code  Name
   1            2048        198655    96.0 MiB   EF00  EFI System Partition
   2          198656       1222655    500.0 MiB  0700
   3         1222656       2482175    615.0 MiB  8200
   4         2482176      12578815    4.8 GiB    0700

Partition layout after resizing (this guest fails to boot):

Number  Start (sector)  End (sector)  Size       Code  Name
   1            2048        198655    96.0 MiB   EF00  EFI
   2          198656       1222655    500.0 MiB  0700  primary
   3         1222656       2482175    615.0 MiB  8200  primary
   4         2482176      20968831    8.8 GiB    0700  primary

The first (EFI) partition has a different name, but the same boundaries etc.

Using gdisk I can get the detailed partition info for the partition:

Before:

Partition GUID code: C12A7328-F81F-11D2-BA4B-00A0C93EC93B (EFI System)
Partition unique GUID: 6A3C69C8-FD2E-41B7-A2DC-0FFDB0B1FA34
First sector: 2048 (at 1024.0 KiB)
Last sector: 198655 (at 97.0 MiB)
Partition size: 196608 sectors (96.0 MiB)
Attribute flags: 0000000000000000
Partition name: 'EFI System Partition'

After:

Partition GUID code: C12A7328-F81F-11D2-BA4B-00A0C93EC93B (EFI System)
Partition unique GUID: A8ADB729-77FC-4D2B-B140-194E2BA7EE23
First sector: 2048 (at 1024.0 KiB)
Last sector: 198655 (at 97.0 MiB)
Partition size: 196608 sectors (96.0 MiB)
Attribute flags: 0000000000000000
Partition name: 'EFI'

The fields which changed are: Partition unique GUID and Partition name.

First I changed the Partition name, but that did not help.

Second I changed the Partition unique GUID (you have to use the 'x' expert menu in gdisk). This fixes the problem.

So I conclude that virt-resize needs to preserve the partition GUID.
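The before/after comparison above was done by hand with gdisk. As a rough illustration, the same fields can be read straight from a raw disk image. This is only a sketch under stated assumptions: the function name `read_gpt_partitions` is made up for this example, it assumes 512-byte logical sectors, it reads only the primary GPT header at LBA 1, and it does not verify the header or table CRCs.

```python
import struct
import uuid

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def read_gpt_partitions(path, sector_size=SECTOR_SIZE):
    """Return the in-use GPT partition entries of a raw disk image.

    Hypothetical helper for illustration; no CRC checks are performed.
    """
    with open(path, "rb") as f:
        f.seek(sector_size)              # primary GPT header is at LBA 1
        header = f.read(92)
        if header[:8] != b"EFI PART":
            raise ValueError("no GPT signature at LBA 1")
        # Offset 72: starting LBA of the partition entry array.
        (entries_lba,) = struct.unpack_from("<Q", header, 72)
        # Offset 80: number of entries; offset 84: size of one entry.
        num_entries, entry_size = struct.unpack_from("<II", header, 80)
        f.seek(entries_lba * sector_size)
        table = f.read(num_entries * entry_size)

    partitions = []
    for i in range(num_entries):
        entry = table[i * entry_size:(i + 1) * entry_size]
        if len(entry) < 128:
            break                        # truncated image
        # GUIDs are stored in the mixed-endian layout that bytes_le decodes.
        type_guid = uuid.UUID(bytes_le=entry[0:16])
        if type_guid.int == 0:
            continue                     # unused slot
        unique_guid = uuid.UUID(bytes_le=entry[16:32])
        first_lba, last_lba = struct.unpack_from("<QQ", entry, 32)
        name = entry[56:128].decode("utf-16-le").split("\x00", 1)[0]
        partitions.append({
            "number": i + 1,
            "type_guid": str(type_guid).upper(),
            "unique_guid": str(unique_guid).upper(),
            "first_lba": first_lba,
            "last_lba": last_lba,
            "name": name,
        })
    return partitions
```

Running this on the image before and after virt-resize and comparing the `unique_guid` and `name` values of partition 1 should show the same two differences that gdisk reported.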
Patch series posted upstream:

https://www.redhat.com/archives/libguestfs/2015-February/msg00032.html

(In reply to Richard W.M. Jones from comment #5)
> So I conclude that virt-resize needs to preserve the partition
> GUID.

Oh yes, definitely. See the following passage from the UEFI spec:

> 3.1.2 Load Option Processing
>
> [...]
>
> The boot manager must also support booting from a short-form device path
> that starts with the first element being a hard drive media device path
> (see Table 77). The boot manager must use the GUID or signature and
> partition number in the hard drive device path to match it to a device in
> the system. If the drive supports the GPT partitioning scheme the GUID in
> the hard drive media device path is compared with the UniquePartitionGuid
> field of the GUID Partition Entry (see Table 18). If the drive supports
> the PC-AT MBR scheme the signature in the hard drive media device path is
> compared with the UniqueMBRSignature in the Legacy Master Boot Record (see
> Table 13). If a signature match is made, then the partition number must
> also be matched. [...]

The above means that a certain kind of "relative" device path is supported for hard disk boot options. The device path fragment that leads from the root to the hard disk (which could include device path nodes like PCI root bridge, PCI controller, or virtio-mmio nodes) can be omitted.

The idea being, if you re-plug the same disk to a different hardware controller (PCI slot, PCI bridge, different virtio-mmio register block etc), then your relative boot option will continue to match, because the matching won't try to enforce the initial portion of the boot option device path. Instead, it will enumerate all hard disks in the system (with their respective GPT GUIDs), and search those for the GPT GUID stored in the first node of your relative (=shorthand) boot option device path.

If you change the disk's GPT GUID, then your existing boot option is by definition unable to match it.
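The matching rule quoted from the spec can be illustrated with a small model. This is a sketch only, not firmware code: the `Partition` class, the `expand_short_form` function and the simplified `HD(number,GPT,guid)` text form are assumptions made for this example. The GUIDs are the "before" and "after" values reported earlier in this bug.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Partition:
    """One enumerated GPT partition (hypothetical model, not a firmware type)."""
    disk_path: str     # absolute device path prefix of the containing disk
    number: int        # partition number
    unique_guid: str   # UniquePartitionGuid from the GPT entry


def expand_short_form(number: int, guid: str,
                      partitions: List[Partition]) -> Optional[str]:
    """Expand a short-form HD(number,GPT,guid) boot option to an absolute path.

    Both the GUID and the partition number must match, as in the quoted
    spec text. Returns None when no partition in the system matches.
    """
    for part in partitions:
        if part.unique_guid.lower() == guid.lower() and part.number == number:
            return f"{part.disk_path}/HD({part.number},GPT,{part.unique_guid})"
    return None


# The disk prefix does not take part in the comparison; any value works here.
DISK = "VenHw(837DCA9E-E874-4D82-B29A-23FE0E23D1E2,003C000A00000000)"
BOOT_OPTION_GUID = "6A3C69C8-FD2E-41B7-A2DC-0FFDB0B1FA34"   # stored in NVRAM

before_resize = [Partition(DISK, 1, "6A3C69C8-FD2E-41B7-A2DC-0FFDB0B1FA34")]
after_resize = [Partition(DISK, 1, "A8ADB729-77FC-4D2B-B140-194E2BA7EE23")]

matched = expand_short_form(1, BOOT_OPTION_GUID, before_resize)
unmatched = expand_short_form(1, BOOT_OPTION_GUID, after_resize)
```

With the original GUID the option expands to an absolute path; with the regenerated GUID the lookup returns `None`, which corresponds to the boot option no longer matching any disk.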
The boot log attached to comment #0 says:

> SetBootOrderFromQemu: FwCfg:
> /virtio-mmio@000000000a003c00/disk@0,0
> HALT
> SetBootOrderFromQemu: FwCfg: <end>

This is the OpenFirmware boot order AAVMF downloaded from QEMU over fw_cfg. Then,

> ParseOfwNode: DriverName="virtio-mmio" UnitAddress="000000000a003c00" DeviceArguments=""
> ParseOfwNode: DriverName="disk" UnitAddress="0,0" DeviceArguments=""
> TranslateOfwPath: success: "VenHw(837DCA9E-E874-4D82-B29A-23FE0E23D1E2,003C000A00000000)/HD("

This shows the UEFI device path fragment that AAVMF translates the OpenFirmware device path to. This is an absolute devpath fragment. AAVMF will filter the UEFI boot options with this leading fragment.

In the next step AAVMF would go through all of the UEFI boot options, *expand* all the relative ones to absolute ones (using the algorithm described above), and filter against the expanded values. However, since your GPT GUID has changed, the expansion doesn't yield any results, hence that boot option is dropped. And, there are no other matches either.

If there are no matches at all, then AAVMF doesn't touch the preexisting BootOrder / Boot#### variables at all. So, it certainly doesn't *cause* the DxeCore to crash.

Why the DxeCore crashes under such circumstances is in fact a mystery to me, but we probably shouldn't spend time trying to identify it. (Given that preserving the GPT GUID is the right thing to do anyway.)

Thanks!

Upstream commits:

https://github.com/libguestfs/libguestfs/commit/b3e3750b13a96fb6d1b79f7c0dabb4eeb37de5ef
https://github.com/libguestfs/libguestfs/commit/40c133b2c81666f6dde43704e66bf59206d5c111
https://github.com/libguestfs/libguestfs/commit/f630677c14c7d5528e1ab39e4f805e0957b2ee3e

Fix included in:

libguestfs-1.29.24-3.fc22
libguestfs-1.28.1-1.23.aa7a

Hi Rich,

I completely forgot about this BZ (and the fact that I CC'd myself on it), but last night I ran into the same issue, and now I found the BZ again with Google.
I tried to resize the C: partition (/dev/sda4) of a Windows Server 2012 R2 guest (x86_64, OVMF) on my RHEL-7.1 laptop. The UEFI boot option was lost (in retrospect surely because of the GUID change), but interestingly, the GUID seems to be hardcoded in other parts of Windows as well, because it refused to boot even after I recreated the UEFI boot option manually, in OVMF. At that point the UEFI boot loader started, but later it encountered an error. A repair attempt with the installer ISO failed too.

So, my question here -- any chance this fix could be ported to RHEL-7.2?

Thanks. |
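For the AAVMF log analysis earlier in this thread, the ParseOfwNode lines can be reproduced with a few lines of parsing. This is a sketch, not the AAVMF implementation: `OfwNode`, `parse_ofw_path` and `mmio_base_as_vendor_data` are names invented for this example, and the little-endian encoding of the MMIO base address is inferred only from the single TranslateOfwPath line in the log.

```python
import struct
from typing import List, NamedTuple


class OfwNode(NamedTuple):
    """One OpenFirmware path node: driver-name@unit-address:device-arguments."""
    driver_name: str
    unit_address: str
    device_arguments: str


def parse_ofw_path(path: str) -> List[OfwNode]:
    """Split an OpenFirmware device path into its nodes."""
    nodes = []
    for component in path.strip("/").split("/"):
        name_and_unit, _, arguments = component.partition(":")
        driver, _, unit = name_and_unit.partition("@")
        nodes.append(OfwNode(driver, unit, arguments))
    return nodes


def mmio_base_as_vendor_data(unit_address: str) -> str:
    """Hex-encode a virtio-mmio base address as 8 little-endian bytes.

    Assumption: this byte order is read off the log output above,
    not taken from the firmware source.
    """
    return struct.pack("<Q", int(unit_address, 16)).hex().upper()


nodes = parse_ofw_path("/virtio-mmio@000000000a003c00/disk@0,0")
vendor_data = mmio_base_as_vendor_data(nodes[0].unit_address)
```

Here `nodes` holds a `virtio-mmio` node and a `disk` node with empty device arguments, and `vendor_data` reproduces the second argument of the `VenHw(...)` node printed by TranslateOfwPath.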
Created attachment 988301 [details]
boot output

Description of problem:

Rawhide host (I have patched qemu with UEFI support).
RHELSA 7.1 guest.
libvirt which includes <loader/>, <nvram/> support.

When I try to boot the RHELSA guest, it fails with an exception in UEFI. The attached boot.log gives the complete output.

Version-Release number of selected component (if applicable):

Host:
libvirt-1.2.11-1.fc22.aarch64
AAVMF-20141113-3.git77d5dac.sa1.4.aarch64
kernel 3.19.0-0.rc5.git2.1.fc22.aarch64

Guest:
RHELSA 7.1

How reproducible:

100%

The qemu command line is:

/usr/bin/qemu-system-aarch64 -name packstack-rhelsa -S \
  -machine virt,accel=kvm,usb=off -cpu host \
  -drive file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on \
  -drive file=/var/lib/libvirt/nvram/packstack-rhelsa_VARS.fd,if=pflash,format=raw,unit=1 \
  -m 12288 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 \
  -uuid f3930280-b152-4343-9d48-a924ed342fa4 \
  -nographic -no-user-config -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/packstack-rhelsa.monitor,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc -no-shutdown -boot strict=on \
  -device virtio-serial-device,id=virtio-serial0 -usb \
  -drive file=/dev/vg_hdd/packstack-rhelsa,if=none,id=drive-virtio-disk0,format=raw,cache=writeback \
  -device virtio-blk-device,scsi=off,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -netdev tap,fd=23,id=hostnet0 \
  -device virtio-net-device,netdev=hostnet0,id=net0,mac=52:54:00:73:d4:aa \
  -serial pty -msg timestamp=on