Bug 1189284

Summary: virt-resize should preserve GPT partition UUIDs, else EFI guests become unbootable
Product: [Community] Virtualization Tools
Component: libguestfs
Status: CLOSED UPSTREAM
Reporter: Richard W.M. Jones <rjones>
Assignee: Richard W.M. Jones <rjones>
CC: lersek, ptoscano, rbalakri
Severity: unspecified
Priority: unspecified
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-02-06 10:31:28 UTC
Bug Blocks: 1224486
Attachments:
  boot output (flags: none)
  libvirt XML of guest (flags: none)
  packstack-rhelsa_VARS.fd (gzip compressed) (flags: none)

Description Richard W.M. Jones 2015-02-04 21:35:07 UTC
Created attachment 988301
boot output

Description of problem:

Rawhide host (I have patched qemu with UEFI support).
RHELSA 7.1 guest.
libvirt which includes <loader/>, <nvram/> support.

When I try to boot the RHELSA guest, it fails with an
exception in UEFI.  The attached boot.log gives the
complete output.

Version-Release number of selected component (if applicable):

Host:
libvirt-1.2.11-1.fc22.aarch64
AAVMF-20141113-3.git77d5dac.sa1.4.aarch64
kernel 3.19.0-0.rc5.git2.1.fc22.aarch64

Guest:
RHELSA 7.1

How reproducible:

100%

The qemu command line is:

/usr/bin/qemu-system-aarch64 \
  -name packstack-rhelsa -S \
  -machine virt,accel=kvm,usb=off \
  -cpu host \
  -drive file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on \
  -drive file=/var/lib/libvirt/nvram/packstack-rhelsa_VARS.fd,if=pflash,format=raw,unit=1 \
  -m 12288 \
  -realtime mlock=off \
  -smp 4,sockets=4,cores=1,threads=1 \
  -uuid f3930280-b152-4343-9d48-a924ed342fa4 \
  -nographic -no-user-config -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/packstack-rhelsa.monitor,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc \
  -no-shutdown \
  -boot strict=on \
  -device virtio-serial-device,id=virtio-serial0 \
  -usb \
  -drive file=/dev/vg_hdd/packstack-rhelsa,if=none,id=drive-virtio-disk0,format=raw,cache=writeback \
  -device virtio-blk-device,scsi=off,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -netdev tap,fd=23,id=hostnet0 \
  -device virtio-net-device,netdev=hostnet0,id=net0,mac=52:54:00:73:d4:aa \
  -serial pty \
  -msg timestamp=on

Comment 1 Richard W.M. Jones 2015-02-04 21:35:34 UTC
Created attachment 988302
libvirt XML of guest

Comment 2 Richard W.M. Jones 2015-02-04 21:38:33 UTC
Created attachment 988303
packstack-rhelsa_VARS.fd (gzip compressed)

It would be really useful if there were a tool for inspecting
these nvram files, e.g. to list the values of the variables
inside.  hexdump is not very helpful.

Comment 3 Richard W.M. Jones 2015-02-05 07:40:49 UTC
By a process of elimination I found out that this is actually
caused by resizing the image (i.e. running virt-resize) while building.

I've no idea why virt-resize would affect the EFI partition (it
doesn't resize it), nor why EFI would then crash.

In any case, reassigning the bug to libguestfs.

Comment 4 Richard W.M. Jones 2015-02-05 07:48:34 UTC
Partition layout before resizing (this guest boots OK):

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048          198655   96.0 MiB    EF00  EFI System Partition
   2          198656         1222655   500.0 MiB   0700  
   3         1222656         2482175   615.0 MiB   8200  
   4         2482176        12578815   4.8 GiB     0700  

Partition layout after resizing (this guest fails to boot):

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048          198655   96.0 MiB    EF00  EFI
   2          198656         1222655   500.0 MiB   0700  primary
   3         1222656         2482175   615.0 MiB   8200  primary
   4         2482176        20968831   8.8 GiB     0700  primary

The first (EFI) partition has a different name, but the
same boundaries etc.

Comment 5 Richard W.M. Jones 2015-02-05 08:00:05 UTC
Using gdisk I can get the detailed partition info for the partition:

Before:
Partition GUID code: C12A7328-F81F-11D2-BA4B-00A0C93EC93B (EFI System)
Partition unique GUID: 6A3C69C8-FD2E-41B7-A2DC-0FFDB0B1FA34
First sector: 2048 (at 1024.0 KiB)
Last sector: 198655 (at 97.0 MiB)
Partition size: 196608 sectors (96.0 MiB)
Attribute flags: 0000000000000000
Partition name: 'EFI System Partition'

After:
Partition GUID code: C12A7328-F81F-11D2-BA4B-00A0C93EC93B (EFI System)
Partition unique GUID: A8ADB729-77FC-4D2B-B140-194E2BA7EE23
First sector: 2048 (at 1024.0 KiB)
Last sector: 198655 (at 97.0 MiB)
Partition size: 196608 sectors (96.0 MiB)
Attribute flags: 0000000000000000
Partition name: 'EFI'

The fields which changed are: Partition unique GUID and Partition name.

First I changed the Partition name back to the original, but that
did not help.

Second I changed the Partition unique GUID back to the original (you
have to use the 'x' expert menu in gdisk).  This fixed the problem.

So I conclude that virt-resize needs to preserve the partition
GUID.
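
The gdisk comparison above can also be scripted. What follows is a minimal sketch, not code from virt-resize or libguestfs, that reads the primary GPT of a raw disk image and reports which partitions changed their unique GUID between two images. It assumes a raw (not qcow2) image with 512-byte sectors; the names GptEntry, read_gpt_entries and changed_guids are hypothetical.

```python
import struct
import uuid
from collections import namedtuple

# Hypothetical record type for this sketch; fields mirror the gdisk output above.
GptEntry = namedtuple("GptEntry", "number type_guid unique_guid first_lba last_lba name")


def read_gpt_entries(image, sector_size=512):
    """Parse the primary GPT of a raw disk image opened in binary mode.

    Assumes 512-byte logical sectors unless sector_size says otherwise.
    """
    image.seek(sector_size)              # the primary GPT header is at LBA 1
    header = image.read(92)
    if header[:8] != b"EFI PART":
        raise ValueError("no GPT signature at LBA 1")
    # Header offsets 72/80/84: entry array LBA, entry count, entry size.
    entries_lba, count, entry_size = struct.unpack_from("<QII", header, 72)
    image.seek(entries_lba * sector_size)
    entries = []
    for number in range(1, count + 1):
        raw = image.read(entry_size)
        if len(raw) < 128:
            break                        # truncated image
        type_guid = uuid.UUID(bytes_le=raw[0:16])
        if type_guid.int == 0:
            continue                     # unused slot
        unique_guid = uuid.UUID(bytes_le=raw[16:32])
        first_lba, last_lba = struct.unpack_from("<QQ", raw, 32)
        # The name is 72 bytes of NUL-padded UTF-16LE.
        name = raw[56:128].decode("utf-16-le", errors="replace").split("\x00", 1)[0]
        entries.append(GptEntry(number, type_guid, unique_guid, first_lba, last_lba, name))
    return entries


def changed_guids(before, after):
    """Return partition numbers whose unique GUID differs between two entry lists."""
    old = {e.number: e.unique_guid for e in before}
    return [e.number for e in after if e.number in old and old[e.number] != e.unique_guid]
```

Calling read_gpt_entries on the image before and after resizing and passing both results to changed_guids would list partition 1 for the case described in this comment.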

Comment 6 Richard W.M. Jones 2015-02-05 08:41:31 UTC
Patch series posted upstream:
https://www.redhat.com/archives/libguestfs/2015-February/msg00032.html

Comment 8 Laszlo Ersek 2015-02-05 10:07:35 UTC
(In reply to Richard W.M. Jones from comment #5)

> So I conclude that virt-resize needs to preserve the partition
> GUID.

Oh yes, definitely. See the following passage from the UEFI spec:


> 3.1.2 Load Option Processing
>
> [...]
>
> The boot manager must also support booting from a short-form device path
> that starts with the first element being a hard drive media device path
> (see Table 77). The boot manager must use the GUID or signature and
> partition number in the hard drive device path to match it to a device in
> the system. If the drive supports the GPT partitioning scheme the GUID in
> the hard drive media device path is compared with the UniquePartitionGuid
> field of the GUID Partition Entry (see Table 18). If the drive supports
> the PC-AT MBR scheme the signature in the hard drive media device path is
> compared with the UniqueMBRSignature in the Legacy Master Boot Record (see
> Table 13). If a signature match is made, then the partition number must
> also be matched. [...]

The above means that a certain kind of "relative" device path is supported
for hard disk boot options. The device path fragment that leads from the
root to the hard disk (which could include device path nodes like PCI root
bridge, PCI controller, or virtio-mmio nodes) can be omitted. The idea is
that if you re-plug the same disk into a different hardware controller (PCI
slot, PCI bridge, different virtio-mmio register block etc.), then your
relative boot option will continue to match, because the matching won't try
to enforce the initial portion of the boot option device path. Instead, it
will enumerate all hard disks in the system (with their respective GPT
GUIDs), and search those for the GPT GUID stored in the first node of your
relative (= short-form) boot option device path.

If you change the disk's GPT partition GUID, then your existing boot option
is by definition unable to match it.
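
The matching rule quoted from the spec can be illustrated with a small model. This is only a sketch under simplifying assumptions, not firmware code: device paths are plain strings, the HD() node is abbreviated, and the names Partition, ShortFormOption and expand_short_form are hypothetical.

```python
import uuid
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Partition:
    disk_path: str          # absolute device path of the disk (simplified text form)
    number: int             # partition number on that disk
    unique_guid: uuid.UUID  # UniquePartitionGuid from the GPT entry


@dataclass(frozen=True)
class ShortFormOption:
    partition_number: int
    partition_guid: uuid.UUID
    file_path: str          # loader path on the partition (illustrative value)


def expand_short_form(option: ShortFormOption,
                      partitions: Iterable[Partition]) -> Optional[str]:
    """Return an absolute path for the option, or None if no partition matches."""
    for part in partitions:
        # Both the GUID and the partition number must match.
        if (part.unique_guid == option.partition_guid
                and part.number == option.partition_number):
            return (f"{part.disk_path}/HD({part.number},GPT,{part.unique_guid})"
                    f"/{option.file_path}")
    return None
```

With the GUIDs from comment #5, an option that stores 6A3C69C8-... expands against the original disk, but returns None once partition 1 carries A8ADB729-..., regardless of which controller the disk is attached to.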

The boot log attached to comment #0 says:

> SetBootOrderFromQemu: FwCfg:
> /virtio-mmio@000000000a003c00/disk@0,0
> HALT
> SetBootOrderFromQemu: FwCfg: <end>

This is the OpenFirmware boot order AAVMF downloaded from QEMU over fw_cfg.
Then,

> ParseOfwNode: DriverName="virtio-mmio" UnitAddress="000000000a003c00" DeviceArguments=""
> ParseOfwNode: DriverName="disk" UnitAddress="0,0" DeviceArguments=""
> TranslateOfwPath: success: "VenHw(837DCA9E-E874-4D82-B29A-23FE0E23D1E2,003C000A00000000)/HD("

This shows the UEFI device path fragment into which AAVMF translates the
OpenFirmware device path. This is an absolute devpath fragment. AAVMF
will filter the UEFI boot options with this leading fragment.

In the next step AAVMF would go through all of the UEFI boot options,
*expand* all the relative ones to absolute ones (using the algorithm
described above), and filter against the expanded values. However, since
your GPT GUID has changed, the expansion doesn't yield any results, hence
that boot option is dropped. And, there are no other matches either.

If there are no matches at all, then AAVMF doesn't touch the preexisting
BootOrder / Boot#### variables at all. So, it certainly doesn't *cause* the
DxeCore to crash. Why the DxeCore crashes under such circumstances is in
fact a mystery to me, but we probably shouldn't spend time trying to
identify it. (Given that preserving the GPT GUID is the right thing to do
anyway.)
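
The filtering step described above can be modelled roughly as follows. This is a simplified sketch, not the AAVMF implementation: device paths are plain strings, the function name reorder_boot_options is hypothetical, and dropping every unmatched option whenever at least one option matches is an assumption made for illustration.

```python
from typing import Dict, List, Optional


def reorder_boot_options(ofw_prefixes: List[str],
                         expanded: Dict[int, Optional[str]],
                         boot_order: List[int]) -> List[int]:
    """Return the boot order after filtering against the fw_cfg-derived prefixes.

    expanded maps each Boot#### number to its absolute device path, or None
    when a short-form option could not be expanded (for example after a
    partition GUID change).
    """
    matched: List[int] = []
    for prefix in ofw_prefixes:                  # keep the order QEMU asked for
        for number in boot_order:
            path = expanded.get(number)
            if path is not None and path.startswith(prefix) and number not in matched:
                matched.append(number)
    # With no match at all, the existing BootOrder is left untouched.
    return matched if matched else list(boot_order)
```

In this model, the failure in this bug corresponds to every entry of expanded being None or failing the prefix test, so the function returns the original order unchanged.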

Thanks!

Comment 10 Laszlo Ersek 2015-05-23 11:18:33 UTC
Hi Rich,

I completely forgot about this BZ (and the fact that I CC'd myself on it), but last night I ran into the same issue, and now I found the BZ again with Google.

I tried to resize the C: partition (/dev/sda4) of a Windows Server 2012 R2 guest (x86_64, OVMF) on my RHEL-7.1 laptop. The UEFI boot option was lost (in retrospect surely because of the GUID change), but interestingly, the GUID seems to be hardcoded in other parts of Windows as well, because it refused to boot even after I recreated the UEFI boot option manually in OVMF. At that point the UEFI boot loader started, but later it encountered an error. A repair attempt with the installer ISO failed too.

So, my question here -- any chance this fix could be ported to RHEL-7.2? Thanks.