Bug 2131254 - Remove the necessity to pad the edk2 images to 64MB
Summary: Remove the necessity to pad the edk2 images to 64MB
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.1
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Gerd Hoffmann
QA Contact: Yihuang Yu
URL:
Whiteboard:
Duplicates: 2019723 (view as bug list)
Depends On:
Blocks: 1924294
 
Reported: 2022-09-30 13:14 UTC by Eric Auger
Modified: 2023-07-20 10:26 UTC
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-20 10:26:45 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments


Links
Red Hat Issue Tracker RHELPLAN-135539 (last updated 2022-10-03 12:21:24 UTC)

Description Eric Auger 2022-09-30 13:14:04 UTC
On aarch64 the edk2 code flash and vars flash are padded to 64MB. This is not necessary on other archs.

The firmware expects the code flash to be mapped at offset 0 and the vars flash at offset 64MB. Since QEMU currently packs one flash right after the other, the padding makes sure both flashes end up at the right offsets.

We would like to remove the padding and teach qemu to map the vars flash at the 64MB offset even if the code flash is smaller.
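For context, a rough sketch of the current scheme (file names illustrative, not taken from the actual build scripts): both images get padded to 64MB, and QEMU maps them as two pflash drives packed back to back:

  # pad code and vars images to 64MiB each (illustrative)
  truncate -s 64M QEMU_EFI-pflash.raw
  truncate -s 64M vars-template-pflash.raw

  qemu-system-aarch64 -machine virt \
    -drive if=pflash,unit=0,format=raw,readonly=on,file=QEMU_EFI-pflash.raw \
    -drive if=pflash,unit=1,format=raw,file=vars-template-pflash.raw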

Comment 2 Gerd Hoffmann 2022-12-15 09:24:43 UTC
The actual code change is easier than I expected.
https://git.kraxel.org/cgit/qemu/log/?h=sirius/arm-flash-padding

The more interesting question is probably how we'll wire
that up to management.  libvirt needs to know whether qemu
can deal with unpadded firmware images or not ...

Comment 3 Gerd Hoffmann 2022-12-16 11:21:12 UTC
Uhm, looks like we had attempts in the past (must have missed those)
and it is apparently a bit more tricky; see Peter's comment on the patch:

https://patchwork.ozlabs.org/project/qemu-devel/patch/20221216101234.2202009-2-kraxel@redhat.com/#3021779

Comment 4 Gerd Hoffmann 2022-12-16 11:22:47 UTC
One of them: https://www.mail-archive.com/qemu-devel@nongnu.org/msg607291.html

Comment 5 Gerd Hoffmann 2022-12-16 11:53:34 UTC
There is a patch (https://www.mail-archive.com/qemu-devel@nongnu.org/msg610866.html).

Different approach: don't read sparse parts from files into the buffer.
Unused pages are never written then and will be backed by the (shared)
zero page, which avoids wasting memory.

Patch apparently never made it upstream.  Kevin?

Comment 6 Kevin Wolf 2022-12-16 17:52:57 UTC
(In reply to Gerd Hoffmann from comment #5)
> Patch apparently never made it upstream.  Kevin?

Yes, looks like a non-RFC version was never sent upstream.

I think it's different from your goal if I understand it correctly: You want to make the image file use less than the flash size on disk, whereas the goal of that other patch was to read a 64 MB image file, but not actually allocate the memory at runtime if a page contains only zeros.

For your case, if sparse raw images aren't good enough (e.g. because you actually want a small file size, not just a small number of used blocks), I wonder why you don't simply use qcow2 images which always only grow as big as they need to?

Comment 7 Gerd Hoffmann 2022-12-19 09:39:45 UTC
(In reply to Kevin Wolf from comment #6)
> (In reply to Gerd Hoffmann from comment #5)
> > Patch apparently never made it upstream.  Kevin?
> 
> Yes, looks like a non-RFC version was never sent upstream.
> 
> I think it's different from your goal if I understand it correctly: You want
> to make the image file use less than the flash size on disk, whereas the
> goal of that other patch was to read a 64 MB image file, but not actually
> allocate the memory at runtime if a page contains only zeros.

It's more about the runtime memory usage.  Mapping 128M per VM with
only ~4 MB actually used is quite a lot ...

Saving disk storage is less important, although getting both would
be nice given we have a per-VM copy of the VARS file.

> For your case, if sparse raw images aren't good enough (e.g. because you
> actually want a small file size, not just a small number of used blocks), I
> wonder why you don't simply use qcow2 images which always only grow as big
> as they need to?

Hmm.  The build process actually creates sparse files.  But looking at
the installed files they are not sparse any more.  Looks like the rpm
packaging loses this.  Google found bug 128335.
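A quick way to check, for reference (file name illustrative):

  $ ls -l QEMU_EFI-pflash.raw        # apparent size, 64MiB after padding
  $ du -h QEMU_EFI-pflash.raw        # blocks actually allocated on disk
  $ qemu-img map -f raw QEMU_EFI-pflash.raw   # byte ranges that contain data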

If I understand this correctly the patch linked in comment 5 requires
files being sparse (raw) or unallocated (qcow2).

So using the comment 5 patch plus qcow2 files looks like a way which
should work.  Andrea?  Is libvirt fine with that (i.e. firmware json
files pointing to a QEMU_EFI.qcow2 file for flash)?

Comment 8 Eric Auger 2022-12-19 21:12:49 UTC
A naive thought: isn't it possible to add an option allowing us to define & set a padding size different from 64MB? This would solve the migration issue. The mgmt layer would still be required to set this padding size to a reasonable value. By default 64MB would be kept.

Comment 9 Gerd Hoffmann 2022-12-20 07:59:50 UTC
(In reply to Eric Auger from comment #8)
> A naive thought: isn't it possible to add an option allowing us to define & set
> a padding size different from 64MB? This would solve the migration issue.
> The mgmt layer would still be required to set this padding
> size to a reasonable value. By default 64MB would be kept.

There are many possible options, but given how often that was apparently
discussed already @ virt-devel I don't feel like starting the discussion
again.  Instead, go with the result of the previous discussion, which
just never was properly submitted for some reason.

(In reply to Gerd Hoffmann from comment #7)
> So using the comment 5 patch plus qcow2 files looks like a way which
> should work.  Andrea?  Is libvirt fine with that (i.e. firmware json
> files pointing to a QEMU_EFI.qcow2 file for flash)?

Alternative idea: use 'fallocate -d $file.raw' in edk2-aarch64 %post
to make the files sparse again.
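As a sketch, that idea would be something like the following scriptlet in the spec file (paths hypothetical; comment 12 below notes the approach did not pan out):

  %post
  # rpm does not preserve sparseness, so re-punch the holes after install
  for f in /usr/share/edk2/aarch64/QEMU_EFI-pflash.raw \
           /usr/share/edk2/aarch64/vars-template-pflash.raw; do
      fallocate --dig-holes "$f" || :
  done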

Comment 10 Gerd Hoffmann 2022-12-20 08:01:05 UTC
> discussed already @ virt-devel

Oops, qemu-devel of course.

Comment 12 Gerd Hoffmann 2023-01-03 08:48:48 UTC
> (In reply to Gerd Hoffmann from comment #7)
> > So using the comment 5 patch plus qcow2 files looks like a way which
> > should work.  Andrea?  Is libvirt fine with that (i.e. firmware json
> > files pointing to a QEMU_EFI.qcow2 file for flash)?
> 
> Alternative idea: use 'fallocate -d $file.raw' in edk2-aarch64 %post
> to make the files sparse again.

Hmm, seems that doesn't fly ...
https://bugzilla.redhat.com/show_bug.cgi?id=2155673

Comment 13 Andrea Bolognani 2023-01-05 15:31:12 UTC
(In reply to Gerd Hoffmann from comment #7)
> If I understand this correctly the patch linked in comment 5 requires
> files being sparse (raw) or unallocated (qcow2).
> 
> So using the comment 5 patch plus qcow2 files looks like a way which
> should work.  Andrea?  Is libvirt fine with that (i.e. firmware json
> files pointing to a QEMU_EFI.qcow2 file for flash)?

Back from PTO and catching up :)

I've gone through the various threads that were linked. Do I understand
correctly that the current plan is to get QEMU to only allocate
enough host memory to store the parts of the CODE/VARS files that are
actually used, and skip the padding? While still using the same
guest-side offsets?

I assume writes to NVRAM will be handled correctly by allocating more
memory on the fly, right? Are there any disadvantages to this
approach, compared to a hypothetical one that changes the offsets?

As for libvirt support, does the way in which the files are passed to
QEMU need to change when they're in qcow2 format instead of raw
format? And would the guest OS see any difference in behavior? If
not, things will probably work transparently.

Anyway, if you give me instructions on how to create qcow2 files from
the existing ones I can give that a try and report back :)

Comment 14 Gerd Hoffmann 2023-01-06 09:51:50 UTC
(In reply to Andrea Bolognani from comment #13)
> (In reply to Gerd Hoffmann from comment #7)
> > If I understand this correctly the patch linked in comment 5 requires
> > files being sparse (raw) or unallocated (qcow2).
> > 
> > So using the comment 5 patch plus qcow2 files looks like a way which
> > should work.  Andrea?  Is libvirt fine with that (i.e. firmware json
> > files pointing to a QEMU_EFI.qcow2 file for flash)?
> 
> Back from PTO and catching up :)
> 
> I've gone through the various threads that were linked. Do I understand
> correctly that the current plan is to get QEMU to only allocate
> enough host memory to store the parts of the CODE/VARS files that are
> actually used, and skip the padding? While still using the same
> guest-side offsets?

That is the idea.

> I assume writes to NVRAM will be handled correctly by allocating more
> memory on the fly, right?

Yes.  Strictly speaking we *do* allocate memory, but the kernel
populates that memory area with actual memory pages not at allocation
time, but when the pages are touched for the first time.  So when we
avoid accessing unused page ranges we need less memory.

> Are there any disadvantages to this
> approach, compared to a hypothetical one that changes the offsets?

The block layer needs to know which parts of the firmware flash file
are actually filled and which ones are not, so we can skip the read()
for the unused ones.  That works with (a) sparse raw files and (b) qcow2
files.

The problem with sparse raw files is that this is apparently incompatible
with rpm packaging (see comment 12), which leaves qcow2 on the table.

> As for libvirt support, does the way in which the files are passed to
> QEMU need to change when they're in qcow2 format instead of raw
> format?

Not different from any other block device backing
store (raw iso images, qcow2 cloud images).

> And would the guest OS see any difference in behavior?

No, only the qemu block layer would see the difference.

> Anyway, if you give me instructions on how to create qcow2 files from
> the existing ones I can give that a try and report back

"qemu-img convert -f raw -O qcow2 $file.raw $file.qcow2" should do.

Comment 15 Andrea Bolognani 2023-01-09 16:51:23 UTC
(In reply to Gerd Hoffmann from comment #14)
> (In reply to Andrea Bolognani from comment #13)
> > As for libvirt support, does the way in which the files are passed to
> > QEMU need to change when they're in qcow2 format instead of raw
> > format?
> 
> Not different from any other block device backing
> store (raw iso images, qcow2 cloud images).
> 
> > And would the guest OS see any difference in behavior?
> 
> No, only the qemu block layer would see the difference.
> 
> > Anyway, if you give me instructions on how to create qcow2 files from
> > the existing ones I can give that a try and report back
> 
> "qemu-img convert -f raw -O qcow2 $file.raw $file.qcow2" should do.

I played with this a bit.

libvirt currently expects everything to be in raw format.
Specifically, the command line it generates will look like

  -blockdev {"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}
  -blockdev {"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}
  -blockdev {"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/efi_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}
  -blockdev {"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}
  -machine pc-q35-7.2,usb=off,smm=on,dump-guest-core=off,memory-backend=pc.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format

This behavior is completely hardcoded, with no way to influence it.

If a JSON firmware descriptor contains "format": "qcow2", libvirt
will reject it with

  error: Operation not supported: unsupported flash format 'qcow2'

So we need to teach libvirt to accept edk2 builds and NVRAM templates
in qcow2 format. With a hacked build, I was able to successfully boot
using such a setup. I'll look into implementing proper support.
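For reference, a descriptor along these lines is what libvirt would have to accept (a minimal sketch following the qemu docs/interop/firmware.json schema; paths and target list made up):

  {
      "interface-types": ["uefi"],
      "mapping": {
          "device": "flash",
          "executable": {
              "filename": "/usr/share/edk2/aarch64/QEMU_EFI-pflash.qcow2",
              "format": "qcow2"
          },
          "nvram-template": {
              "filename": "/usr/share/edk2/aarch64/vars-template-pflash.qcow2",
              "format": "qcow2"
          }
      },
      "targets": [
          { "architecture": "aarch64", "machines": [ "virt-*" ] }
      ],
      "features": []
  }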


One behavior that I've noticed and found confusing: when using an
NVRAM file in qcow2 format, it seems that it doesn't matter whether I
use "driver": "raw" or "driver": "qcow2" for libvirt-pflash1-format:
in both cases, the guest boots and I'm able to make changes to the
contents of the NVRAM (tested by changing BootOrder via efibootmgr).
Is this expected?


Final remark: it looks like virt-firmware currently doesn't support
qcow2 format, so that will have to be added too I guess :)

Comment 16 Gerd Hoffmann 2023-01-10 09:56:54 UTC
> I played with this a bit.
> 
> libvirt currently expects everything to be in raw format.
> This behavior is completely hardcoded, with no way to influence it.

> If a JSON firmware descriptor contains "format": "qcow2", libvirt
> will reject it with
> 
>   error: Operation not supported: unsupported flash format 'qcow2'

What happens in case two json files are there, one with raw and
one with qcow2?  Will current libvirt versions ignore/skip the
unsupported qcow2 and use raw?

> One behavior that I've noticed and found confusing: when using an
> NVRAM file in qcow2 format, it seems that it doesn't matter whether I
> use "driver": "raw" or "driver": "qcow2" for libvirt-pflash1-format:
> in both cases, the guest boots and I'm able to make changes to the
> contents of the NVRAM (tested by changing BootOrder via efibootmgr).
> Is this expected?

(1) Could be the firmware just initializes the varstore.
(2) Could be the firmware emulates the varstore in RAM.

ovmf does (2), not sure if armvirt behaves the same way.

Do you have a NvVars on the ESP?  That'll indicate (2),
varstore emulation reads/writes it.

Is the file still in qcow2 format afterwards (try qemu-img check)?
If not then it is probably (1).
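Spelled out as commands (file names as used elsewhere in this thread):

  # (2) varstore emulation: an NvVars file shows up on the guest's ESP
  find /boot/efi -name NvVars

  # (1) a varstore re-format through the raw mapping would corrupt the qcow2
  qemu-img check -f qcow2 efi_VARS.qcow2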

> Final remark: it looks like virt-firmware currently doesn't support
> qcow2 format, so that will have to be added too I guess :)

Indeed.  Converting back and forth with qemu-img works for the time
being, and enrolling secure boot keys isn't a pressing issue on arm,
so not super urgent but longer-term we surely want that.

While we're at it, something slightly related:  Where do we stand with
libvirt wrt. firmware image size?  Current fedora rpms have both 2M
and 4M ovmf builds, but only the 2M builds have json files.  If I add
json files for the 4M builds, is libvirt able to figure out which of the
two it should use for an existing VM based on the existing varstore?

Comment 17 Andrea Bolognani 2023-01-10 16:47:07 UTC
(In reply to Gerd Hoffmann from comment #16)
> > If a JSON firmware descriptor contains "format": "qcow2", libvirt
> > will reject it with
> > 
> >   error: Operation not supported: unsupported flash format 'qcow2'
> 
> What happens in case two json files are there, one with raw and
> one with qcow2?  Will current libvirt versions ignore/skip the
> unsupported qcow2 and use raw?

It looks like it depends on the ordering. If qcow2.json is sorted
after raw.json, the raw build will be picked and everything will work
as before; if qcow2.json is sorted before raw.json, that build will
be picked, the error message seen above will be reported, and the VM
will be unable to start.
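For reference, that ordering is simply the lexical sort of the descriptor file names in /usr/share/qemu/firmware/ (hypothetical listing):

  $ ls /usr/share/qemu/firmware/
  50-edk2-aarch64-qcow2.json   # sorts first and wins; old libvirt errors out
  60-edk2-aarch64-raw.json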

> > One behavior that I've noticed and found confusing: when using an
> > NVRAM file in qcow2 format, it seems that it doesn't matter whether I
> > use "driver": "raw" or "driver": "qcow2" for libvirt-pflash1-format:
> > in both cases, the guest boots and I'm able to make changes to the
> > contents of the NVRAM (tested by changing BootOrder via efibootmgr).
> > Is this expected?
> 
> (1) Could be the firmware just initializes the varstore.
> (2) Could be the firmware emulates the varstore in RAM.
> 
> ovmf does (2), not sure if armvirt behaves the same way.

I'm testing this on x86_64 rather than aarch64 for convenience ;)

> Do you have a NvVars on the ESP?  That'll indicate (2),
> varstore emulation reads/writes it.

Is 'NvVars' supposed to be a file that gets written to the guest's
ESP? If so, it doesn't look like it's there.

  $ find /boot/efi/
  /boot/efi/
  /boot/efi/EFI
  /boot/efi/EFI/debian
  /boot/efi/EFI/debian/shimx64.efi
  /boot/efi/EFI/debian/grubx64.efi
  /boot/efi/EFI/debian/mmx64.efi
  /boot/efi/EFI/debian/fbx64.efi
  /boot/efi/EFI/debian/BOOTX64.CSV
  /boot/efi/EFI/debian/grub.cfg

This is after I have changed BootOrder with 'efibootmgr --bootorder
...' to ensure that there is something that needs flushing to disk.
Maybe the file gets created and then processed very quickly, before I
could have a chance to see it?

> Is the file still in qcow2 format afterwards (try qemu-img check)?
> If not then it is probably (1).

It is still in qcow2 format, and after making changes to BootOrder
from inside the guest OS I can take the file, convert it back to raw
and use virt-fw-dump on it to confirm that the new value has been
written there. So I have no idea what's going on O:-) I would really
expect this to break horribly, but instead it works somehow? O:-)


There's also something else funky that goes on, though it's probably
unrelated. While I can confirm that the NVRAM file actually changes
after I run efibootmgr in the guest OS by running md5sum and
virt-fw-dump on it after VM shutdown, as soon as I boot up the VM
again my changes get reverted. This happens at some point between the
time the VM starts and when grub shows up. Whether I'm using qcow2 or
raw format, both in terms of on-disk format and information passed to
QEMU, doesn't make a difference, this happens every single time. It
might be a Debian thing though? I can try again with Fedora as the
guest OS.

> > Final remark: it looks like virt-firmware currently doesn't support
> > qcow2 format, so that will have to be added too I guess :)
> 
> Indeed.  Converting back and forth with qemu-img works for the time
> being, and enrolling secure boot keys isn't a pressing issue on arm,
> so not super urgent but longer-term we surely want that.

When we start shipping qcow2 builds for edk2-aarch64, will the same
be true for edk2-ovmf? That would make sense to me for consistency's
sake. Would x86_64 benefit from it the same way aarch64 does, or does
the memory usage issue not exist there?

> While we're at it, something slightly related:  Where do we stand with
> libvirt wrt. firmware image size?  Current fedora rpms have both 2M
> and 4M ovmf builds, but only the 2M builds have json files.  If I add
> json files for the 4M builds, is libvirt able to figure out which of the
> two it should use for an existing VM based on the existing varstore?

Thanks for bringing this up, I've actually been thinking about it :)

So, as things are right now libvirt doesn't look at the VARS file and
try to find a matching CODE file. It has code for the opposite, that
is, dealing with VMs where only the path to the CODE file has been
provided and there is no NVRAM file or path to a template.

The problem is that firmware autoselection happens at VM startup, so
its outcome is entirely based on the contents of the JSON descriptors
and their order. Adding, removing or even just shuffling around those
files could result in a different firmware being picked on subsequent
startups.

I believe this to be a significant flaw in the implementation, and I
am planning to address it by changing things so that firmware
autoselection is performed just once, when the VM is first defined,
and its outcome is recorded in the XML.

With this change in place, the VM will always get the same firmware
image as long as paths are not reused, which I think is a reasonable
expectation. At that point adding more descriptors, whether for 4M
builds or for qcow2 builds, should not affect existing VMs.

Comment 18 Andrea Bolognani 2023-01-10 18:51:57 UTC
(In reply to Andrea Bolognani from comment #17)
> There's also something else funky that goes on, though it's probably
> unrelated. While I can confirm that the NVRAM file actually changes
> after I run efibootmgr in the guest OS by running md5sum and
> virt-fw-dump on it after VM shutdown, as soon as I boot up the VM
> again my changes get reverted. This happens at some point between the
> time the VM starts and when grub shows up. Whether I'm using qcow2 or
> raw format, both in terms of on-disk format and information passed to
> QEMU, doesn't make a difference, this happens every single time. It
> might be a Debian thing though? I can try again with Fedora as the
> guest OS.

I've tried with Fedora and I get the exact same behavior.

However, this only seems to happen when I use the build of edk2 that
has Secure Boot support enabled: if I add

  <os firmware='efi'>
    <firmware>
      <feature enabled='no' name='secure-boot'/>
    </firmware>
  </os>

to my VM configuration, then the changes I make to BootOrder are
finally persisted across reboots. Note that disabling keys-enrolled
is not enough: it really needs to be the build with no Secure Boot
support that gets used.

Is it somehow a Secure Boot requirement that BootOrder contains a
specific value, and that users have no way of changing it?

Comment 19 Gerd Hoffmann 2023-01-11 09:32:29 UTC
> > What happens in case two json files are there, one with raw and
> > one with qcow2?  Will current libvirt versions ignore/skip the
> > unsupported qcow2 and use raw?
> 
> It looks like it depends on the ordering. If qcow2.json is sorted
> after raw.json, the raw build will be picked and everything will work
> as before; if qcow2.json is sorted before raw.json, that build will
> be picked, the error message seen above will be reported, and the VM
> will be unable to start.

Ok.  So an edk2 build with qcow2 images needs a conflict against older
libvirt versions.

> > Do you have a NvVars on the ESP?  That'll indicate (2),
> > varstore emulation reads/writes it.
> 
> Is 'NvVars' supposed to be a file that gets written to the guest's
> ESP?

Yes.

> It is still in qcow2 format, and after making changes to BootOrder
> from inside the guest OS I can take the file, convert it back to raw
> and use virt-fw-dump on it to confirm that the new value has been
> written there.

Ok, so flash works and ovmf will not write NvVars then.

> So I have no idea what's going on O:-) I would really
> expect this to break horribly, but instead it works somehow? O:-)

Format autodetection kicking in for some reason?
Although that should not happen with explicitly configured format.

> When we start shipping qcow2 builds for edk2-aarch64, will the same
> be true for edk2-ovmf? That would make sense to me for consistency's
> sake. Would x86_64 benefit from it the same way aarch64 does, or does
> the memory usage issue not exist there?

It does not exist on x86_64, so not sure yet how to handle that.

> I believe this to be a significant flaw in the implementation, and I
> am planning to address it by changing things so that firmware
> autoselection is performed just once, when the VM is first defined,
> and its outcome is recorded in the XML.

Makes sense.  What do you record?  Path to the code/vars-template
images itself I guess?  So renumbering/renaming json files wouldn't
break guests?  Also we can migrate to new defaults by leaving
the old images in place and updating the json files to point to
the new images.

Should work for both 2M -> 4M transition and raw -> qcow2 transition.

(In reply to Andrea Bolognani from comment #18)
> (In reply to Andrea Bolognani from comment #17)
> > There's also something else funky that goes on, though it's probably
> > unrelated. While I can confirm that the NVRAM file actually changes
> > after I run efibootmgr in the guest OS by running md5sum and
> > virt-fw-dump on it after VM shutdown, as soon as I boot up the VM
> > again my changes get reverted. This happens at some point between the
> > time the VM starts and when grub shows up. Whether I'm using qcow2 or
> > raw format, both in terms of on-disk format and information passed to
> > QEMU, doesn't make a difference, this happens every single time. It
> > might be a Debian thing though? I can try again with Fedora as the
> > guest OS.
> 
> I've tried with Fedora and I get the exact same behavior.
> 
> However, this only seems to happen when I use the build of edk2 that
> has Secure Boot support enabled: if I add
> 
>   <os firmware='efi'>
>     <firmware>
>       <feature enabled='no' name='secure-boot'/>
>     </firmware>
>   </os>
> 
> to my VM configuration, then the changes I make to BootOrder are
> finally persisted across reboots. Note that disabling keys-enrolled
> is not enough: it really needs to be the build with no Secure Boot
> support that gets used.
> 
> Is it somehow a Secure Boot requirement that BootOrder contains a
> specific value, and that users have no way of changing it?

Hmm, good question.  OVMF reorders the boot entries according to the
order it got from qemu (via bootindex).  That should not depend on
secure boot configuration though.  Could also be that fallback.efi is
started for some reason.  It comes with shim.efi and will restore
the boot entry for \EFI\fedora\shim${arch}.efi in case it was lost.
Not sure whether that one has different behavior depending on
secure boot configuration.

Comment 20 Andrea Bolognani 2023-01-11 19:06:54 UTC
(In reply to Gerd Hoffmann from comment #19)
> > > What happens in case two json files are there, one with raw and
> > > one with qcow2?  Will current libvirt versions ignore/skip the
> > > unsupported qcow2 and use raw?
> > 
> > It looks like it depends on the ordering. If qcow2.json is sorted
> > after raw.json, the raw build will be picked and everything will work
> > as before; if qcow2.json is sorted before raw.json, that build will
> > be picked, the error message seen above will be reported, and the VM
> > will be unable to start.
> 
> Ok.  So an edk2 build with qcow2 images needs a conflict against older
> libvirt versions.

I didn't think of this but now that you mention it yes, that sounds
like an obviously good thing to do.

> > So I have no idea what's going on O:-) I would really
> > expect this to break horribly, but instead it works somehow? O:-)
> 
> Format autodetection kicking in for some reason?
> Although that should not happen with explicitly configured format.

No idea.

When I add support for qcow2 formatted NVRAM files I will obviously
make sure that the correct information is provided to QEMU, but the
fact that things somehow work regardless concerns me. I'm worried
that there might be some bug that we haven't noticed yet and that
will bite us later down the line.

Do you think you could try to investigate this behavior and
understand it? So that we know we're okay.

> > I believe this to be a significant flaw in the implementation, and I
> > am planning to address it by changing things so that firmware
> > autoselection is performed just once, when the VM is first defined,
> > and its outcome is recorded in the XML.
> 
> Makes sense.  What do you record?  Path to the code/vars-template
> images itself I guess?

Right now, if you have configured a VM with

  <os firmware='efi'>
    <type arch='x86_64' machine='pc-q35-7.2'>hvm</type>
    <boot dev='hd'/>
  </os>

that will be converted to

  <os>
    <type arch='x86_64' machine='pc-q35-7.2'>hvm</type>
    <loader readonly='yes' secure='yes' type='pflash'>/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd</loader>
    <nvram template='/usr/share/edk2/ovmf/OVMF_VARS.secboot.fd'>/var/lib/libvirt/qemu/nvram/test_VARS.fd</nvram>
    <boot dev='hd'/>
  </os>

every time the VM is started. You can verify this by running 'virsh
dumpxml' before and after starting the VM.

The idea is to perform the lookup only once, when the VM is defined.
The second snippet will automatically replace the first one, making
the choice permanent for the lifetime of the VM.

Incidentally, the information contained in the second snippet is also
the same that users would have to provide themselves before firmware
autoselection was implemented :)

> So renumbering/renaming json files wouldn't
> break guests?  Also we can migrate to new defaults by leaving
> the old images in place and updating the json files to point to
> the new images.
> 
> Should work for both 2M -> 4M transition and raw -> qcow2 transition.

That's the idea :)

> > > There's also something else funky that goes on, though it's probably
> > > unrelated. While I can confirm that the NVRAM file actually changes
> > > after I run efibootmgr in the guest OS by running md5sum and
> > > virt-fw-dump on it after VM shutdown, as soon as I boot up the VM
> > > again my changes get reverted. This happens at some point between the
> > > time the VM starts and when grub shows up. Whether I'm using qcow2 or
> > > raw format, both in terms of on-disk format and information passed to
> > > QEMU, doesn't make a difference, this happens every single time. It
> > > might be a Debian thing though? I can try again with Fedora as the
> > > guest OS.
> > 
> > I've tried with Fedora and I get the exact same behavior.
> > 
> > However, this only seems to happen when I use the build of edk2 that
> > has Secure Boot support enabled: if I add
> > 
> >   <os firmware='efi'>
> >     <firmware>
> >       <feature enabled='no' name='secure-boot'/>
> >     </firmware>
> >   </os>
> > 
> > to my VM configuration, then the changes I make to BootOrder are
> > finally persisted across reboots. Note that disabling keys-enrolled
> > is not enough: it really needs to be the build with no Secure Boot
> > support that gets used.
> > 
> > Is it somehow a Secure Boot requirement that BootOrder contains a
> > specific value, and that users have no way of changing it?
> 
> Hmm, good question.  OVMF reorders the boot entries according to the
> order it got from qemu (via bootindex).

But that information should only cover the various disks attached to
the VM, not the partitions or EFI binaries contained within, right?

In my case there's only a single disk attached to the VM.

> That should not depend on
> secure boot configuration though.  Could also be that fallback.efi is
> started for some reason.  It comes with shim.efi and will restore
> the boot entry for \EFI\fedora\shim${arch}.efi in case it was lost.
> Not sure whether that one has different behavior depending on
> secure boot configuration.

I don't think that's it either: both when testing with Secure Boot on
and with Secure Boot off, all I ever did was shuffle around the items
in BootOrder, so no entry ever got lost.

Can you please take a closer look at this topic too? It might be
intentional behavior, but it's so unintuitive that I don't feel that
we can rule out a genuine bug.

Comment 21 Gerd Hoffmann 2023-01-12 11:54:34 UTC
> > Format autodetection kicking in for some reason?
> > Although that should not happen with explicitly configured format.
> 
> No idea.
> 
> When I add support for qcow2 formatted NVRAM files I will obviously
> make sure that the correct information is provided to QEMU, but the
> fact that things somehow work regardless concerns me. I'm worried
> that there might be some bug that we haven't noticed yet and that
> will bite us later down the line.

Seems so.

I get identical behavior, no matter whether I specify format=raw or format=qcow2.
Using 'qemu -drive if=pflash,format=qcow2,file=/path/to/OVMF_VARS.qcow2' here.

No errors are being logged by ovmf, which I'd expect to be the case for format=raw.
Also the file is still valid qcow2 afterwards; for format=raw I'd expect it
to be corrupted due to ovmf re-formatting the variable store in flash (checked the
source code again: varstore emulation happens only in case flash is not present,
invalid data in flash triggers a re-format instead).

Kevin?

fun fact: virt-fw-dump actually works on qcow2 images, it happens to find the
data blocks inside:

+ virt-fw-dump -i /home/kraxel/projects/edk2/Artifacts/ovmf-x64/OVMF_VARS.qcow2
image=/home/kraxel/projects/edk2/Artifacts/ovmf-x64/OVMF_VARS.qcow2
  volume=guid:NvData offset=0x5000 size=0x84000 hlen=0x48 xoff=0x0 rev=2 blocks=132*4096 used=48.5%
                     ^^^^^^^^^^^^^ note the non-zero offset here ;)
    nvdata=guid:AuthVars size=0x3ffb8 format=0x5a state=0xfe
      end of variable list

> > Makes sense.  What do you record?  Path to the code/vars-template
> > images itself I guess?
> 
> Right now, if you have configured a VM with
> 
>   <os firmware='efi'>
>     <type arch='x86_64' machine='pc-q35-7.2'>hvm</type>
>     <boot dev='hd'/>
>   </os>
> 
> that will be converted to
> 
>   <os>
>     <type arch='x86_64' machine='pc-q35-7.2'>hvm</type>
>     <loader readonly='yes' secure='yes'
> type='pflash'>/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd</loader>
>     <nvram
> template='/usr/share/edk2/ovmf/OVMF_VARS.secboot.fd'>/var/lib/libvirt/qemu/
> nvram/test_VARS.fd</nvram>
>     <boot dev='hd'/>
>   </os>
> 
> every time the VM is started. You can verify this by running 'virsh
> dumpxml' before and after starting the VM.
> 
> The idea is to perform the lookup only once, when the VM is defined.
> The second snippet will automatically replace the first one, making
> the choice permanent for the lifetime of the VM.
> 
> Incidentally, the information contained in the second snippet is also
> the same that users would have to provide themselves before firmware
> autoselection was implemented :)

Looks good to me.  We might keep the 'firmware=efi' and any feature
options though, for documentation purposes and to allow re-triggering
the autoselect by just deleting the <loader/> and <nvram/> lines.

> But that information should only cover the various disks attached to
> the VM, not the partitions or EFI binaries contained within, right?

Yes.  But boot entries pointing to a file in a partition
on a disk with bootindex attached will be reordered too.

i.e. typically you'll have a "Fedora" entry pointing to hd0/ESP/shim.efi
and a "QEMU HARDDISK" entry pointing to hd0, and if that disk has a
bootindex ovmf will place both where they should be according to the
disk bootindex.  Usually it happens on first boot only, later the
entries are already in the correct order so nothing changes.
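For reference, the bootindex in question is what libvirt generates from per-device boot order configuration, e.g. (minimal sketch):

  <disk type='file' device='disk'>
    ...
    <boot order='1'/>
  </disk>

which ends up on the qemu command line as "-device ...,bootindex=1".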

> > Could also be that fallback.efi is
> > started for some reason.  It comes with shim.efi and will restore
> > the boot entry for \EFI\fedora\shim${arch}.efi in case it was lost.
> > Not sure whether that one has different behavior depending on
> > secure boot configuration.
> 
> I don't think that's it either: both when testing with Secure Boot on
> and with Secure Boot off, all I ever did was shuffle around the items
> in BootOrder, so no entry ever got lost.

fallback.efi should only add entries, i.e. it will restore the 'fedora'
entry mentioned above in case it was lost.

The firmware might clean up invalid entries, for example entries pointing
to a non-existing device.  So if you add stuff using efibootmgr you'd better
make sure the device paths do not point into nowhere.

Comment 22 Gerd Hoffmann 2023-01-12 12:03:53 UTC
> > When I add support for qcow2 formatted NVRAM files I will obviously
> > make sure that the correct information is provided to QEMU, but the
> > fact that things somehow work regardless concerns me. I'm worried
> > that there might be some bug that we haven't noticed yet and that
> > will bite us later down the line.
> 
> Seems so.
> 
> I get identical behavior, no matter whether I specify format=raw or
> format=qcow2.
> Using 'qemu -drive if=pflash,format=qcow2,file=/path/to/OVMF_VARS.qcow2'
> here.

Ok, figured what happened with "format=raw".  From "info mtree":

    00000000ffbfb000-00000000ffc83fff (prio 0, romd): system.flash1
    00000000ffc84000-00000000ffffffff (prio 0, romd): system.flash0

So the flash starts at 0xffbfb000 instead of 0xffc00000.  The
ffbfb000->ffbfffff range holds the qcow2 meta data.  The actual data
starts (as expected by the firmware) at 0xffc00000 ...

Comment 23 Kevin Wolf 2023-01-12 13:44:29 UTC
That makes sense. And it may or may not work once you go beyond the first qcow2 cluster, depending on how things were allocated on the qcow2 level. If it's the result of 'qemu-img convert', your chances are actually relatively good that it happens to be sequential. Not recommended anyway. :-)

Comment 24 Andrea Bolognani 2023-01-12 15:50:06 UTC
(In reply to Gerd Hoffmann from comment #21)
> I get identical behavior, no matter whether I specify format=raw or
> format=qcow2.
> Using 'qemu -drive if=pflash,format=qcow2,file=/path/to/OVMF_VARS.qcow2'
> here.

Did you use format=qcow2 or format=raw here? Because format=qcow2 is
the one where I expect things to work, and you refer to format=raw
later.

> fun fact: virt-fw-dump actually works on qcow2 images, it happens to find the
> data blocks inside:
> 
> + virt-fw-dump -i
> /home/kraxel/projects/edk2/Artifacts/ovmf-x64/OVMF_VARS.qcow2
> image=/home/kraxel/projects/edk2/Artifacts/ovmf-x64/OVMF_VARS.qcow2
>   volume=guid:NvData offset=0x5000 size=0x84000 hlen=0x48 xoff=0x0 rev=2
> blocks=132*4096 used=48.5%
>                      ^^^^^^^^^^^^^ note the non-zero offset here ;)
>     nvdata=guid:AuthVars size=0x3ffb8 format=0x5a state=0xfe
>       end of variable list

That doesn't seem to work here, at least not with a qcow2 file that's
been created by running 'qemu-img convert' on a populated raw file
from a working VM:

  $ virt-fw-dump -i efi_VARS.fd | head -2
  image=efi_VARS.fd
    volume=guid:NvData offset=0x0 size=0x20000 hlen=0x48 xoff=0x0 rev=2 blocks=32*4096 used=43.7%
      nvdata=guid:AuthVars size=0xdfb8 format=0x5a state=0xfe
        variable=guid:EfiCustomModeEnable size=0x53 attr=0x3 name=CustomMode
          bool: off
        variable=guid:EfiGlobalVariable size=0xa2c attr=0x27 name=KEK
          blob: 2536 bytes
        variable=guid:EfiGlobalVariable size=0x412 attr=0x27 name=PK
          blob: 976 bytes
        variable=guid:EfiSecureBootEnableDisable size=0x5f attr=0x3 name=SecureBootEnable
  $ qemu-img convert -f raw -O qcow2 efi_VARS.fd efi_VARS.qcow2
  $ virt-fw-dump -i efi_VARS.qcow2 | head
  image=efi_VARS.qcow2
  $ rpm -q python3-virt-firmware
  python3-virt-firmware-1.7-1.fc37.noarch

> > Right now, if you have configured a VM with
> > 
> >   <os firmware='efi'>
> >     <type arch='x86_64' machine='pc-q35-7.2'>hvm</type>
> >     <boot dev='hd'/>
> >   </os>
> > 
> > that will be converted to
> > 
> >   <os>
> >     <type arch='x86_64' machine='pc-q35-7.2'>hvm</type>
> >     <loader readonly='yes' secure='yes'
> > type='pflash'>/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd</loader>
> >     <nvram
> > template='/usr/share/edk2/ovmf/OVMF_VARS.secboot.fd'>/var/lib/libvirt/qemu/
> > nvram/test_VARS.fd</nvram>
> >     <boot dev='hd'/>
> >   </os>
> > 
> > every time the VM is started. You can verify this by running 'virsh
> > dumpxml' before and after starting the VM.
> > 
> > The idea is to perform the lookup only once, when the VM is defined.
> > The second snippet will automatically replace the first one, making
> > the choice permanent for the lifetime of the VM.
> > 
> > Incidentally, the information contained in the second snippet is also
> > the same that users would have to provide themselves before firmware
> > autoselection was implemented :)
> 
> Looks good to me.  We might keep the 'firmware=efi' and any feature
> options though, for documentation purposes and to allow re-triggering
> the autoselect by just deleting the <loader/> and <nvram/> lines.

That sounds convenient, but libvirt expects to either be asked to
perform firmware autoselection, optionally informed by a list of
required/undesired features, or be provided a full configuration with
all paths specified manually. Attempts to mix the two approaches are
intentionally rejected.

If we kept all information, it would no longer be possible to take
the output of 'virsh dumpxml' and feed it to 'virsh define', which is
undesirable. I'm pretty sure migration would break too.

So I'm afraid we're going to have to live with the inconvenience.

> > But that information should only cover the various disks attached to
> > the VM, not the partitions or EFI binaries contained within, right?
> 
> Yes.  But boot entries pointing to a file in a partition
> on a disk with bootindex attached will be reordered too.
> 
> i.e. typically you'll have a "Fedora" entry pointing to hd0/ESP/shim.efi
> and a "QEMU HARDDISK" entry pointing to hd0, and if that disk has a
> bootindex ovmf will place both where they should be according to the
> disk bootindex.  Usually it happens on first boot only, later the
> entries are already in the correct order so nothing changes.

Okay, so entries will be grouped together based on the disk and then
sorted by the disk's bootindex, right? Which means that if I had

  0001 Fedora on vda
  0002 Debian on vda
  0003 vda
  0004 Windows on vdb
  0005 vdb

when vda.bootindex=1 and vdb.bootindex=2, switching the bootindexes
would result in

  0001 Windows on vdb
  0002 vdb
  0003 Fedora on vda
  0004 Debian on vda
  0005 vda

correct?

> > I don't think that's it either: both when testing with Secure Boot on
> > and with Secure Boot off, all I ever did was shuffle around the items
> > in BootOrder, so no entry ever got lost.
> 
> fallback.efi should only add entries, i.e. it will restore the 'fedora'
> entry mentioned above in case it was lost.
> 
> The firmware might clean up invalid entries, for example entries pointing
> to a non-existing device.  So if you add stuff using efibootmgr you'd better
> make sure the device paths do not point into nowhere.

No, that's not it. As I said, I was simply changing the order in
which items appeared in BootOrder, not adding or removing entries.

But I think I understand what is going on now, and the Secure Boot
part seems to be a red herring.

Without Secure Boot, I have the following configuration:

  BootOrder: 0003,0001,0000,0002
  Boot0000* UiApp	FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(462caa21-7614-4503-836e-8ab6f4662331)
  Boot0001* UEFI Misc Device	PciRoot(0x0)/Pci(0x1,0x3)/Pci(0x0,0x0){auto_created_boot_option}
  Boot0002* EFI Internal Shell	FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(7c04a583-9e3e-4f1c-ad65-e05268d0b4d1)
  Boot0003* Fedora	HD(1,GPT,a49a640b-b04b-4ca3-bcb2-fae9a9676b9c,0x800,0x12c000)/File(\EFI\fedora\shimx64.efi)

With Secure Boot, the entry for the EFI shell is missing, which I
guess is intentional because shell access could provide an access
point to an attacker?

Anyway, starting with the configuration above, I can make *some*
changes that will be persisted, while other changes will be reverted.

Specifically, I can invert the order in which 0000 and 0002 show up,
but I can't put either of those between 0003 and 0001. So if I set
BootOrder to 0003,0002,0001,0000, what I will get after a reboot will
be

  BootOrder: 0003,0001,0002,0000
  Boot0000* UiApp	FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(462caa21-7614-4503-836e-8ab6f4662331)
  Boot0001* UEFI Misc Device	PciRoot(0x0)/Pci(0x1,0x3)/Pci(0x0,0x0){auto_created_boot_option}
  Boot0002* EFI Internal Shell	FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(7c04a583-9e3e-4f1c-ad65-e05268d0b4d1)
  Boot0003* Fedora	HD(1,GPT,a49a640b-b04b-4ca3-bcb2-fae9a9676b9c,0x800,0x12c000)/File(\EFI\fedora\shimx64.efi)

So the change of relative priority between 0000 and 0002 will be
retained, but 0003 and 0001 will still take precedence on those two.

I imagine this is done on purpose, and edk2 must contain some fairly
sophisticated logic implementing the sorting algorithm.

In conclusion, I no longer believe that this could be a bug in edk2:
just a fairly confusing behavior :)

Comment 25 Andrea Bolognani 2023-01-12 18:26:05 UTC
(In reply to Gerd Hoffmann from comment #22)
> > I get identical behavior, no matter whether I specify format=raw or
> > format=qcow2.
> > Using 'qemu -drive if=pflash,format=qcow2,file=/path/to/OVMF_VARS.qcow2'
> > here.
> 
> Ok, figured what happened with "format=raw".  From "info mtree":
> 
>     00000000ffbfb000-00000000ffc83fff (prio 0, romd): system.flash1
>     00000000ffc84000-00000000ffffffff (prio 0, romd): system.flash0
> 
> So the flash starts at 0xffbfb000 instead of 0xffc00000.  The
> ffbfb000->ffbfffff range holds the qcow2 meta data.  The actual data
> starts (as expected by the firmware) at 0xffc00000 ...

Sorry, this is a bit outside of my comfort zone so I'm going to need
some help deciphering it O:-)

So the offset at which the data is mapped depends on format
autodetection, overriding what's been explicitly requested on the
command line? That can't be right, can it?

Comment 26 Gerd Hoffmann 2023-01-13 07:51:17 UTC
(In reply to Andrea Bolognani from comment #24)
> (In reply to Gerd Hoffmann from comment #21)
> > I get identical behavior, no matter whether I specify format=raw or
> > format=qcow2.
> > Using 'qemu -drive if=pflash,format=qcow2,file=/path/to/OVMF_VARS.qcow2'
> > here.
> 
> Did you use format=qcow2 or format=raw here? Because format=qcow2 is
> the one where I expect things to work, and you refer to format=raw
> later.

Tested both.

> That doesn't seem to work here, at least not with a qcow2 file that's
> been created by running 'qemu-img convert' on a populated raw file
> from a working VM:

I've configured page-sized clusters, i.e.

qemu-img convert -f raw -O qcow2 -o cluster_size=4096 -S 4096 $raw $qcow2

Maybe you get a non-working layout with the default settings.

> > i.e. typically you'll have a "Fedora" entry pointing to hd0/ESP/shim.efi
> > and a "QEMU HARDDISK" entry pointing to hd0, and if that disk has a
> > bootindex ovmf will place both where they should be according to the
> > disk bootindex.  Usually it happens on first boot only, later the
> > entries are already in the correct order so nothing changes.
> 
> Okay, so entries will be grouped together based on the disk and then
> sorted by the disk's bootindex, right? Which means that if I had
> 
>   0001 Fedora on vda
>   0002 Debian on vda
>   0003 vda
>   0004 Windows on vdb
>   0005 vdb
> 
> when vda.bootindex=1 and vdb.bootindex=2, switching the bootindexs
> would result in
> 
>   0001 Windows on vdb
>   0002 vdb
>   0003 Fedora on vda
>   0004 Debian on vda
>   0005 vda
> 
> correct?

Yes.

> With Secure Boot, the entry for the EFI shell is missing, which I
> guess is intentional because shell access could provide an access
> point to an attacker?

Yes.

> So the change of relative priority between 0000 and 0002 will be
> retained, but 0003 and 0001 will still take precedence on those two.
> 
> I imagine this is done on purpose, and edk2 must contain some fairly
> sophisticated logic implementing the sorting algorithm.

Indeed.  And there is also logic to manage the shell entry,
i.e. add/remove the entry depending on whether the shell
is present in the firmware image or not.

> In conclusion, I no longer believe that this could be a bug in edk2:
> just a fairly confusing behavior :)

Comment 27 Gerd Hoffmann 2023-01-13 08:03:46 UTC
(In reply to Andrea Bolognani from comment #25)
> (In reply to Gerd Hoffmann from comment #22)
> > > I get identical behavior, no matter whether I specify format=raw or
> > > format=qcow2.
> > > Using 'qemu -drive if=pflash,format=qcow2,file=/path/to/OVMF_VARS.qcow2'
> > > here.
> > 
> > Ok, figured what happened with "format=raw".  From "info mtree":
> > 
> >     00000000ffbfb000-00000000ffc83fff (prio 0, romd): system.flash1
> >     00000000ffc84000-00000000ffffffff (prio 0, romd): system.flash0
> > 
> > So the flash starts at 0xffbfb000 instead of 0xffc00000.  The
> > ffbfb000->ffbfffff range holds the qcow2 meta data.  The actual data
> > starts (as expected by the firmware) at 0xffc00000 ...
> 
> Sorry, this is a bit outside of my comfort zone so I'm going to need
> some help deciphering it O:-)
> 
> So the offset at which the data is mapped depends on format
> autodetection, overriding what's been explicitly requested on the
> command line? That can't be right, can it?

Nope.  Qemu just packs the flash images one after the other,
starting from 4G downwards (on x86).

The qcow2 image created by 'qemu-img convert' first has the qcow2
metadata, followed by the actual data, cluster by cluster, same
ordering they have in the raw file.  If you map that as raw flash
the automatic packing moves down the start address because the qcow2
file is larger.  And qcow2 meta data happens to land in the additional
space whereas the actual data happens to land exactly where the
firmware expects it to be.
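Concretely, combining this with the virt-fw-dump output from comment 21: the qcow2 VARS file is 0x89000 bytes, of which the first 0x5000 bytes are metadata. QEMU packs flash1 to end right below flash0 at 0xffc84000, so it starts at 0xffc84000 - 0x89000 = 0xffbfb000, and the variable data begins at 0xffbfb000 + 0x5000 = 0xffc00000, exactly the address the firmware expects.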

Comment 28 Gerd Hoffmann 2023-01-13 08:11:03 UTC
> > The idea is to perform the lookup only once, when the VM is defined.
> > The second snippet will automatically replace the first one, making
> > the choice permanent for the lifetime of the VM.

What happens with existing VMs?
I assume the conversion happens when you start the VM the first time with the new libvirt version?

I'm wondering how we handle the update best then, because I think the process needed will be:

  (1) update libvirt
  (2) boot each VM once (to trigger conversion).
  (3) update to edk2 version which uses other default images.

And I don't think we can enforce (2).

Comment 29 Andrea Bolognani 2023-01-13 10:32:06 UTC
(In reply to Gerd Hoffmann from comment #26)
> (In reply to Andrea Bolognani from comment #24)
> > That doesn't seem to work here, at least not with a qcow2 file that's
> > been created by running 'qemu-img convert' on a populated raw file
> > from a working VM:
> 
> I've configured page-sized clusters, i.e.
> 
> qemu-img convert -f raw -O qcow2 -o cluster_size=4096 -S 4096 $raw $qcow2
> 
> Maybe you get a non-working layout with the default settings.

Yeah, it works when I specify the cluster size. To be fair, you never
told me that I needed those extra arguments O:-)

For reference,

  image: efi_VARS.default.qcow2
  file format: qcow2
  virtual size: 128 KiB (131072 bytes)
  disk size: 388 KiB
  cluster_size: 65536

and

  image: efi_VARS.4096.qcow2
  file format: qcow2
  virtual size: 128 KiB (131072 bytes)
  disk size: 148 KiB
  cluster_size: 4096

Anyway, I assume the qcow2 VARS template in the package will come
with the correct cluster size, so from the user's point of view
virt-firmware will work out of the box and we don't have to worry
about it.

Comment 30 Andrea Bolognani 2023-01-13 10:40:17 UTC
(In reply to Gerd Hoffmann from comment #27)
> (In reply to Andrea Bolognani from comment #25)
> > (In reply to Gerd Hoffmann from comment #22)
> > > Ok, figured what happened with "format=raw".  From "info mtree":
> > > 
> > >     00000000ffbfb000-00000000ffc83fff (prio 0, romd): system.flash1
> > >     00000000ffc84000-00000000ffffffff (prio 0, romd): system.flash0
> > > 
> > > So the flash starts at 0xffbfb000 instead of 0xffc00000.  The
> > > ffbfb000->ffbfffff range holds the qcow2 meta data.  The actual data
> > > starts (as expected by the firmware) at 0xffc00000 ...
> > 
> > Sorry, this is a bit outside of my comfort zone so I'm going to need
> > some help deciphering it O:-)
> > 
> > So the offset at which the data is mapped depends on format
> > autodetection, overriding what's been explicitly requested on the
> > command line? That can't be right, can it?
> 
> Nope.  Qemu just packs the flash images one after the other,
> starting from 4G downwards (on x86).
> 
> The qcow2 image created by 'qemu-img convert' first has the qcow2
> metadata, followed by the actual data, cluster by cluster, same
> ordering they have in the raw file.  If you map that as raw flash
> the automatic packing moves down the start address because the qcow2
> file is larger.  And qcow2 meta data happens to land in the additional
> space whereas the actual data happens to land exactly where the
> firmware expects it to be.

Thanks, the explanation helped a lot.

So this pretty much only works by chance. We'll make sure that
libvirt provides the correct format information to QEMU, but I'm
positive someone will find a way to trigger a breakage because of
this :D

Comment 31 Andrea Bolognani 2023-01-13 10:49:06 UTC
(In reply to Gerd Hoffmann from comment #28)
> > > The idea is to perform the lookup only once, when the VM is defined.
> > > The second snippet will automatically replace the first one, making
> > > the choice permanent for the lifetime of the VM.
> 
> What happens with existing VMs?
> I assume the conversion happens when you start the VM the first time with
> the new libvirt version?
> 
> I'm wondering how we handle the update best then, because I think the
> process needed will be:
> 
>   (1) update libvirt
>   (2) boot each VM once (to trigger conversion).
>   (3) update to edk2 version which uses other default images.
> 
> And I don't think we can enforce (2).

I'm going to be honest: I haven't actually thought this part through
quite yet.

I agree that it's a challenge, and I'm afraid the result might end up
being messy and unsatisfactory. We'll see.

I hope that things will become clearer to me once I have started
implementing the necessary libvirt changes. I'll get back to you.

Comment 32 Gerd Hoffmann 2023-01-13 12:59:39 UTC
> Yeah, it works when I specify the cluster size. To be fair, you never
> told me that I needed those extra arguments O:-)

I figured for the smaller images a smaller cluster size
makes sense to keep the files small, and using PAGE_SIZE
looked like a reasonable pick to me.

Didn't try virt-fw-dump with default settings, but it stops
scanning after a while, so if the metadata block is big enough
it will not work.

Anyway, this was just luck, and virt-fw-vars doesn't work, so
virt-firmware needs proper support anyway.  Not sure yet
how to do that best; piping through "qemu-img dd" should do,
I think.

Kevin?  Does that sound ok?  Or is there something better?
A python module for qcow2 access maybe?

> Anyway, I assume the qcow2 VARS template in the package will come
> with the correct cluster size, so from the user's point of view
> virt-firmware will work out of the box and we don't have to worry
> about it.

Yes.  https://copr.fedorainfracloud.org/coprs/kraxel/edk2.testbuilds/
has packages (only arm has qcow2 images for now, and no json updates
yet for obvious reasons).

Comment 33 Kevin Wolf 2023-01-16 08:52:19 UTC
(In reply to Gerd Hoffmann from comment #32)
> I figured for the smaller images a smaller cluster size
> makes sense to keep the files small, and using PAGE_SIZE
> looked like a reasonable pick to me.

I agree, if the whole point is that you save memory in the QEMU process, you want to know the allocation status per page.

The downside of small cluster sizes is that you get more metadata both on the filesystem and cached in memory. With 64 MB images, the effect isn't that big yet, should be around 128k with 4k cluster size. Still, you could reduce it further by using 128k clusters with subclusters (which would then be 4k in size).
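For reference, the arithmetic behind that estimate: a 64 MB image with 4k clusters has 64 MiB / 4 KiB = 16384 clusters, and at 8 bytes per L2 table entry that is 16384 * 8 = 128 KiB of L2 metadata per image.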

> Anyway, this was just luck and virt-fw-vars doesn't work so
> virt-firmware needs proper support anyway.  Not sure yet
> how to do that best, piping through "qemu-img dd" should do
> I think.
> 
> Kevin?  Does that sound ok?  Or is there something better?
> A python module for qcow2 access maybe?

-ENOCONTEXT, but depending on how much control you need, either 'qemu-img convert', 'qemu-img dd' or even 'qemu-io' with 'write -s' could be used to modify a qcow2 image. We don't have a Python module or anything like that.

Comment 34 Gerd Hoffmann 2023-01-16 09:41:21 UTC
> I agree, if the whole point is that you save memory in the QEMU process, you
> want to know the allocation status per page.
> 
> The downside of small cluster sizes is that you get more metadata both on
> the filesystem and cached in memory. With 64 MB images, the effect isn't
> that big yet, should be around 128k with 4k cluster size. Still, you could
> reduce it further by using 128k clusters with subclusters (which would then
> be 4k in size).

Hmm, "man qemu-img" doesn't mention subclusters.  How would I enable that?

Right now I'm using:

# convert each raw aarch64 firmware image to a qcow2 file next to it,
# with 4k clusters, treating 4k runs of zeros as unallocated (-S 4096)
for raw in */aarch64/*.raw; do
    qcow2="${raw%.raw}.qcow2"
    qemu-img convert -f raw -O qcow2 -o cluster_size=4096 -S 4096 "$raw" "$qcow2"
done

> > Kevin?  Does that sound ok?  Or is there something better?
> > A python module for qcow2 access maybe?
> 
> -ENOCONTEXT, but depending on how much control you need, either 'qemu-img
> convert', 'qemu-img dd' or even 'qemu-io' with 'write -s' could be used to
> modify a qcow2 image. We don't have a Python module or anything like that.

virt-firmware reads/writes complete raw files.  So one option to support qcow2
files would be to read/write temporary raw files and use 'qemu-img convert'
with automatic sparse detection ('-S 4096').
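
A minimal sketch of that round-trip (paths are placeholders, and the
virt-fw-vars step stands in for whatever edit is needed):

# unpack the qcow2 vars file into a temporary raw file
qemu-img convert -f qcow2 -O raw VARS.qcow2 /tmp/VARS.raw
# ... edit /tmp/VARS.raw with virt-fw-vars as usual ...
# convert back, re-sparsifying 4k runs of zeros
qemu-img convert -f raw -O qcow2 -o cluster_size=4096 -S 4096 \
    /tmp/VARS.raw VARS.qcow2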

Question is if there is some easy way to avoid temporary files.
Can I read from / pipe into 'qemu-img convert'?
Can I read from / pipe into 'qemu-img dd'?
Will 'dd' do sparse detection like 'convert'?

Comment 35 Kevin Wolf 2023-01-16 12:45:30 UTC
(In reply to Gerd Hoffmann from comment #34)
> Hmm, "man qemu-img" doesn't mention subclusters.  How would I enable that?

It would be '-o cluster_size=128k,extended_l2=on', but on second thoughts you said that most of the 64 MB will be unused (the reason for even optimising it), so you might not even get to the point where it's better than just using 4k clusters. Almost certainly not on disk, and you would have to measure the memory usage at runtime to tell.
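
For illustration, the conversion from comment 34 would then become
something like this (in.raw / out.qcow2 are placeholder file names):

qemu-img convert -f raw -O qcow2 \
    -o cluster_size=128k,extended_l2=on -S 4096 in.raw out.qcow2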

> virt-firmware reads/writes complete raw files.  So one option to support qcow2
> files would be to read/write temporary raw files and use 'qemu-img convert'
> with automatic sparse detection ('-S 4096').

I think this is probably the best option.
 
> Question is if there is some easy way to avoid temporary files.
> Can I read from / pipe into 'qemu-img convert'?
> Can I read from / pipe into 'qemu-img dd'?
> Will 'dd' do sparse detection like 'convert'?

qemu-img can't use pipes because it supports arbitrary image formats, and that needs random access. 'qemu-img dd' is much simpler than 'qemu-img convert'; it never creates sparse output.

While it's a bit ugly and I wouldn't really recommend it, you can pipe into qemu-io. I just tried it like this:

$ cat ~/images/hd.img | ./qemu-io -c 'write -s /dev/stdin 0 64M' /tmp/test.qcow2 

This also doesn't detect any zeroed ranges to keep them sparse. But if the size of the write request is the size of your unpadded raw image, at least the space behind it would stay unallocated. If you do need the sparse detection, I suppose we could somehow pass detect_zeroes=on as well, but at that point, 'qemu-img convert' is probably the better option.

Comment 36 Andrea Bolognani 2023-01-17 13:12:47 UTC
Hi Gerd,

one question about VARS files.

In general, is it safe to assume that the size of an instantiated
VARS file will match that of the corresponding template? This seems
to be the case based on my observations, but I want to have a
definitive answer before I write code that relies on this behavior.

Thanks!

Comment 37 Gerd Hoffmann 2023-01-18 09:28:41 UTC
(In reply to Andrea Bolognani from comment #36)
> Hi Gerd,
> 
> one question about VARS files.
> 
> In general, is it safe to assume that the size of an instantiated
> VARS file will match that of the corresponding template? This seems
> to be the case based on my observations, but I want to have a
> definitive answer before I write code that relies on this behavior.

Yes.

For qcow2 files it's not the file size that must match, but what qemu-img reports as the "virtual size".
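
For example, a script could read a qcow2 template's virtual size in
bytes with something like:

$ qemu-img info --output=json /usr/share/AAVMF/AAVMF_VARS.qcow2 | jq '."virtual-size"'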

Comment 38 Gerd Hoffmann 2023-03-01 09:08:52 UTC
*** Bug 2019723 has been marked as a duplicate of this bug. ***

Comment 39 Andrea Bolognani 2023-03-28 12:33:13 UTC
An update, now that the libvirt parts have been merged.

(In reply to Andrea Bolognani from comment #24)
> (In reply to Gerd Hoffmann from comment #21)
> > > Right now, if you have configured a VM with
> > > 
> > >   <os firmware='efi'>
> > >     <type arch='x86_64' machine='pc-q35-7.2'>hvm</type>
> > >     <boot dev='hd'/>
> > >   </os>
> > > 
> > > that will be converted to
> > > 
> > >   <os>
> > >     <type arch='x86_64' machine='pc-q35-7.2'>hvm</type>
> > >     <loader readonly='yes' secure='yes' type='pflash'>/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd</loader>
> > >     <nvram template='/usr/share/edk2/ovmf/OVMF_VARS.secboot.fd'>/var/lib/libvirt/qemu/nvram/test_VARS.fd</nvram>
> > >     <boot dev='hd'/>
> > >   </os>
> > > 
> > > every time the VM is started. You can verify this by running 'virsh
> > > dumpxml' before and after starting the VM.
> > > 
> > > The idea is to perform the lookup only once, when the VM is defined.
> > > The second snippet will automatically replace the first one, making
> > > the choice permanent for the lifetime of the VM.
> > > 
> > > Incidentally, the information contained in the second snippet is also
> > > the same that users would have to provide themselves before firmware
> > > autoselection was implemented :)
> > 
> > Looks good to me.  We might keep the 'firmware=efi' and any feature
> > options though, for documentation purposes and to allow re-triggering
> > the autoselect by just deleting the <loader/> and <nvram/> lines.
> 
> That sounds convenient, but libvirt expects to either be asked to
> perform firmware autoselection, optionally informed by a list of
> required/undesired features, or be provided a full configuration with
> all paths specified manually. Attempts to mix the two approaches are
> intentionally rejected.
> 
> If we kept all information, it would no longer be possible to take
> the output of 'virsh dumpxml' and feed it to 'virsh define', which is
> undesirable. I'm pretty sure migration would break too.

I've actually managed to make this work with some follow-up patches
to the initial qcow2 support :)

Going forward, libvirt will not only preserve the information about
the firmware type and its features provided by the user: it will
actually *add* that information if it wasn't present in the initial
configuration but could be figured out from the contents of the JSON
descriptors.

(In reply to Andrea Bolognani from comment #31)
> (In reply to Gerd Hoffmann from comment #28)
> > What happens with existing VMs?
> > I assume the conversion happens when you start the VM the first time with
> > the new libvirt version?
> > 
> > I'm wondering how we handle the update best then, because I think the
> > process needed will be:
> > 
> >   (1) update libvirt
> >   (2) boot each VM once (to trigger conversion).
> >   (3) update to edk2 version which uses other default images.
> > 
> > And I don't think we can enforce (2).
> 
> I'm going to be honest: I haven't actually thought this part through
> quite yet.
> 
> I agree that it's a challenge, and I'm afraid the result might end up
> being messy and unsatisfactory. We'll see.
> 
> I hope that things will become clearer to me once I have started
> implementing the necessary libvirt changes. I'll get back to you.

Unfortunately I had sort of forgotten about this topic while I was
working on the libvirt changes, but looking back at it now the answer
is actually quite obvious: nothing happens to existing VMs!

Newly defined VMs get qcow2, if that's the preferred format based on
the firmware descriptors or if it has been explicitly requested, but
existing ones keep using raw firmware images, same as before.

Comment 40 Gerd Hoffmann 2023-04-04 07:50:10 UTC
> > >   (1) update libvirt
> > >   (2) boot each VM once (to trigger conversion).
> > >   (3) update to edk2 version which uses other default images.

> > I hope that things will become clearer to me once I have started
> > implementing the necessary libvirt changes. I'll get back to you.
> 
> Unfortunately I had sort of forgotten about this topic while I was
> working on the libvirt changes, but looking back at it now the answer
> is actually quite obvious: nothing happens to existing VMs!

So, there is a vm with <os firmware='efi'> (old style).

(1) [libvirt update] will not change anything.
(2) [boot guest] will convert to new-style, using the OLD json files.
(3) [edk2 update] will not change anything.

Now, with different ordering, we get a different result:

[1] [libvirt update] no change
[2] [edk2 update] no change
[3] [boot guest] will convert to new-style, using the NEW json files.

Or will libvirt look at the existing nvram file (check size + format)
when searching for a matching *.json file?

Comment 41 Andrea Bolognani 2023-04-04 13:29:48 UTC
(In reply to Gerd Hoffmann from comment #40)
> > > >   (1) update libvirt
> > > >   (2) boot each VM once (to trigger conversion).
> > > >   (3) update to edk2 version which uses other default images.
> 
> > > I hope that things will become clearer to me once I have started
> > > implementing the necessary libvirt changes. I'll get back to you.
> > 
> > Unfortunately I had sort of forgotten about this topic while I was
> > working on the libvirt changes, but looking back at it now the answer
> > is actually quite obvious: nothing happens to existing VMs!
> 
> So, there is a vm with <os firmware='efi'> (old style).
> 
> (1) [libvirt update] will not change anything.
> (2) [boot guest] will convert to new-style, using the OLD json files.
> (3) [edk2 update] will not change anything.

The change from the old style to the new one actually happens
internally when the VM configuration is parsed from disk. As soon as
libvirtd is restarted after the upgrade, `virsh dumpxml` will start
producing the expanded (new style) output.
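
For example, a quick way to check a given guest (name is a
placeholder):

$ virsh dumpxml guest | grep -E '<(loader|nvram)'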

> Now, with different ordering, we get a different result:
> 
> [1] [libvirt update] no change
> [2] [edk2 update] no change
> [3] [boot guest] will convert to new-style, using the NEW json files.
> 
> Or will libvirt look at the existing nvram file (check size + format)
> when searching for a matching *.json file?

No, we don't look at the existing files. We don't need to.

As I mentioned above, firmware selection happens when the XML is
loaded. When that happens, libvirt knows whether the XML that's being
processed belongs to an existing VM or a newly-defined one, and
behaves accordingly.

Specifically, when presented with a configuration such as

  <os firmware='efi'>

for a newly-defined VM on a host where qcow2 is the preferred
firmware format, it will expand it to something like

  <os firmware='efi'>
    <firmware>
      <feature enabled='no' name='enrolled-keys'/>
      <feature enabled='no' name='secure-boot'/>
    </firmware>
    <loader readonly='yes' type='pflash' format='qcow2'>/usr/share/AAVMF/AAVMF_CODE.qcow2</loader>
    <nvram template='/usr/share/AAVMF/AAVMF_VARS.qcow2' format='qcow2'>/var/lib/libvirt/qemu/nvram/guest_VARS.qcow2</nvram>
  </os>

Notice the "format" attribute, which can be used to manually override
the system-wide preference derived from the order of descriptors.

When loading the same configuration as above but in the context of an
existing VM rather than a new one, the expansion will instead look
like

  <os firmware='efi'>
    <firmware>
      <feature enabled='no' name='enrolled-keys'/>
      <feature enabled='no' name='secure-boot'/>
    </firmware>
    <loader readonly='yes' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.fd</loader>
    <nvram template='/usr/share/AAVMF/AAVMF_VARS.fd'>/var/lib/libvirt/qemu/nvram/guest_VARS.fd</nvram>
  </os>

because the default firmware format for existing VMs is assumed to be
raw. This is done with the explicit purpose of maintaining backwards
compatibility.
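
(As a sketch, assuming the "format" attribute is honored alongside
autoselection as described above, an existing VM could still opt into
qcow2 explicitly with something like

  <os firmware='efi'>
    <loader format='qcow2'/>
  </os>

in its configuration.)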

Of course, since *the expanded configuration* gets written to disk
when a VM is defined, VMs that get qcow2 firmware the first time
around will keep getting qcow2 firmware. It's just the VMs that were
defined before qcow2 support was introduced that will always get raw
firmware going forward.

Basically, from the edk2 side, the only precaution that's necessary
is making sure that descriptors for qcow2 firmware images are not
presented to libvirt < 9.2.0: due to bugs in libvirt, that
combination would make all VMs unbootable.

Other than that, new VMs will get qcow2 firmware and existing VMs
will keep using raw firmware. As smooth as it could possibly be :)

Comment 42 Gerd Hoffmann 2023-04-05 10:39:06 UTC
> The change from the old style to the new one actually happens
> internally when the VM configuration is parsed from disk. As soon as
> libvirtd is restarted after the upgrade, `virsh dumpxml` will start
> producing the expanded (new style) output.

> because the default firmware format for existing VMs is assumed to be
> raw. This is done with the explicit purpose of maintaining backwards
> compatibility.

Ok, looks sane to me.

Related question: What about switching the default from 2M to 4M builds
in x86 Fedora (both raw)?

I think for that the nvram size of existing VMs must be checked against
the template size of the firmware config.
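
A hypothetical shape of such a check (variable names are placeholders;
raw images compared by file size, qcow2 ones would need the virtual
size instead):

# $existing_nvram and $template stand in for the real paths
if [ "$(stat -c%s "$existing_nvram")" -eq "$(stat -c%s "$template")" ]; then
    echo "template matches existing nvram size"
fi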

Comment 43 Gerd Hoffmann 2023-04-05 10:49:49 UTC
> As I mentioned above, firmware selection happens when the XML is
> loaded.

Who exactly does that?  libvirtd?
So I need a conflict against libvirt-daemon.rpm?

Comment 44 Andrea Bolognani 2023-04-05 13:27:23 UTC
(In reply to Gerd Hoffmann from comment #42)
> Related question: What about switching the default from 2M to 4M builds
> in x86 Fedora (both raw)?
> 
> I think for that the nvram size of existing VMs must be checked against
> the template size of the firmware config.

Yeah, that one is more tricky so I've intentionally avoided tackling
it so far.

How urgent/desirable is it to change the default in this case? What's
the advantage of using 4M builds instead of 2M builds?


(In reply to Gerd Hoffmann from comment #43)
> > As I mentioned above, firmware selection happens when the XML is
> > loaded.
> 
> Who exactly does that?  libvirtd?

Yes.

> So I need a conflict against libvirt-daemon.rpm?

Since it's specifically the QEMU hypervisor driver that wasn't able
to handle descriptors for qcow2 builds until recently, I think
something along the lines of

  Conflicts: libvirt-daemon-driver-qemu < 9.2.0

would be appropriate.

Comment 45 Gerd Hoffmann 2023-04-06 11:22:56 UTC
(In reply to Andrea Bolognani from comment #44)
> (In reply to Gerd Hoffmann from comment #42)
> > Related question: What about switching the default from 2M to 4M builds
> > in x86 Fedora (both raw)?
> > 
> > I think for that the nvram size of existing VMs must be checked against
> > the template size of the firmware config.
> 
> Yeah, that one is more tricky so I've intentionally avoided tackling
> it so far.
> 
> How urgent/desirable is it to change the default in this case? What's
> the advantage of using 4M builds instead of 2M builds?

The main advantage is that there is a lot more room for EFI variables,
and the limited space in the 2M builds becomes increasingly problematic
for secure boot configurations because the dbx database (signatures of
known-vulnerable binaries) keeps growing ...

>   Conflicts: libvirt-daemon-driver-qemu < 9.2.0

Ok.

Comment 46 Gerd Hoffmann 2023-04-06 11:23:50 UTC
fedora test builds:
https://copr.fedorainfracloud.org/coprs/kraxel/edk2.testbuilds/

Comment 47 Andrea Bolognani 2023-04-06 14:02:19 UTC
(In reply to Gerd Hoffmann from comment #45)
> (In reply to Andrea Bolognani from comment #44)
> > How urgent/desirable is it to change the default in this case? What's
> > the advantage of using 4M builds instead of 2M builds?
> 
> The main advantage is that there is a lot more room for EFI variables,
> and the limited space in the 2M builds becomes increasingly problematic
> for secure boot configurations because the dbx database (signatures of
> known-vulnerable binaries) keeps growing ...

Which begs the question: are the qcow2 edk2-aarch64 images 4M?

Actually, I wonder if we could sidestep the need to have explicit
support for differently-sized builds, or at least push it down the
road further, by making 4M qcow2 builds the default across
architectures.

While it's true that the change of format is not as beneficial on
x86_64 as it is on aarch64, it shouldn't make things any worse
either, right? And we could take the opportunity to implicitly roll
out the change in size without further disruption.

What do you think?

Comment 48 Gerd Hoffmann 2023-04-06 14:21:13 UTC
> Which begs the question: are the qcow2 edk2-aarch64 images 4M?

The aarch64 EFI variable store has exactly the same size as the 4M x86_64 builds.

> While it's true that the change of format is not as beneficial on
> x86_64 as it is on aarch64, it shouldn't make things any worse
> either, right? And we could take the opportunity to implicitly roll
> out the change in size without further disruption.

Hmm.  Not sure I like the idea.  Should work, yes.  But would lead
to fedora and rhel being different (rhel is at 4M already so no
reason to change things there).

Comment 49 Andrea Bolognani 2023-04-06 14:46:19 UTC
(In reply to Gerd Hoffmann from comment #48)
> > While it's true that the change of format is not as beneficial on
> > x86_64 as it is on aarch64, it shouldn't make things any worse
> > either, right? And we could take the opportunity to implicitly roll
> > out the change in size without further disruption.
> 
> Hmm.  Not sure I like the idea.  Should work, yes.  But would lead
> to fedora and rhel being different (rhel is at 4M already so no
> reason to change things there).

If currently RHEL is using 4M raw and Fedora is using 2M raw,
switching both to 4M qcow2 would reduce the differences between them,
not increase them :)

Plus, by doing so we'd keep things consistent across all RHEL and
Fedora architectures.

I get the argument for not changing what's working, but if we're
confident enough to change the default for aarch64 why wouldn't we do
the same for x86_64? IMO we just need to make sure the change is
tested extensively before rolling it out to users.

Comment 50 Andrea Bolognani 2023-04-18 14:20:20 UTC
Keeping the discussion going.

Should we switch all firmware images across both Fedora and RHEL,
and across all the architectures they support, to qcow2 format? And
switch to 4M builds on x86_64 in the process?

This would just be the default for new VMs. Old VMs would keep using
raw images of whatever size.

Are there concrete objections to this idea?

Comment 51 Gerd Hoffmann 2023-04-25 07:02:28 UTC
(In reply to Andrea Bolognani from comment #50)

> Are there concrete objections to this idea?

Doubles the size of the ovmf package.
Larger test matrix.
On x86 we don't have any actual benefits to justify this.

Comment 52 Andrea Bolognani 2023-04-26 16:57:02 UTC
(In reply to Gerd Hoffmann from comment #51)
> Doubles the size of the ovmf package.
> Larger test matrix.
> On x86 we don't have any actual benefits to justify this.

Okay, those are all reasonable as far as RHEL is concerned.

What about Fedora though? There, the advantage would be switching
from 2M builds to 4M builds without having to come up with another
mechanism at the JSON file descriptor / libvirt XML level.

Comment 53 Gerd Hoffmann 2023-05-04 14:02:43 UTC
Pimped up virt-firmware a bit so qcow2 image support is more complete.

With that in place switching over should work, test packages:
https://copr.fedorainfracloud.org/coprs/kraxel/edk2.testbuilds/

current state:
 - 2M builds (edk2/ovmf) stay as-is.
 - 4M builds (edk2/ovmf-4m) are switched to qcow2.
 - json files point to 2M builds only still,
   no json files for the 4M builds yet (that is next on the todo list).
 - not fully sure dropping the 4M raw builds is a good idea.
   They have been around for a while, but have never been referenced
   in *.json files, so a bit unclear how much they are used ...

Comment 54 Andrea Bolognani 2023-05-04 15:07:30 UTC
(In reply to Gerd Hoffmann from comment #53)
> Pimped up virt-firmware a bit so qcow2 image support is more complete.
> 
> With that in place switching over should work, test packages:
> https://copr.fedorainfracloud.org/coprs/kraxel/edk2.testbuilds/

Excellent progress, thanks!

> current state:
>  - 2M builds (edk2/ovmf) stay as-is.
>  - 4M builds (edk2/ovmf-4m) are switched to qcow2.
>  - json files point to 2M builds only still,
>    no json files for the 4M builds yet (that is next on the todo list).

Do you plan on making the 4M builds the default by giving their JSON
files higher priority?

>  - not fully sure dropping the 4M raw builds is a good idea.
>    They have been around for a while, but have never been referenced
>    in *.json files, so a bit unclear how much they are used ...

People have probably been referencing them directly, so unless
keeping them around is a big maintenance burden or we consider the
extra ~8M to be an unacceptable overhead I would vote for keeping
them around, at least for some time.


Random thought: do we want to keep using the /usr/share/edk2/ovmf-4m
directory for these builds as they supposedly start gaining more use,
or should we move them under /usr/share/edk2/ovmf and differentiate
them from the 2M builds through the filename alone, e.g.
OVMF_CODE.4m.secboot.qcow2? I'm talking about the newly-introduced
ones only, of course.

Comment 55 Andrea Bolognani 2023-05-08 13:44:18 UTC
(In reply to Andrea Bolognani from comment #54)
> Random thought: do we want to keep using the /usr/share/edk2/ovmf-4m
> directory for these builds as they supposedly start gaining more use,
> or should we move them under /usr/share/edk2/ovmf and differentiate
> them from the 2M builds through the filename alone, e.g.
> OVMF_CODE.4m.secboot.qcow2? I'm talking about the newly-introduced
> ones only, of course.

As an additional data point, these are the contents of Debian's ovmf
package[1]:

  /usr/share/OVMF/OVMF_CODE.fd
  /usr/share/OVMF/OVMF_CODE.ms.fd
  /usr/share/OVMF/OVMF_CODE.secboot.fd
  /usr/share/OVMF/OVMF_CODE_4M.fd
  /usr/share/OVMF/OVMF_CODE_4M.ms.fd
  /usr/share/OVMF/OVMF_CODE_4M.secboot.fd
  /usr/share/OVMF/OVMF_CODE_4M.snakeoil.fd
  /usr/share/OVMF/OVMF_VARS.fd
  /usr/share/OVMF/OVMF_VARS.ms.fd
  /usr/share/OVMF/OVMF_VARS_4M.fd
  /usr/share/OVMF/OVMF_VARS_4M.ms.fd
  /usr/share/OVMF/OVMF_VARS_4M.snakeoil.fd
  /usr/share/doc/ovmf/README.Debian
  /usr/share/doc/ovmf/changelog.Debian.gz
  /usr/share/doc/ovmf/copyright
  /usr/share/ovmf/OVMF.fd
  /usr/share/ovmf/PkKek-1-snakeoil.key
  /usr/share/ovmf/PkKek-1-snakeoil.pem
  /usr/share/qemu/OVMF.fd
  /usr/share/qemu/firmware/40-edk2-x86_64-secure-enrolled.json
  /usr/share/qemu/firmware/50-edk2-x86_64-secure.json
  /usr/share/qemu/firmware/60-edk2-x86_64.json

We could adopt the same naming convention.

The JSON files only reference the 4M builds. The other files are, I
assume, only there for backwards compatibility reasons.


[1] https://packages.debian.org/sid/all/ovmf/filelist

Comment 56 Gerd Hoffmann 2023-05-09 10:09:23 UTC
> With that in place switching over should work, test packages:
> https://copr.fedorainfracloud.org/coprs/kraxel/edk2.testbuilds/

>  - json files point to 2M builds only still,
>    no json files for the 4M builds yet (that is next on the todo list).

Now with json files included.

> > Random thought: do we want to keep using the /usr/share/edk2/ovmf-4m
> > directory for these builds as they supposedly start gaining more use,
> > or should we move them under /usr/share/edk2/ovmf and differentiate
> > them from the 2M builds through the filename alone, e.g.
> > OVMF_CODE.4m.secboot.qcow2? I'm talking about the newly-introduced
> > ones only, of course.

Depends a bit what we are going to do with the raw 4M builds ...

In case we keep them they should also keep the current paths, and in
that case I'd place the qcow2 images next to them in the ovmf-4m
directory to avoid confusion.

In case we drop them it doesn't make much of a difference whether we
have the '4m' in the subdirectory name or the file name.

Comment 57 Andrea Bolognani 2023-05-10 08:30:52 UTC
(In reply to Gerd Hoffmann from comment #56)
> > > Random thought: do we want to keep using the /usr/share/edk2/ovmf-4m
> > > directory for these builds as they supposedly start gaining more use,
> > > or should we move them under /usr/share/edk2/ovmf and differentiate
> > > them from the 2M builds through the filename alone, e.g.
> > > OVMF_CODE.4m.secboot.qcow2? I'm talking about the newly-introduced
> > > ones only, of course.
> 
> Depends a bit what we are going to do with the raw 4M builds ...
> 
> In case we keep them they should also keep the current paths, and in
> that case I'd place the qcow2 images next to them in the ovmf-4m
> directory to avoid confusion.
> 
> In case we drop them it doesn't make much of a difference whether we
> have the '4m' in the subdirectory name or the file name.

My reasoning is that, even if we keep the raw 4M builds around for
now, it will probably be fine to drop them in a couple of years or
something like that. At that point I think it would be nicer to have
all builds, 2M and 4M, in the /usr/share/edk2/ovmf directory. But
you're right that the difference between ovmf-4m/OVMF_CODE.qcow2 and
ovmf/OVMF_CODE_4M.qcow2 is very minimal. The latter just feels a tiny
bit better as a long-term choice. I leave it up to you :)

Comment 58 Andrea Bolognani 2023-05-10 09:02:05 UTC
I tried the latest test package and everything seems fine :)

Comment 59 Andrea Bolognani 2023-05-15 12:33:25 UTC
Actually, a couple of comments.

The raw and qcow2 descriptors advertise slightly different sets of
features:

  $ diff -Naru /usr/share/qemu/firmware/30-edk2-ovmf-x64-sb-enrolled.json /usr/share/qemu/firmware/928-edk2-ovmf-x64-qcow2-sb-enrolled.json
  --- /usr/share/qemu/firmware/30-edk2-ovmf-x64-sb-enrolled.json	2023-04-29 00:44:47.000000000 +0200
  +++ /usr/share/qemu/firmware/928-edk2-ovmf-x64-qcow2-sb-enrolled.json	2023-01-30 16:14:17.345578065 +0100
  @@ -6,12 +6,12 @@
       "mapping": {
           "device": "flash",
           "executable": {
  -            "filename": "/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd",
  -            "format": "raw"
  +            "filename": "/usr/share/edk2/ovmf/OVMF_CODE.secboot.qcow2",
  +            "format": "qcow2"
           },
           "nvram-template": {
  -            "filename": "/usr/share/edk2/ovmf/OVMF_VARS.secboot.fd",
  -            "format": "raw"
  +            "filename": "/usr/share/edk2/ovmf/OVMF_VARS.secboot.qcow2",
  +            "format": "qcow2"
           }
       },
       "targets": [
  @@ -24,6 +24,7 @@
       ],
       "features": [
           "acpi-s3",
  +        "amd-sev",
           "enrolled-keys",
           "requires-smm",
           "secure-boot",

Is that intentional? I definitely didn't expect it.

Another issue is the naming of the files. Citing the
specification[1]:

  It is recommended to create firmware JSON files (each containing a
  single @Firmware root element) with a *double-digit* prefix, for
  example "50-ovmf.json", "50-seabios-256k.json", etc, so they can be
  sorted in predictable order.

(emphasis mine).

With the current naming, 93-foo.json would sort *after* the qcow2
builds, which would be quite surprising IMO.


[1] https://gitlab.com/qemu-project/qemu/-/blob/8844bb8d896595ee1d25d21c770e6e6f29803097/docs/interop/firmware.json#L361-364

Comment 60 Gerd Hoffmann 2023-05-16 06:02:32 UTC
>   +++ /usr/share/qemu/firmware/928-edk2-ovmf-x64-qcow2-sb-enrolled.json

Huh?  Where does this file come from?

# rpm -ql edk2-ovmf | grep json
/usr/share/qemu/firmware/30-edk2-ovmf-4m-qcow2-x64-sb-enrolled.json
/usr/share/qemu/firmware/31-edk2-ovmf-2m-raw-x64-sb-enrolled.json
/usr/share/qemu/firmware/40-edk2-ovmf-4m-qcow2-x64-sb.json
/usr/share/qemu/firmware/41-edk2-ovmf-2m-raw-x64-sb.json
/usr/share/qemu/firmware/50-edk2-ovmf-4m-qcow2-x64-nosb.json
/usr/share/qemu/firmware/50-edk2-ovmf-x64-microvm.json
/usr/share/qemu/firmware/51-edk2-ovmf-2m-raw-x64-nosb.json
/usr/share/qemu/firmware/60-edk2-ovmf-x64-amdsev.json
/usr/share/qemu/firmware/60-edk2-ovmf-x64-inteltdx.json

(must be on rawhide or virt-preview to see this,
plain f37/f38 are not updated b/c libvirt is too old).

Comment 61 Andrea Bolognani 2023-05-16 09:40:36 UTC
(In reply to Gerd Hoffmann from comment #60)
> >   +++ /usr/share/qemu/firmware/928-edk2-ovmf-x64-qcow2-sb-enrolled.json
> 
> Huh?  Where does this file come from?
> 
> # rpm -ql edk2-ovmf | grep json
> /usr/share/qemu/firmware/30-edk2-ovmf-4m-qcow2-x64-sb-enrolled.json
> /usr/share/qemu/firmware/31-edk2-ovmf-2m-raw-x64-sb-enrolled.json
> /usr/share/qemu/firmware/40-edk2-ovmf-4m-qcow2-x64-sb.json
> /usr/share/qemu/firmware/41-edk2-ovmf-2m-raw-x64-sb.json
> /usr/share/qemu/firmware/50-edk2-ovmf-4m-qcow2-x64-nosb.json
> /usr/share/qemu/firmware/50-edk2-ovmf-x64-microvm.json
> /usr/share/qemu/firmware/51-edk2-ovmf-2m-raw-x64-nosb.json
> /usr/share/qemu/firmware/60-edk2-ovmf-x64-amdsev.json
> /usr/share/qemu/firmware/60-edk2-ovmf-x64-inteltdx.json

How embarrassing! It's clearly a leftover from earlier tests .-.

Usually I include my username in the file names to make sure that I
don't mix them up with those coming from packages, but I obviously
failed to do so this time around O:-)

Apologies for the noise!

> (must be on rawhide or virt-preview to see this,
> plain f37/f38 are not updated b/c libvirt is too old).

AFAICT the latest version available from virt-preview is
20230301gitf80f052277c8-4, which doesn't include descriptors for the
qcow2 builds. I've downloaded the -28 build from koji instead, works
fine :)

I see you've dropped the 4M raw builds after all. Can we move the 4M
qcow2 builds into /usr/share/edk2/ovmf and differentiate them from
the 2M builds using a _4M suffix in the filename then? Following the
schema established by Debian.

Another minor thing: the microvm descriptor being numbered at 50-
doesn't make a lot of sense to me. I'd put it either at 60-, with
other special-purpose builds, or possibly even at 70-, since it
targets a completely different machine type than anything that comes
before it.

Comment 62 Andrea Bolognani 2023-05-16 12:41:34 UTC
One more request. Would it be possible to explicitly include

  "mapping": {
      "device": "flash",
      "mode": "split",

in the descriptor files? The spec says that "split" is the default
mode, so it wouldn't change anything for spec-compliant consumers,
but having that information in the JSON file would make things easier
for the libvirt test suite.

Thanks!

Comment 63 Gerd Hoffmann 2023-05-16 12:43:06 UTC
> > (must be on rawhide or virt-preview to see this,
> > plain f37/f38 are not updated b/c libvirt is too old).
> 
> AFAICT the latest version available from virt-preview is
> 20230301gitf80f052277c8-4, which doesn't include descriptors for the
> qcow2 builds.

looking.  Indeed.  Seems to not be automatic.  /me kicks a build.

> I've downloaded the -28 build from koji instead, works
> fine :)

Good.

> I see you've dropped the 4M raw builds after all.

Yes.  Testing the waters.  When people complain it is a single distgit
revert to restore them.  No complaints yet (but it is probably also not
widely installed yet ...).

> Another minor thing: the microvm descriptor being numbered at 50-
> doesn't make a lot of sense to me. I'd put it either at 60-, with
> other special-purpose builds, or possibly even at 70-, since it
> targets a completely different machine type than anything that comes
> before it.

IMHO it fits perfectly.  It's a build without secure boot support
like the other 5x-*.json builds.  And it shouldn't conflict with
anything as it is the only build for the 'microvm' machine type.

Comment 64 Gerd Hoffmann 2023-05-16 12:52:42 UTC
(In reply to Andrea Bolognani from comment #62)
> One more request. Would it be possible to explicitly include
> 
>   "mapping": {
>       "device": "flash",
>       "mode": "split",
> 
> in the descriptor files?

https://copr.fedorainfracloud.org/coprs/kraxel/edk2.testbuilds/build/5923246/

Comment 65 Andrea Bolognani 2023-05-16 14:19:37 UTC
> > One more request. Would it be possible to explicitly include
> > 
> >   "mapping": {
> >       "device": "flash",
> >       "mode": "split",
> > 
> > in the descriptor files?
> 
> https://copr.fedorainfracloud.org/coprs/kraxel/edk2.testbuilds/build/5923246/

Excellent, thanks!

> > Another minor thing: the microvm descriptor being numbered at 50-
> > doesn't make a lot of sense to me. I'd put it either at 60-, with
> > other special-purpose builds, or possibly even at 70-, since it
> > targets a completely different machine type than anything that comes
> > before it.
> 
> IMHO it fits perfectly.  It's a build without secure boot support
> like the other 5x-*.json builds.  And it shouldn't conflict with
> anything as it is the only build for the 'microvm' machine type.

I see where you're coming from. It doesn't really change things one
way or the other on a functional level, so I have no issues with
leaving things as they currently are.

> > I see you've dropped the 4M raw builds after all.
> 
> Yes.  Testing the waters.  When people complain it is a single distgit
> revert to restore them.  No complaints yet (but it is probably also not
> widely installed yet ...).

Yeah, I think that's fair. We can probably even push back a little
bit if asked to reintroduce them, seeing as the qcow2 builds are 4M
so it should be feasible to switch over to them.

> > Can we move the 4M
> > qcow2 builds into /usr/share/edk2/ovmf and differentiate them from
> > the 2M builds using a _4M suffix in the filename then? Following the
> > schema established by Debian.

What about this? If we expect that the 4M raw builds are not likely
to be reintroduced, it would be a real shame to keep the files split
across two directories instead of consolidated into a single one...

Comment 67 Andrea Bolognani 2023-05-18 11:52:26 UTC
(In reply to Gerd Hoffmann from comment #66)
> https://copr.fedorainfracloud.org/coprs/kraxel/edk2.testbuilds/build/5927464/

Looks excellent, works perfectly. No notes :) Thanks a lot!

Comment 68 Gerd Hoffmann 2023-07-11 08:27:22 UTC
Anything left to do here? Or can we close/nextrelease?

Comment 69 Andrea Bolognani 2023-07-13 16:46:57 UTC
(In reply to Gerd Hoffmann from comment #68)
> Anything left to do here?

Not that I can think of.

> Or can we close/nextrelease?

Sounds good.

Thanks again for all your work on this!

Comment 70 Gerd Hoffmann 2023-07-20 10:26:45 UTC
> > Or can we close/nextrelease?
> Sounds good.
Done.

