Bug 2076224 - virt-install ends up with ' failed to find romfile "vgabios-virtio.bin" ' error on aarch64 host
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 36
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-18 11:54 UTC by lnie
Modified: 2022-11-08 18:55 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-11-08 18:55:24 UTC
Type: Bug
Embargoed:


Attachments
--debug output (13.83 KB, text/plain)
2022-04-20 07:47 UTC, lnie
/var/log/libvirt/qemu/$VMNAME.log (6.21 KB, text/plain)
2022-08-02 10:03 UTC, lnie
proposed qemu.spec patch (for Fedora 35, but should apply to Rawhide too) (2.69 KB, patch)
2022-11-04 15:02 UTC, Laszlo Ersek

Description lnie 2022-04-18 11:54:33 UTC
Description of problem:
virt-install failed with the following error:

ERROR    internal error: process exited while connecting to monitor: 2022-04-18T10:07:43.025651Z qemu-system-aarch64: -device virtio-vga,id=video0,max_outputs=1,bus=pci.8,addr=0x0: failed to find romfile "vgabios-virtio.bin"

Version-Release number of selected component (if applicable):
virt-install-4.0.0-1.fc36.noarch
qemu-system-aarch64-6.2.0-8.fc36.aarch64

How reproducible:
always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Daniel Berrangé 2022-04-19 11:00:07 UTC
Hmm, curious, I didn't think that virtio-vga even existed for aarch64 guests; I thought virtio-gpu was required, i.e. no VGA compat.

Can you share your full virt-install command line, and ideally run it with '--debug' and attach the output too?

Comment 2 lnie 2022-04-20 07:47:05 UTC
virt-install --connect qemu:///system   --name test --ram=1224 --vcpus=2 --location=/var/lib/libvirt/images/Fedora-Server-dvd-aarch64-36-20220415.n.0.iso --disk path=/var/lib/libvirt/images/testthis2.qcow2,size=15,format=qcow2 --network network=default,model=virtio --virt-type=kvm --video=virtio --noautoconsole --extra-args="inst.ks=https://lnie.fedorapeople.org/test.ks"

I checked after seeing your comment: virt-install works well once I remove --video=virtio.
I guess it is because I'm running on an aarch64 server host?
I also checked on an x86_64 server host; the error message I got there is: ERROR  unsupported configuration: domain configuration does not support video model 'virtio', which I think is clearer.

Comment 3 lnie 2022-04-20 07:47:54 UTC
Created attachment 1873726 [details]
--debug output

Comment 4 Cole Robinson 2022-06-17 17:18:22 UTC
Can you do a full system update on the aarch64 host, then show

rpm -qa | grep qemu
sudo virsh domcapabilities --arch aarch64 --machine virt

Comment 5 lnie 2022-06-23 04:01:05 UTC
hi,

Here is the output from a fully updated server:

[root@ampere-hr350a-04 ~]# rpm -qa | grep qemu
ipxe-roms-qemu-20200823-8.git4bd064de.fc36.noarch
libvirt-daemon-driver-qemu-8.1.0-2.fc36.aarch64
qemu-common-6.2.0-12.fc36.aarch64
qemu-system-aarch64-core-6.2.0-12.fc36.aarch64
qemu-device-display-vhost-user-gpu-6.2.0-12.fc36.aarch64
qemu-device-display-virtio-gpu-gl-6.2.0-12.fc36.aarch64
qemu-audio-alsa-6.2.0-12.fc36.aarch64
qemu-audio-jack-6.2.0-12.fc36.aarch64
qemu-audio-oss-6.2.0-12.fc36.aarch64
qemu-audio-pa-6.2.0-12.fc36.aarch64
qemu-block-curl-6.2.0-12.fc36.aarch64
qemu-block-dmg-6.2.0-12.fc36.aarch64
qemu-block-gluster-6.2.0-12.fc36.aarch64
qemu-block-iscsi-6.2.0-12.fc36.aarch64
qemu-block-rbd-6.2.0-12.fc36.aarch64
qemu-block-ssh-6.2.0-12.fc36.aarch64
qemu-device-display-virtio-gpu-6.2.0-12.fc36.aarch64
qemu-device-display-virtio-gpu-pci-6.2.0-12.fc36.aarch64
qemu-device-display-virtio-gpu-pci-gl-6.2.0-12.fc36.aarch64
qemu-device-display-virtio-vga-6.2.0-12.fc36.aarch64
qemu-device-display-virtio-vga-gl-6.2.0-12.fc36.aarch64
qemu-device-usb-host-6.2.0-12.fc36.aarch64
qemu-ui-curses-6.2.0-12.fc36.aarch64
qemu-ui-opengl-6.2.0-12.fc36.aarch64
qemu-ui-spice-core-6.2.0-12.fc36.aarch64
qemu-char-spice-6.2.0-12.fc36.aarch64
qemu-ui-spice-app-6.2.0-12.fc36.aarch64
qemu-audio-spice-6.2.0-12.fc36.aarch64
qemu-device-display-qxl-6.2.0-12.fc36.aarch64
qemu-ui-egl-headless-6.2.0-12.fc36.aarch64
qemu-virtiofsd-6.2.0-12.fc36.aarch64
qemu-char-baum-6.2.0-12.fc36.aarch64
qemu-ui-gtk-6.2.0-12.fc36.aarch64
qemu-device-usb-redirect-6.2.0-12.fc36.aarch64
qemu-block-nfs-6.2.0-12.fc36.aarch64
qemu-ui-sdl-6.2.0-12.fc36.aarch64
qemu-audio-sdl-6.2.0-12.fc36.aarch64
qemu-device-usb-smartcard-6.2.0-12.fc36.aarch64
qemu-pr-helper-6.2.0-12.fc36.aarch64
qemu-system-aarch64-6.2.0-12.fc36.aarch64
qemu-kvm-6.2.0-12.fc36.aarch64
qemu-kvm-core-6.2.0-12.fc36.aarch64
qemu-img-6.2.0-12.fc36.aarch64
[root@ampere-hr350a-04 ~]# sudo virsh domcapabilities --arch aarch64 --machine virt
<domainCapabilities>
  <path>/usr/bin/qemu-system-aarch64</path>
  <domain>kvm</domain>
  <machine>virt-6.2</machine>
  <arch>aarch64</arch>
  <vcpu max='512'/>
  <iothreads supported='yes'/>
  <os supported='yes'>
    <enum name='firmware'>
      <value>efi</value>
    </enum>
    <loader supported='yes'>
      <value>/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw</value>
      <value>/usr/share/edk2/aarch64/QEMU_EFI-pflash.raw</value>
      <enum name='type'>
        <value>rom</value>
        <value>pflash</value>
      </enum>
      <enum name='readonly'>
        <value>yes</value>
        <value>no</value>
      </enum>
      <enum name='secure'>
        <value>no</value>
      </enum>
    </loader>
  </os>
  <cpu>
    <mode name='host-passthrough' supported='yes'>
      <enum name='hostPassthroughMigratable'>
        <value>off</value>
      </enum>
    </mode>
    <mode name='maximum' supported='yes'>
      <enum name='maximumMigratable'>
        <value>on</value>
        <value>off</value>
      </enum>
    </mode>
    <mode name='host-model' supported='no'/>
    <mode name='custom' supported='yes'>
      <model usable='unknown'>pxa270-c0</model>
      <model usable='unknown'>cortex-a15</model>
      <model usable='unknown'>pxa270-b0</model>
      <model usable='unknown'>cortex-a57</model>
      <model usable='unknown'>cortex-m4</model>
      <model usable='unknown'>pxa270-a0</model>
      <model usable='unknown'>arm1176</model>
      <model usable='unknown'>pxa270-b1</model>
      <model usable='unknown'>cortex-a7</model>
      <model usable='unknown'>pxa270-a1</model>
      <model usable='unknown'>a64fx</model>
      <model usable='unknown'>cortex-a8</model>
      <model usable='unknown'>cortex-r5</model>
      <model usable='unknown'>ti925t</model>
      <model usable='unknown'>cortex-r5f</model>
      <model usable='unknown'>arm1026</model>
      <model usable='unknown'>cortex-a9</model>
      <model usable='unknown'>cortex-m7</model>
      <model usable='unknown'>pxa270</model>
      <model usable='unknown'>pxa260</model>
      <model usable='unknown'>pxa250</model>
      <model usable='unknown'>pxa270-c5</model>
      <model usable='unknown'>pxa261</model>
      <model usable='unknown'>pxa262</model>
      <model usable='unknown'>sa1110</model>
      <model usable='unknown'>sa1100</model>
      <model usable='unknown'>max</model>
      <model usable='unknown'>cortex-m0</model>
      <model usable='unknown'>cortex-a53</model>
      <model usable='unknown'>cortex-m33</model>
      <model usable='unknown'>cortex-a72</model>
      <model usable='unknown'>arm946</model>
      <model usable='unknown'>pxa255</model>
      <model usable='unknown'>arm11mpcore</model>
      <model usable='unknown'>cortex-m55</model>
      <model usable='unknown'>arm926</model>
      <model usable='unknown'>arm1136</model>
      <model usable='unknown'>arm1136-r2</model>
      <model usable='unknown'>cortex-m3</model>
    </mode>
  </cpu>
  <memoryBacking supported='yes'>
    <enum name='sourceType'>
      <value>file</value>
      <value>anonymous</value>
      <value>memfd</value>
    </enum>
  </memoryBacking>
  <devices>
    <disk supported='yes'>
      <enum name='diskDevice'>
        <value>disk</value>
        <value>cdrom</value>
        <value>floppy</value>
        <value>lun</value>
      </enum>
      <enum name='bus'>
        <value>fdc</value>
        <value>scsi</value>
        <value>virtio</value>
        <value>usb</value>
        <value>sata</value>
      </enum>
      <enum name='model'>
        <value>virtio</value>
        <value>virtio-transitional</value>
        <value>virtio-non-transitional</value>
      </enum>
    </disk>
    <graphics supported='yes'>
      <enum name='type'>
        <value>sdl</value>
        <value>vnc</value>
        <value>spice</value>
        <value>egl-headless</value>
      </enum>
    </graphics>
    <video supported='yes'>
      <enum name='modelType'>
        <value>vga</value>
        <value>cirrus</value>
        <value>vmvga</value>
        <value>qxl</value>
        <value>virtio</value>
        <value>none</value>
        <value>bochs</value>
        <value>ramfb</value>
      </enum>
    </video>
    <hostdev supported='yes'>
      <enum name='mode'>
        <value>subsystem</value>
      </enum>
      <enum name='startupPolicy'>
        <value>default</value>
        <value>mandatory</value>
        <value>requisite</value>
        <value>optional</value>
      </enum>
      <enum name='subsysType'>
        <value>usb</value>
        <value>pci</value>
        <value>scsi</value>
      </enum>
      <enum name='capsType'/>
      <enum name='pciBackend'>
        <value>default</value>
        <value>vfio</value>
      </enum>
    </hostdev>
    <rng supported='yes'>
      <enum name='model'>
        <value>virtio</value>
        <value>virtio-transitional</value>
        <value>virtio-non-transitional</value>
      </enum>
      <enum name='backendModel'>
        <value>random</value>
        <value>egd</value>
        <value>builtin</value>
      </enum>
    </rng>
    <filesystem supported='yes'>
      <enum name='driverType'>
        <value>path</value>
        <value>handle</value>
        <value>virtiofs</value>
      </enum>
    </filesystem>
    <tpm supported='yes'>
      <enum name='model'>
        <value>tpm-tis</value>
      </enum>
      <enum name='backendModel'>
        <value>passthrough</value>
        <value>emulator</value>
      </enum>
    </tpm>
  </devices>
  <features>
    <gic supported='yes'>
      <enum name='version'>
        <value>3</value>
      </enum>
    </gic>
    <vmcoreinfo supported='yes'/>
    <genid supported='no'/>
    <backingStoreInput supported='yes'/>
    <backup supported='yes'/>
    <sev supported='no'/>
  </features>
</domainCapabilities>

Comment 6 Cole Robinson 2022-08-01 14:01:14 UTC
I don't have access to an aarch64 host, but on x86_64, creating an aarch64 guest with --video virtio works as expected for me. I'm confused about what's going on. Can you do two more things:

1) attach /var/log/libvirt/qemu/$VMNAME.log for the aarch64 on aarch64 failure case

2) grab latest everything from virt-preview, and see if it reproduces:
sudo dnf copr enable @virtmaint-sig/virt-preview
sudo dnf update

Comment 7 lnie 2022-08-02 10:01:42 UTC
Sorry, it seems that the command parameter I was using on the aarch64 server was --cdrom instead of --location. There is no isolinux path on the aarch64 server ISO, so you will see a "Couldn't find kernel for install tree" error with --location.

I still see this bug with the copr update. You could reserve a Beaker aarch64 server, or please ping me if you want to use my Beaker server.

Comment 8 lnie 2022-08-02 10:03:33 UTC
Created attachment 1902819 [details]
/var/log/libvirt/qemu/$VMNAME.log

Comment 9 Cole Robinson 2022-08-03 12:23:57 UTC
lnie gave me access to a machine on which I reproduced the problem. libvirt is erroneously putting virtio-vga on the command line when it should be virtio-gpu, but the question is why.

It turns out this is happening with Fedora aarch64-on-x86 too, but it doesn't error there, because I also have seavgabios installed, so the vgabios ROM isn't missing.

What is supposed to handle this is this block in qemu_command.c:qemuDeviceVideoGetModel:

    bool primaryVga = false;
    ...
    if (video->primary && qemuDomainSupportsVideoVga(video, qemuCaps))
        primaryVga = true;

If the <video> device isn't primary, then virtio-gpu is used. qemuDomainSupportsVideoVga should be returning 'false' for our aarch64 case. Originally this code would check for -M virt usage, but now it just checks for caps flags:

bool
qemuDomainSupportsVideoVga(const virDomainVideoDef *video,
                           virQEMUCaps *qemuCaps)
{
    if (video->type == VIR_DOMAIN_VIDEO_TYPE_VIRTIO) {
        if (video->backend == VIR_DOMAIN_VIDEO_BACKEND_TYPE_VHOSTUSER) {
            if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VHOST_USER_VGA))
                return false;
        } else if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_VGA)) {
            return false;
        }
    }

    return true;
}


So either that check is not sufficient, or qemu is erroneously advertising VIRTIO_VGA in a case that it shouldn't.

@abologna thoughts?
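
To make the selection logic above concrete, here is a minimal Python sketch of how the primary-device check and the capability check combine. It is a simplified model, not libvirt code: the function names, the use of plain strings as capabilities, and the omission of the vhost-user backend are all assumptions made for illustration.

```python
# Simplified model of the virtio video model selection described above.
# Function names and string "capabilities" are illustrative only; they are
# not libvirt's real API, and the vhost-user backend is left out.


def supports_video_vga(caps):
    """Model of the capability check for the default (non vhost-user) backend."""
    return "virtio-vga" in caps


def pick_virtio_model(primary, caps):
    """Return the QEMU device name a 'virtio' <video> element would map to."""
    primary_vga = primary and supports_video_vga(caps)
    return "virtio-vga" if primary_vga else "virtio-gpu-pci"


# The Fedora 36 qemu-system-aarch64 advertised virtio-vga, so the primary
# video device was mapped to virtio-vga.
print(pick_virtio_model(True, {"virtio-vga", "virtio-gpu-pci"}))

# Without that capability the same domain falls back to virtio-gpu-pci.
print(pick_virtio_model(True, {"virtio-gpu-pci"}))
```

The point of the sketch is that the guest architecture and machine type never enter the decision; everything hinges on whether the binary reports the virtio-vga capability.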

Comment 10 Andrea Bolognani 2022-08-04 13:50:37 UTC
(In reply to Cole Robinson from comment #9)
> lnie gave me access to a machine that I reproduced on. libvirt is
> erroneously putting virtio-vga on the commandline when it should be
> virtio-gpu, but the question is why.
> 
> turns out this is happening in fedora aarch64-on-x86 too, but it doesn't
> error, because I have seavgabios installed too so the vgabios rom isn't
> missing.
> 
> what is supposed to be handling this is this block in
> qemu_command.c:qemuDeviceVideoGetModel
> 
>     bool primaryVga = false;
>     ...
>     if (video->primary && qemuDomainSupportsVideoVga(video, qemuCaps))
>         primaryVga = true;
> 
> If the <video> device isn't primary, then virtio-gpu is used.
> qemuDomainSupportsVideoVga should be returning 'false' for our aarch64 case.
> Originally this code would check for -M virt usage, but now it just checks
> for caps flags:
> 
> bool
> qemuDomainSupportsVideoVga(const virDomainVideoDef *video,
>                            virQEMUCaps *qemuCaps)
> {
>     if (video->type == VIR_DOMAIN_VIDEO_TYPE_VIRTIO) {
>         if (video->backend == VIR_DOMAIN_VIDEO_BACKEND_TYPE_VHOSTUSER) {
>             if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VHOST_USER_VGA))
>                 return false;
>         } else if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_VGA)) {
>             return false;
>         }
>     }
> 
>     return true;
> }
> 
> 
> So either that check is not sufficient, or qemu is erroneously advertising
> VIRTIO_VGA in a case that it shouldn't.
> 
> @abologna thoughts?


The change from singling out a specific machine type to going off
QEMU capabilities feels like a good one.

What is surprising is that the capability would be detected for a
build of qemu-system-aarch64 when we know that virtio-vga can't work
correctly on that architecture.

Using the Fedora 36 package:

  $ qemu-system-aarch64 -device help 2>&1 | grep virtio-vga
  name "virtio-vga", bus PCI
  name "virtio-vga-gl", bus PCI
  $

Building QEMU from source:

  $ ./aarch64-softmmu/qemu-system-aarch64 -device help 2>&1 | grep virtio-vga
  $

So it would appear that upstream QEMU correctly defaults to not
enabling virtio-vga for qemu-system-aarch64, but the Fedora packaging
goes out of its way to alter this default.

Overall I think libvirt is doing the right thing, and what needs to
change is

  1) Fedora should respect the upstream default instead of explicitly
     enabling virtio-vga when building qemu-system-aarch64;

  2) QEMU should start rejecting attempts to enable virtio-vga when
     building qemu-system-aarch64.

Note that we're talking about virtio-vga specifically here but I see
a number of other VGA devices that are enabled both upstream and on
Fedora, and the same reasoning would probably apply to those too.

Comment 11 Andrea Bolognani 2022-08-04 16:19:42 UTC
(In reply to Andrea Bolognani from comment #10)
> Overall I think libvirt is doing the right thing, and what needs to
> change is
> 
>   1) Fedora should respect the upstream default instead of explicitly
>      enabling virtio-vga when building qemu-system-aarch64;
> 
>   2) QEMU should start rejecting attempts to enable virtio-vga when
>      building qemu-system-aarch64.
> 
> Note that we're talking about virtio-vga specifically here but I see
> a number of other VGA devices that are enabled both upstream and on
> Fedora, and the same reasoning would probably apply to those too.

After thinking about this a bit more, I feel that it's worth pointing
out that my aarch64 work has been fairly narrowly focused on the
enterprise virtualization use case, which implies using the virt
machine type and booting via UEFI.

I'm pretty sure I remember the issues with VGA on aarch64 were caused
by some inherent characteristic of the architecture (something to do
with memory barriers perhaps?), but it's also possible that such
issues are not present when using a different machine type or
avoiding UEFI.

In that case, we would indeed have to look into reintroducing some
variation of the original machine type check into libvirt.

Comment 12 Laszlo Ersek 2022-08-15 12:34:57 UTC
706b5b627719 ("qemu: map "virtio" video model to "virt" machtype correctly (arm/aarch64)", 2016-09-16), by yours truly
4c029e8cfa33 ("qemu_command: properly detect which model to use for video device", 2016-10-12), by Pavel

Note that the commit dates of these commits are extremely close to each other; the first one is part of v2.3.0, the latter one (which I had not known of until today!) is in v2.4.0. So if commit 4c029e8cfa33 itself had caused problems, we'd have seen that ages ago.

The problem on aarch64/KVM was (and, to my knowledge, remains) that aarch64 combines host-side and guest-side memory caching attributes differently from x86 (and everything else, effectively). The guest marks VGA MMIO regions as uncacheable (which is fine) and QEMU/KVM mark the backing host RAM pages as cacheable (which is also fine). On x86, the result is that the host-side attribute takes effect in both guest and host, so the guest's VGA MMIO writes land in the physical CPU cache first, and *then* go to phys RAM, and the QEMU process from the host side follows the same path, so things "meet" in the PCPU data cache. On aarch64 however, the strictest mapping (here: the guest's) prevails, so the guest's VGA MMIO writes ("uncacheable") go directly to phys RAM, whereas QEMU's RAM reads are first served from the -- now stale -- PCPU cache. What you get is display corruption.

(I hope I mostly remember the problem correctly.)
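
The attribute combination described above can be written down as a small toy model in Python. This is only a restatement of the explanation in this comment, not a model of real hardware: the two-valued attribute set, the function names, and the "host wins on x86, strictest wins on aarch64" rules are assumptions taken directly from the prose.

```python
# Toy model of how guest-side and host-side caching attributes combine,
# following the description in this comment. Everything here is illustrative.

STRICTNESS = {"cacheable": 0, "uncacheable": 1}


def effective_guest_attr(arch, guest_attr, host_attr):
    """Attribute that actually governs the guest's accesses."""
    if arch == "x86":
        # As described above: the host-side attribute takes effect.
        return host_attr
    # As described above for aarch64: the strictest mapping prevails.
    return max(guest_attr, host_attr, key=STRICTNESS.__getitem__)


def display_can_corrupt(arch):
    """True if guest and QEMU see the framebuffer through different paths."""
    guest_view = effective_guest_attr(arch, "uncacheable", "cacheable")
    qemu_view = "cacheable"  # QEMU reads the backing host RAM as cacheable
    return guest_view != qemu_view


for arch in ("x86", "aarch64"):
    print(arch, display_can_corrupt(arch))
```

In this model the mismatch appears only when the guest's uncacheable mapping is honoured while QEMU keeps reading through the cache, which is the aarch64/KVM case.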

I vaguely remember that some ARMv8.x extension was underway (part of ARMv8.4?) that would enable an *x86-like combination* of host-side and guest-side caching attributes. With that, virtio-vga would work on aarch64/KVM without issues.

Controlling this from qemu's capability list may not be the best idea. The problem is accelerator-dependent; aarch64/TCG has no such issues.

BTW I can't easily figure out where downstream (Fedora Rawhide, as of 34254733fe7d) enables virtio-vga for aarch64. Upstream has:

hw/hppa/Kconfig:    imply VIRTIO_VGA
hw/i386/Kconfig:    imply VIRTIO_VGA
hw/loongarch/Kconfig:    imply VIRTIO_VGA
hw/mips/Kconfig:    imply VIRTIO_VGA
hw/ppc/Kconfig:    imply VIRTIO_VGA
hw/riscv/Kconfig:    imply VIRTIO_VGA

but I can see no "overrides" downstream. (I may be overlooking a dependency; i.e. some --enable option enabling VIRTIO_VGA for qemu-system-aarch64 indirectly.)

All I can see is that the "system-aarch64" subpackage "requires_all_modules", and that "requires_all_modules" depends on "requires_device_display_virtio_vga". But that only seems to express a runtime dependency on the device-display-virtio-vga subpackage. And I don't think this whole decision should depend on whether a subpackage different from "system-aarch64" is installed, or not, on the system. Even if the dependency is removed from "system-aarch64", the user could still manually install it -- and that should be of no consequence, regarding the display device model chosen for "virtio".

Comment 13 Cole Robinson 2022-08-15 19:45:05 UTC
Thanks for the digging, Laszlo! I suspected it had something to do with the module packaging. I know there's a long-standing issue with modules and arch handling.

For example if you are on x86 host with every device module installed, and run qemu-system-s390x in certain circumstances, it will attempt to load modules that won't even link, and spew warnings to stderr. Latest example I hit was here: https://lore.kernel.org/all/bf53b02c-df25-728a-08c0-826337bb8594@redhat.com/T/

So, could be that qemu aarch64 is reporting 'yes virtio-vga is supported' because it finds and correctly loads the virtio-vga module, even though the build config never intended for that device to be available.
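
One way to check this hypothesis from the outside is to look at what the binary lists under `-device help`, as was done by hand in comment 10. Below is a minimal Python sketch that parses such output; the helper name is hypothetical, and the sample text is limited to the two lines quoted in comment 10 rather than real captured output.

```python
import re

# Hypothetical helper: extract device names from `qemu-system-* -device help`
# style text. It only relies on lines that start with: name "<device>"


def advertised_devices(help_text):
    """Return the set of device names found in the help text."""
    return set(re.findall(r'^name "([^"]+)"', help_text, flags=re.MULTILINE))


# Sample lines as quoted in comment 10 for the Fedora 36 package.
fedora36_sample = '''name "virtio-vga", bus PCI
name "virtio-vga-gl", bus PCI
'''

devices = advertised_devices(fedora36_sample)
print("virtio-vga" in devices)
```

To run it against a real binary, the text could be obtained with `subprocess.run(["qemu-system-aarch64", "-device", "help"], capture_output=True, text=True)` and both stdout and stderr passed through the helper, since the commands in this thread redirect stderr as well.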

Comment 14 Andrea Bolognani 2022-08-22 15:10:23 UTC
(In reply to Laszlo Ersek from comment #12)
> 706b5b627719 ("qemu: map "virtio" video model to "virt" machtype correctly
> (arm/aarch64)", 2016-09-16), by yours truly
> 4c029e8cfa33 ("qemu_command: properly detect which model to use for video
> device", 2016-10-12), by Pavel
> 
> Note that the commit dates of these commits are extremely close to each
> other, the first one is part of v2.3.0, the latter one (which I have not
> known of until today!) is in v2.4.0. So if commit 4c029e8cfa33 itself had
> caused problems, we'd have seen that ages ago.

To be fair, I expect the number of users actually using graphics with
aarch64 VMs to be quite low, so lack of bug reports doesn't
necessarily imply lack of bugs :)

> Controlling this from qemu's capability list may not be the best idea. The
> problem is accelerator-dependent; aarch64/TCG has no such issues.

If that's the case, then the logic in libvirt should be updated to be
more fine-grained.

That said, we should probably still default to virtio-gpu for virt
VMs and avoid building virtio-vga support into QEMU on aarch64,
otherwise simply switching accelerators will lead to very different
behaviors and that's bound to confuse users.

Comment 15 Laszlo Ersek 2022-11-04 13:16:39 UTC
Confirmed -- it's the invalid runtime dependency in the (otherwise empty) qemu-system-aarch64 package:

# rpm -ql qemu-system-aarch64
(contains no files)

# rpm -q --requires qemu-system-aarch64
[...]
qemu-device-display-virtio-vga
qemu-device-display-virtio-vga-gl
[...]

# rpm -ql qemu-device-display-virtio-vga
[...]
/usr/lib64/qemu/hw-display-virtio-vga.so

# rpm -ql qemu-device-display-virtio-vga-gl
[...]
/usr/lib64/qemu/hw-display-virtio-vga-gl.so

# qemu-system-aarch64 -device help 2>&1 | grep virtio-vga
name "virtio-vga", bus PCI
name "virtio-vga-gl", bus PCI

# rpm --erase --nodeps qemu-device-display-virtio-vga qemu-device-display-virtio-vga-gl

# qemu-system-aarch64 -device help 2>&1 | grep virtio-vga
[nothing]

And then libvirtd generates the correct QEMU command line as well:

[...]
-device virtio-gpu-pci,id=video0,max_outputs=1,bus=pci.4,addr=0x0 \
[...]


I bumped into this today because Pawel and myself were discussing my RPi4's configuration as an aarch64 KVM (edk2) development host, running Fedora 35, and when I tried to boot a preexistent guest via libvirt, the same error popped up.

As to the fix... while exposing virtio-vga on aarch64 is not particularly useful (and so it should not be a "runtime requirement" at the least), once we do make it available as a subpackage for installation, it should certainly not break domains. IOW, libvirtd's logic

  if virtio-vga is advertized, use it over virtio-gpu-pci

works *only as long* as we *do not enable* (let alone require) the user to trivially cause virtio-vga to be advertized (= by installing a readily offered subpackage).

That is, not only should we remove the runtime requirement of qemu-device-display-virtio-vga[-gl] on aarch64, but we should also *stop building* qemu-device-display-virtio-vga[-gl] on aarch64. Even with the runtime dependency removed, these subpackages remain a *total footgun* for the user, given how libvirt favors virtio-vga over virtio-gpu-pci with disregard to the architecture. *Something* must consider the guest architecture here, given that the VGA situation is broken on aarch64 only.

My suggestion is to stop building (or otherwise including in the rpmbuild output) the display-virtio-vga[-gl] subpackages.

Comment 16 Daniel Berrangé 2022-11-04 14:33:30 UTC
(In reply to Laszlo Ersek from comment #15)
> That is, not only should we remove the runtime requirement of
> qemu-device-display-virtio-vga[-gl] on aarch64, but we should also *stop
> building* qemu-device-display-virtio-vga[-gl] on aarch64. Even with the
> runtime dependency removed, these subpackages remain a *total footgun* for
> the user, given how libvirt favors virtio-vga over virtio-gpu-pci with
> disregard to the architecture. *Something* must consider the guest
> architecture here, given that the VGA situation is broken on aarch64 only.
>
> My suggestion is to stop building (or otherwise including in the rpmbuild
> output) the display-virtio-vga[-gl] subpackages.

Disabling build of virtio-vga based on host arch is wrong, as the relevant dependency is related to guest arch.

AFAIK, upstream QEMU has integrated support for filtering modules based on guest target:

$ ./configure  --target-list=x86_64-softmmu,aarch64-softmmu --enable-modules
$ make -j 8
$ ./build/qemu-system-x86_64 -device virtio-vga -vnc :1
...running...
$ ./build/qemu-system-aarch64 -device virtio-vga -vnc :1 -M virt
qemu-system-aarch64: -device virtio-vga: 'virtio-vga' is not a valid device model name

I believe this is 

commit 05d6814c3eb16524e992bb7048d3385f8e99dd6a
Author: Jose R. Ziviani <jziviani>
Date:   Sat May 28 00:20:35 2022 +0200

    modules: generates per-target modinfo
    
    This patch changes the way modinfo is generated and built. Instead of
    one modinfo.c it generates one modinfo-<target>-softmmu.c per target. It
    aims a fine-tune control of modules by configuring Kconfig.
    
    Signed-off-by: Jose R. Ziviani <jziviani>
    Signed-off-by: Dario Faggioli <dfaggioli>
    Message-Id: <165369003038.5857.13084289285185196779.stgit@work>
    Signed-off-by: Paolo Bonzini <pbonzini>


We can see effects of that here:

$ diff -u modinfo-x86_64-softmmu.c  modinfo-aarch64-softmmu.c 
--- modinfo-x86_64-softmmu.c	2022-11-04 10:23:24.992111395 -0400
+++ modinfo-aarch64-softmmu.c	2022-11-04 10:23:24.979111485 -0400
@@ -35,10 +35,8 @@
     .deps = ((const char*[]){  "ui-spice-core",  "chardev-spice", NULL }),
 },{
     /* hw-display-qxl.modinfo */
-    .name = "hw-display-qxl",
-    .objs = ((const char*[]){  "qxl-vga",  "qxl", NULL }),
-    .deps = ((const char*[]){  "ui-spice-core", NULL }),
-},{
+    /* module QXL isn't enabled in Kconfig. */
+/* },{ */
     /* hw-display-virtio-gpu.modinfo */
     .name = "hw-display-virtio-gpu",
     .objs = ((const char*[]){  "virtio-gpu-base",  "virtio-gpu-device",  "vhost-user-gpu", NULL }),
@@ -58,14 +56,11 @@
     .deps = ((const char*[]){  "hw-display-virtio-gpu-pci", NULL }),
 },{
     /* hw-display-virtio-vga.modinfo */
-    .name = "hw-display-virtio-vga",
-    .objs = ((const char*[]){  "virtio-vga-base",  "virtio-vga",  "vhost-user-vga", NULL }),
-},{
+    /* module VIRTIO_VGA isn't enabled in Kconfig. */
+/* },{ */
     /* hw-display-virtio-vga-gl.modinfo */
-    .name = "hw-display-virtio-vga-gl",
-    .objs = ((const char*[]){  "virtio-vga-gl", NULL }),
-    .deps = ((const char*[]){  "hw-display-virtio-vga", NULL }),
-},{
+    /* module VIRTIO_VGA isn't enabled in Kconfig. */
+/* },{ */
     /* hw-usb-smartcard.modinfo */
     .name = "hw-usb-smartcard",
     .objs = ((const char*[]){  "ccid-card-emulated",  "ccid-card-passthru", NULL }),

Comment 17 Laszlo Ersek 2022-11-04 15:02:57 UTC
Created attachment 1922220 [details]
proposed qemu.spec patch (for Fedora 35, but should apply to Rawhide too)

(I didn't modify the "configure" options; nothing seemed appropriate there. So the virtio-vga* drivers may still be built on aarch64 (I don't know how to prevent that from happening), but we don't package them.)

Scratch build (F35):
https://koji.fedoraproject.org/koji/taskinfo?taskID=93784166

... unfortunately this build failed; the i686 job broke with the following strange error:

RPM build errors:
Could not access KVM kernel module: No such file or directory
Could not access KVM kernel module: No such file or directory
Could not access KVM kernel module: No such file or directory
Could not access KVM kernel module: No such file or directory

Comment 18 Laszlo Ersek 2022-11-04 15:05:59 UTC
Oops, missed Dan's comment 16. Yes, I confused myself with "%ifarch" -- that's the host/build architecture, not the target architecture. :( sorry about the noise.

Comment 19 Laszlo Ersek 2022-11-04 15:23:21 UTC
(To be precise the problem depends on both guest and host arches -- both need to be arm or aarch64, *and* hardware virtualization (KVM) must be in use. Virtio-vga should work when using TCG accel with qemu-system-aarch64, on both aarch64 and (say) x86_64 hosts.

How "useful" virtio-vga is, with TCG accel, is a different question of course.

So we can simplify and just disable virtio-vga based on either host arch, or guest arch. If we disable virtio-vga based on host arch, then virtio-vga will be absent from qemu-system-aarch64 *plus* all other target emulators on aarch64 hosts, which is likely not good. If we disable virtio-vga based on guest arch, then qemu-system-aarch64 will lack the device model on all other host arches too (not just aarch64), for example on x86_64 hosts. That's a bit of an overkill (in that setup, only TCG is available, and virtio-vga works OK with TCG), but probably not too relevant -- users can *still* use virtio-gpu-pci.)
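
The condition and the two simplifications above can be spelled out as predicates. This is a minimal Python sketch of the reasoning in this comment only; the function names and the string values for architectures and accelerators are assumptions chosen for illustration.

```python
from itertools import product

ARM_ARCHES = {"arm", "aarch64"}


def vga_affected(host, guest, accel):
    """Precise condition from this comment: arm/aarch64 on both sides, with KVM."""
    return host in ARM_ARCHES and guest in ARM_ARCHES and accel == "kvm"


def disabled_by_host_arch(host, guest, accel):
    """Simplification 1: drop virtio-vga for every emulator on ARM hosts."""
    return host in ARM_ARCHES


def disabled_by_guest_arch(host, guest, accel):
    """Simplification 2: drop virtio-vga from ARM target emulators everywhere."""
    return guest in ARM_ARCHES


# Both simplifications cover every affected combination, but each also
# disables virtio-vga in combinations that would have worked.
combos = list(product(["aarch64", "x86_64"], ["aarch64", "x86_64"], ["kvm", "tcg"]))
for host, guest, accel in combos:
    print(host, guest, accel,
          vga_affected(host, guest, accel),
          disabled_by_host_arch(host, guest, accel),
          disabled_by_guest_arch(host, guest, accel))
```

The table printed by the loop shows the over-blocking of each simplification: the host-based rule also removes virtio-vga from an x86_64 guest emulated on an aarch64 host, while the guest-based rule also removes it from aarch64 guests running under TCG.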

Comment 20 Daniel Berrangé 2022-11-04 16:18:21 UTC
(In reply to Laszlo Ersek from comment #19)
> So we can simplify and just disable virtio-vga based on either host arch, or
> guest arch. If we disable virtio-vga based on host arch, then virtio-vga
> will be absent from qemu-system-aarch64 *plus* all other target emulators on
> aarch64 hosts, which is likely not good. If we disable virtio-vga based on
> guest arch, then qemu-system-aarch64 will lack the device model on all other
> host arches too (not just aarch64), for example on x86_64 hosts. That's a
> bit of an overkill (in that setup, only TCG is available, and virtio-vga
> works OK with TCG), but probably not too relevant -- users can *still* use
> virtio-gpu-pci.)

QEMU KConfig configuration policy has *always* disabled virtio-vga on qemu-system-aarch64, regardless of host arch. 

The bug here is simply that QEMU's loadable module support didn't honour the KConfig rules for individual targets.
So if you built a module for any target, it became loadable for all targets, regardless of KConfig rules. That
bug is now addressed in QEMU 7.1.0 per that commit I mentioned previously for modinfo handling.
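
As a rough illustration of the difference, here is a toy model in Python. It is not QEMU code: the dictionaries, the function names, and the device lists are all hypothetical and exist only to show "built for any target" versus "enabled for this target".

```python
# Toy model only; this is not QEMU code. All names and data are hypothetical.

# Devices that each target's build configuration enables (illustrative values).
ENABLED_PER_TARGET = {
    "x86_64": {"virtio-vga", "virtio-gpu-pci"},
    "aarch64": {"virtio-gpu-pci"},
}

# Modules present on disk because at least one target built them.
BUILT_MODULES = {"virtio-vga", "virtio-gpu-pci"}


def loadable_without_target_check(target: str, device: str) -> bool:
    """Behaviour described as the bug: any built module loads for any target."""
    return device in BUILT_MODULES


def loadable_with_target_check(target: str, device: str) -> bool:
    """Behaviour described as the fix: the target must also enable the device."""
    return device in BUILT_MODULES and device in ENABLED_PER_TARGET.get(target, set())
```

In this model, virtio-vga is loadable for aarch64 only under the first function, which matches the symptom reported in this bug.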

It isn't practical to backport that stuff, so I'm inclined to just close this bug as CANTFIX; later Fedora
releases with newer QEMU will get the right fix automatically.

Comment 21 Laszlo Ersek 2022-11-07 10:11:41 UTC
That certainly works for me; thanks for the analysis.

Comment 22 Andrea Bolognani 2022-11-08 18:18:25 UTC
(In reply to Daniel Berrangé from comment #20)
> (In reply to Laszlo Ersek from comment #19)
> > So we can simplify and just disable virtio-vga based on either host arch, or
> > guest arch. If we disable virtio-vga based on host arch, then virtio-vga
> > will be absent from qemu-system-aarch64 *plus* all other target emulators on
> > aarch64 hosts, which is likely not good. If we disable virtio-vga based on
> > guest arch, then qemu-system-aarch64 will lack the device model on all other
> > host arches too (not just aarch64), for example on x86_64 hosts. That's a
> > bit of an overkill (in that setup, only TCG is available, and virtio-vga
> > works OK with TCG), but probably not too relevant -- users can *still* use
> > virtio-gpu-pci.)
> 
> QEMU KConfig configuration policy has *always* disabled virtio-vga on
> qemu-system-aarch64, regardless of host arch. 
> 
> The bug here is simply that QEMU's loadable module support didn't honour the
> KConfig rules for individual targets.
> So if you built a module for any target, it became loadable for all targets,
> regardless of KConfig rules. That
> bug is now addressed in QEMU 7.1.0 per that commit I mentioned previously
> for modinfo handling.

That's good, but is it still possible to override the default and
explicitly ask QEMU to build virtio-vga for the aarch64 target?

If so, should the QEMU build system be further tweaked so that it
outright rejects such a request? Or would that be going too far,
considering that virtio-vga is supposed to work on aarch64 when using
TCG instead of KVM for acceleration?

On the libvirt front, do you think it would make sense to adopt a
"defense in depth" approach by explicitly never attempting to use
virtio-vga for aarch64 VMs, regardless of accelerator and whether
QEMU advertises the device as available? Or can we just assume that
this kind of situation will never occur again?

(I have drafted a patch re-adding the machine type check into
libvirt. It's bigger and uglier than I thought it would be, mostly
because we need to pass the necessary information several layers
further down.)

Comment 23 Daniel Berrangé 2022-11-08 18:55:24 UTC
(In reply to Andrea Bolognani from comment #22)
> That's good, but is it still possible to override the default and
> explicitly ask QEMU to build virtio-vga for the aarch64 target?

Not without editing the QEMU build Kconfig files.

> If so, should the QEMU build system be further tweaked so that it
> outright rejects such a request? Or would that be going too far,
> considering that virtio-vga is supposed to work on aarch64 when using
> TCG instead of KVM for acceleration?
> 
> On the libvirt front, do you think it would make sense to adopt a
> "defense in depth" approach by explicitly never attempting using
> virtio-vga for aarch64 VMs, regardless of accelerator and whether
> QEMU advertises the device as available? Or can we just assume that
> this kind of situation will never occur again?

I don't think libvirt should try to second guess what devices QEMU
supports, outside of normal capabilities probing.
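
For anyone who wants to check by hand whether a given binary offers the device, a minimal Python sketch follows. It is not how libvirt does its probing; it only scans the text printed by `qemu-system-aarch64 -device help`. The function name is hypothetical, and the sample text is an assumed, abbreviated rendering of that output, so adjust the pattern if your QEMU prints it differently.

```python
import re


def device_listed(device_help_text: str, device: str) -> bool:
    """Return True if a line of the help text starts with: name "<device>"."""
    pattern = re.compile(r'^name "' + re.escape(device) + r'"', re.MULTILINE)
    return pattern.search(device_help_text) is not None


# Assumed, abbreviated sample of "-device help" output, for illustration only.
SAMPLE_HELP = '''Display devices:
name "virtio-gpu-device", bus virtio-bus
name "virtio-gpu-pci", bus PCI
'''

if __name__ == "__main__":
    for name in ("virtio-gpu-pci", "virtio-vga"):
        print(name, device_listed(SAMPLE_HELP, name))
```

To run it against a real binary, capture the output of `qemu-system-aarch64 -device help` to a string first and pass that in place of SAMPLE_HELP.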

