1375571 – KVM_SET_DEVICE_ATTR / KVM_ARM_SET_DEVICE_ADDR fails on ThunderX for the GIC, triggered by virtio-net-pci

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1375571 - KVM_SET_DEVICE_ATTR / KVM_ARM_SET_DEVICE_ADDR fails on ThunderX for the GIC, triggered by virtio-net-pci

Summary: KVM_SET_DEVICE_ATTR / KVM_ARM_SET_DEVICE_ADDR fails on ThunderX for the GIC, ...

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	qemu-kvm-rhev
Sub Component:
Version:	7.3
Hardware:	aarch64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Eric Auger
QA Contact:	Chao Yang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1173755
TreeView+	depends on / blocked

Reported:	2016-09-13 13:01 UTC by Andrea Bolognani
Modified:	2022-10-21 09:12 UTC (History)
CC List:	10 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-09-20 12:03:11 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Description Andrea Bolognani 2016-09-13 13:01:21 UTC

I have a guest configured to use virtio-net-pci as follows:

  <interface type='network'>
    <mac address='52:54:00:57:76:ad'/>
    <source network='default'/>
    <model type='virtio'/>
    <rom bar='off'/>
    <address type='pci' domain='0x0000' bus='0x02'
                        slot='0x00' function='0x0'/>
  </interface>

The guest works fine on Moonshot, but fails to start on
ThunderX with the following error:

  error: Failed to start domain abologna-rhel73
  error: internal error: qemu unexpectedly closed the monitor:
  Could not open option rom 'pxe-virtio.rom': No such file or
  directory
  Failed to set device address: No such device

Minimal reproducers that doesn't involve libvirt:

  /usr/libexec/qemu-kvm \
    -M virt,accel=kvm -cpu host \
    -nographic -nodefaults -no-user-config \
    -device virtio-net-pci

  /usr/libexec/qemu-kvm \
    -M virt,accel=kvm -cpu host \
    -nographic -nodefaults -no-user-config \
    -device virtio-net-pci,rombar=0

The first command will fail (as expected, see Bug 1337510)
on both Moonshot and ThunderX with the following error:

  qemu-kvm: -device virtio-net-pci: failed to find romfile
  "pxe-virtio.rom"

The second command will work, with warnings, on Moonshot:

  Could not open option rom 'pxe-virtio.rom': No such file or
  directory
  Warning: nic virtio-net-pci.0 has no peer

but will fail on ThunderX:

  Could not open option rom 'pxe-virtio.rom': No such file or
  directory
  Warning: nic virtio-net-pci.0 has no peer
  Failed to set device address: No such device
  Aborted

As a consequence of this bug, virtio-net-pci is completely
unusable on ThunderX.

kernel-4.5.0-9.el7.aarch64
AAVMF-20160608-3.git988715a.el7.noarch
qemu-kvm-rhev-2.6.0-23.el7.aarch64
libvirt-2.0.0-8.el7.aarch64

Comment 2 Laszlo Ersek 2016-09-13 16:50:02 UTC

The "Warning: nic virtio-net-pci.0 has no peer" message reproduces with TCG as well, and it is related to the fact that no "-netdev TYPE,id=IDENT" (that is, backend) was provided on the command line, for "-device virtio-net-pci,netdev=IDENT" (that is, for the frontend) to reference. So I think we can take that part out of the picture.

The "Failed to set device address" message comes from kvm_arm_set_device_addr(), file "target-arm/kvm.c". For the arm/aarch64 targets, this C source file provides a function called kvm_arm_register_device(). Board code calls this function, which sets up a memory listener. At machine-init-done time, the registered devices are passed to KVM_SET_DEVICE_ATTR / KVM_ARM_SET_DEVICE_ADDR ioctl()s, and that's what fails now.

The only thing the virt board calls kvm_arm_register_device() for is GICv2 and GICv3, indirectly, through the kvm_arm_gic_realize() / kvm_arm_gicv3_realize() functions, respectively:

- create_gic() in hw/arm/virt.c calls gicv3_class_name() vs. gic_class_name(), dependent on the requested GIC type, to find the name of the appropriate GIC class,
- create_gic() then creates an object of this class, with qdev_create(), which is when the kvm_arm_gic_realize() / kvm_arm_gicv3_realize() functions end up being invoked, registering the device for said ioctl()s at machine init time,
- the appropraite ioctl() is used at machine init time, and fails.

In other words, I don't think the issue is related to virtio-net-pci; it seems to be related to the GIC on ThunderX.

I don't know however why only virtio-net-pci triggers the problem. (Well, is virtio-net-pci the only thing that triggers it?)

--*--

As a side point, it is a separate bug (or "wart") that we get the "Could not open option rom" warning after explicitly setting rombar=0. :/ ... Ahh okay, I know where *that* comes from: see pci_add_option_rom() in "hw/pci/pci.c":

    if (!pdev->rom_bar) {
        /*
         * Load rom via fw_cfg instead of creating a rom bar,
         * for 0.11 compatibility.
         */
[...]
        if (class == 0x0300) {
            rom_add_vga(pdev->romfile);
        } else {
            rom_add_option(pdev->romfile, -1);
        }
        return;
    }

So, there's nothing to do about that warning. We can ignore it. It's down to the GIC (likely GICv3) on the ThunderX, and why virtio-net-pci triggers a problem with it.

Comment 3 Laszlo Ersek 2016-09-13 16:51:40 UTC

CC Drew and Wei.

Comment 4 Laszlo Ersek 2016-09-13 17:06:07 UTC

Alright, I suspect what's going on. In the kvm_arm_set_device_addr() function, which is being run at machine-init-done time, and passes the devices to *one* of the KVM_SET_DEVICE_ATTR / KVM_ARM_SET_DEVICE_ADDR ioctl()s, we have the following logic:

    /* If the device control API is available and we have a device fd on the
     * KVMDevice struct, let's use the newer API
     */
    if (kd->dev_fd >= 0) {
        uint64_t addr = kd->kda.addr;
        attr->addr = (uintptr_t)&addr;
        ret = kvm_device_ioctl(kd->dev_fd, KVM_SET_DEVICE_ATTR, attr);
    } else {
        ret = kvm_vm_ioctl(kvm_state, KVM_ARM_SET_DEVICE_ADDR, &kd->kda);
    }

The KVM_SET_DEVICE_ATTR ioctl -- the more modern one -- propagates through the kernel like this:

kvm_device_ioctl()                 [virt/kvm/kvm_main.c]
  kvm_arch_vcpu_ioctl              [arch/arm/kvm/arm.c]
    kvm_arm_vcpu_set_attr()        [arch/arm/kvm/arm.c]
      kvm_arm_vcpu_arch_set_attr() [arch/arm64/kvm/guest.c]

The implementation of this function looks like:

int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
                               struct kvm_device_attr *attr)
{
        int ret;

        switch (attr->group) {
        case KVM_ARM_VCPU_PMU_V3_CTRL:
                ret = kvm_arm_pmu_v3_set_attr(vcpu, attr);
                break;
        default:
                ret = -ENXIO;
                break;
        }

        return ret;
}

Now let's look at the deprecated ioctl, KVM_ARM_SET_DEVICE_ADDR:

kvm_arch_vm_ioctl()              [arch/arm/kvm/arm.c]
  kvm_vm_ioctl_set_device_addr() [arch/arm/kvm/arm.c]

And from the implementation:

        switch (dev_id) {
        case KVM_ARM_DEVICE_VGIC_V2:
                if (!vgic_present)
                        return -ENXIO;
                return kvm_vgic_addr(kvm, type, &dev_addr->addr, true);
        default:
                return -ENODEV;
        }

The "No such device" message corresponds to ENODEV (ENXIO would be "No such device or address"), therefore the failure comes from the deprecated ioctl.

Summary: on the ThunderX, we must have a GICv3, but for some reason, we pass it to the host with KVM_ARM_SET_DEVICE_ADDR, but that ioctl() only supports GICv2.
(Which is BTW documented in "Documentation/virtual/kvm/api.txt".)

Comment 5 Jon Masters 2016-09-13 21:30:42 UTC

Sounds about right. And it's correct that there's a GICv3 on ThunderX (and all other future systems) but only a v2 on Seattle and X-Gene (HPE ProLiant m400).

Comment 6 Laszlo Ersek 2016-09-14 07:36:47 UTC

More digging: in kvm_arm_set_device_addr(), we choose between KVM_SET_DEVICE_ATTR / KVM_ARM_SET_DEVICE_ADDR based on (dev_fd >= 0) -- if that evals to true, we pick KVM_SET_DEVICE_ATTR.

So, since KVM_ARM_SET_DEVICE_ADDR is taken (see comment 4), we know (dev_fd < 0). Where is dev_fd set?

It is set in:

(a) kvm_arm_gicv3_realize(), but that function doesn't actually succeed with (dev_fd < 0):

    /* Try to create the device via the device control API */
    s->dev_fd = kvm_create_device(kvm_state, KVM_DEV_TYPE_ARM_VGIC_V3, false);
    if (s->dev_fd < 0) {
        error_setg_errno(errp, -s->dev_fd, "error creating in-kernel VGIC");
        return;
    }

(b) kvm_arm_gic_realize():

    /* Try to create the device via the device control API */
    s->dev_fd = -1;
    ret = kvm_create_device(kvm_state, KVM_DEV_TYPE_ARM_VGIC_V2, false);
    if (ret >= 0) {
        s->dev_fd = ret;
[...]
    } else if (ret != -ENODEV && ret != -ENOTSUP) {
        error_setg_errno(errp, -ret, "error creating in-kernel VGIC");
        return;
    }

IOW, kvm_arm_gic_realize(), and only kvm_arm_gic_realize(), makes it possible to reach kvm_arm_set_device_addr() with (dev_fd < 0).

So, this clearly means that we're trying to create a guest with GICv2 when the host has GICv3. I googled this a bit more, and found:

https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1566564

> ARM systems can only boot KVM guests that have the same GIC (Generic
> Interrupt Controller) version as the host. GICv2 is the QEMU default but,
> if your system does not have a GICv2, you need to know your GIC version
> and pass that down to the QEMU command line (e.g. -M virt,gic_version=3).
> If you (or your tools) don't know to do that, your guests will just fail
> to boot on non-GICv2 hosts with the obscure error "Failed to set device
> address: No such device".
> [...]
> Recent changes to QEMU and libvirt have improved this situation. QEMU now
> exposes a "query-gic-capabilities" QMP interface that can let the caller
> ask what GIC types are available for guests to use on that host. libvirt
> can now make use of this QEMU interface, and expose that information to
> users via the domcapabilities interface. Further, the user can specify a
> gic version of "host" to have libvirt choose a detected GIC version, or
> the user can omit the <gic> feature altogether, and libvirt will choose a
> GIC version supported by the host and update the guest XML appropriately.
> This allows tools like virt-install and nova to generate GIC-agnostic XML
> that can boot on any arm64 host.

This seems to be an accurate / educative account of the status quo. The minimal reproducer

  /usr/libexec/qemu-kvm \
    -M virt,accel=kvm -cpu host \
    -nographic -nodefaults -no-user-config \
    -device virtio-net-pci,rombar=0

from comment 0 is simply invalid on the ThunderX, it should say

    -M virt,accel=kvm,gic_version=3

However, I think our libvirt *already* handles this (when virtio-net-pci is not used, anyway) -- after all "normally" libvirt guests do start on ThunderX, right?

Which makes me believe that using virtio-net-pci somehow prevents libvirt from generating the ",gic_version=3" part.

Andrea, when you encounter this problem, can you please capture the QEMU command line from under /var/log/libvirt/qemu, and also the QMP traffic?

(For the latter I put

log_filters="1:qemu_monitor_ 1:qemu_agent"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

in "/etc/libvirt/libvirtd.conf", although I obviously don't need to tell you that :))

Thanks.

Comment 8 Laszlo Ersek 2016-09-14 10:36:48 UTC

So, I reserved a ThunderX, and I cannot reproduce the issue. Andrea, can you please provide all version numbers and the full domain XML?

Here's what I tried:
- first I tried to plug virtio-net-pci directly into the PCI Express root bus (pcie.0), as an integrated device. Libvirt didn't accept that.

- Then I tried to do the topology seen on Q35: dmi-to-pci bridge into pcie.0, then a pci-bridge into dmi-to-pci. This works on upstream QEMU, but not on downstream qemu, where the dmi-to-pci bridge is not built into the binary.

- Finally I tried to plug a PCIe root port into pcie.0, and then place virtio-net-pci on the root port. This fails because the root port (== ioh3420) device cannot be initialized. The reason is this (checked it with gdb):

ioh3420_initfn() [hw/pci-bridge/ioh3420.c]
  msi_init()     [hw/pci/msi.c]

    if (!msi_nonbroken) {
        error_setg(errp, "MSI is not supported by interrupt controller");
        return -ENOTSUP;
    }

And "msi_nonbroken" is false here, because only "hw/intc/arm_gicv2m.c" sets it to true -- MSI is apparently known to be broken on GICv3.

Summary: I cannot reproduce the problem, simply because I'm unable to create a PCI Express / PCI topology with downstream libvirt and QEMU where I could have *any* virtio-net-pci device.

NB: the above error message ("MSI is not supported...") is upstream-only at the moment, but the logic is the same in downstream. See upstream commit 1108b2f8a939f "pci: Convert msi_init() to Error and fix callers to check it"; part of v2.7.0.

Comment 9 Andrea Bolognani 2016-09-14 10:54:47 UTC

(In reply to Laszlo Ersek from comment #6)
> More digging: in kvm_arm_set_device_addr(), we choose between
> KVM_SET_DEVICE_ATTR / KVM_ARM_SET_DEVICE_ADDR based on (dev_fd >= 0) -- if
> that evals to true, we pick KVM_SET_DEVICE_ATTR.
> 
> So, since KVM_ARM_SET_DEVICE_ADDR is taken (see comment 4), we know (dev_fd
> < 0). Where is dev_fd set?
> 
> It is set in:
> 
> (a) kvm_arm_gicv3_realize(), but that function doesn't actually succeed with
> (dev_fd < 0):
> 
>     /* Try to create the device via the device control API */
>     s->dev_fd = kvm_create_device(kvm_state, KVM_DEV_TYPE_ARM_VGIC_V3,
> false);
>     if (s->dev_fd < 0) {
>         error_setg_errno(errp, -s->dev_fd, "error creating in-kernel VGIC");
>         return;
>     }
> 
> (b) kvm_arm_gic_realize():
> 
>     /* Try to create the device via the device control API */
>     s->dev_fd = -1;
>     ret = kvm_create_device(kvm_state, KVM_DEV_TYPE_ARM_VGIC_V2, false);
>     if (ret >= 0) {
>         s->dev_fd = ret;
> [...]
>     } else if (ret != -ENODEV && ret != -ENOTSUP) {
>         error_setg_errno(errp, -ret, "error creating in-kernel VGIC");
>         return;
>     }
> 
> IOW, kvm_arm_gic_realize(), and only kvm_arm_gic_realize(), makes it
> possible to reach kvm_arm_set_device_addr() with (dev_fd < 0).
> 
> So, this clearly means that we're trying to create a guest with GICv2 when
> the host has GICv3. I googled this a bit more, and found:
> 
> https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1566564
> 
> > ARM systems can only boot KVM guests that have the same GIC (Generic
> > Interrupt Controller) version as the host. GICv2 is the QEMU default but,
> > if your system does not have a GICv2, you need to know your GIC version
> > and pass that down to the QEMU command line (e.g. -M virt,gic_version=3).
> > If you (or your tools) don't know to do that, your guests will just fail
> > to boot on non-GICv2 hosts with the obscure error "Failed to set device
> > address: No such device".
> > [...]
> > Recent changes to QEMU and libvirt have improved this situation. QEMU now
> > exposes a "query-gic-capabilities" QMP interface that can let the caller
> > ask what GIC types are available for guests to use on that host. libvirt
> > can now make use of this QEMU interface, and expose that information to
> > users via the domcapabilities interface. Further, the user can specify a
> > gic version of "host" to have libvirt choose a detected GIC version, or
> > the user can omit the <gic> feature altogether, and libvirt will choose a
> > GIC version supported by the host and update the guest XML appropriately.
> > This allows tools like virt-install and nova to generate GIC-agnostic XML
> > that can boot on any arm64 host.
> 
> This seems to be an accurate / educative account of the status quo.

It is, except for

> > [...] the user can specify a
> > gic version of "host" to have libvirt choose a detected GIC version

because

  <gic version='host'/>

is translated directly into the QEMU counterpart

  gic-version=host

with no further detection (detective?) work performed by
libvirt. It's a very minor detail anyway :)

> The
> minimal reproducer
> 
>   /usr/libexec/qemu-kvm \
>     -M virt,accel=kvm -cpu host \
>     -nographic -nodefaults -no-user-config \
>     -device virtio-net-pci,rombar=0
> 
> from comment 0 is simply invalid on the ThunderX, it should say
> 
>     -M virt,accel=kvm,gic_version=3

Yup, you're absolutely right.

> However, I think our libvirt *already* handles this (when virtio-net-pci is
> not used, anyway) -- after all "normally" libvirt guests do start on
> ThunderX, right?

They do, and yes, libvirt will automatically add

  <gic version='3'/>

to the guest configuration on ThunderX based on the output
of QEMU's query-gic-capabilities QMP command.

> Which makes me believe that using virtio-net-pci somehow prevents libvirt
> from generating the ",gic_version=3" part.

That's actually not the case: turns out that I simply had

  <gic version='2'/>

in my guest configuration for some reason. How embarassing :S

So I guess GICv2 emulation is working now? I'm pretty sure
a few months back this was not the case, and guest configured
to use GICv2 would fail to start at all.

... And that's still the case, actually. I changed one of my
virtio-mmio guests to GICv2 and it fails to start with the
expected

  Failed to set device address: No such device

error. It's just that the virtio-net-pci error was being
triggered earlier, and ended up shadowing this one.

Now that I've changed my virtio-pci guest to use GICv3
instead of GICv2 I've run into another issue with ioh3420,
but as usual you've been faster than me and you've already
described it in Comment 8 :)

So if MSI is known to be broken on GICv3, and ioh3420
requires it, we can conclude that virtio-pci guests are
simply not possible on ThunderX, a known limitation we'll
have to overcome for 7.4.

> Andrea, when you encounter this problem, can you please capture the QEMU
> command line from under /var/log/libvirt/qemu, and also the QMP traffic?
> 
> (For the latter I put
> 
> log_filters="1:qemu_monitor_ 1:qemu_agent"
> log_outputs="1:file:/var/log/libvirt/libvirtd.log"
> 
> in "/etc/libvirt/libvirtd.conf", although I obviously don't need to tell you
> that :))
> 
> Thanks.

Comment 11 Laszlo Ersek 2016-09-14 12:14:41 UTC

(In reply to Andrea Bolognani from comment #9)

> So if MSI is known to be broken on GICv3, and ioh3420
> requires it, we can conclude that virtio-pci guests are
> simply not possible on ThunderX, a known limitation we'll
> have to overcome for 7.4.

I agree.

Shall we repurpose this BZ for "fix MSI on GICv3"? That seems to be a serious enough gap in functionality that I suspect it must already be tracked somewhere.

If we don't repurpose this bug, then we should close it as NOTABUG.

Thanks!

Comment 12 Laszlo Ersek 2016-09-14 12:17:32 UTC

(In reply to Laszlo Ersek from comment #11)
> (In reply to Andrea Bolognani from comment #9)
> 
> > So if MSI is known to be broken on GICv3, and ioh3420
> > requires it, we can conclude that virtio-pci guests are
> > simply not possible on ThunderX, a known limitation we'll
> > have to overcome for 7.4.
> 
> I agree.
> 
> Shall we repurpose this BZ for "fix MSI on GICv3"? That seems to be a
> serious enough gap in functionality that I suspect it must already be
> tracked somewhere.

Eric, do you know anything about this? What's the status of MSI-on-GICv3? Thank you.

Comment 13 Andrew Jones 2016-09-20 12:03:11 UTC

Thanks for the accurate, educative account of the status quo :-) As we're already tracking gicv3 ITS support with other bugs (assigned to Eric for 7.4), then we can close this one as not-a-bug.

Comment 14 Eric Auger 2016-09-28 09:36:11 UTC

Status of vITS support is:
[PATCH v7 0/8] vITS support
https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg06134.html

It should be close to upstream gate.

for MSI support on ACPI guest, we also need the IORT generation linking
root complex and ITS node. I prototyped this and it is now functional. I am currently waiting for Prem Malappa since he was the first to write with a minimalist IORT support in the context of his vSMMU series. Depending on him I may send an independent series with RC and ITS node only (no smmu).

Comment 15 Eric Auger 2022-10-21 09:12:25 UTC

This work is complete for a while and suprisingly the needInfo has just started to nag me few days ago. Clear it.

Note You need to log in before you can comment on or make changes to this bug.