Bug 1375571
Summary: | KVM_SET_DEVICE_ATTR / KVM_ARM_SET_DEVICE_ADDR fails on ThunderX for the GIC, triggered by virtio-net-pci | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Andrea Bolognani <abologna> |
Component: | qemu-kvm-rhev | Assignee: | Eric Auger <eauger> |
Status: | CLOSED NOTABUG | QA Contact: | Chao Yang <chayang> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.3 | CC: | abologna, chayang, drjones, eric.auger, jcm, juzhang, knoel, lersek, virt-maint, wei |
Target Milestone: | rc | Keywords: | OtherQA |
Target Release: | --- | ||
Hardware: | aarch64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-09-20 12:03:11 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1173755 |
Description
Andrea Bolognani
2016-09-13 13:01:21 UTC
The "Warning: nic virtio-net-pci.0 has no peer" message reproduces with TCG as well, and it is related to the fact that no "-netdev TYPE,id=IDENT" (that is, backend) was provided on the command line, for "-device virtio-net-pci,netdev=IDENT" (that is, for the frontend) to reference. So I think we can take that part out of the picture. The "Failed to set device address" message comes from kvm_arm_set_device_addr(), file "target-arm/kvm.c". For the arm/aarch64 targets, this C source file provides a function called kvm_arm_register_device(). Board code calls this function, which sets up a memory listener. At machine-init-done time, the registered devices are passed to KVM_SET_DEVICE_ATTR / KVM_ARM_SET_DEVICE_ADDR ioctl()s, and that's what fails now. The only thing the virt board calls kvm_arm_register_device() for is GICv2 and GICv3, indirectly, through the kvm_arm_gic_realize() / kvm_arm_gicv3_realize() functions, respectively: - create_gic() in hw/arm/virt.c calls gicv3_class_name() vs. gic_class_name(), dependent on the requested GIC type, to find the name of the appropriate GIC class, - create_gic() then creates an object of this class, with qdev_create(), which is when the kvm_arm_gic_realize() / kvm_arm_gicv3_realize() functions end up being invoked, registering the device for said ioctl()s at machine init time, - the appropraite ioctl() is used at machine init time, and fails. In other words, I don't think the issue is related to virtio-net-pci; it seems to be related to the GIC on ThunderX. I don't know however why only virtio-net-pci triggers the problem. (Well, is virtio-net-pci the only thing that triggers it?) --*-- As a side point, it is a separate bug (or "wart") that we get the "Could not open option rom" warning after explicitly setting rombar=0. :/ ... Ahh okay, I know where *that* comes from: see pci_add_option_rom() in "hw/pci/pci.c": if (!pdev->rom_bar) { /* * Load rom via fw_cfg instead of creating a rom bar, * for 0.11 compatibility. */ [...] if (class == 0x0300) { rom_add_vga(pdev->romfile); } else { rom_add_option(pdev->romfile, -1); } return; } So, there's nothing to do about that warning. We can ignore it. It's down to the GIC (likely GICv3) on the ThunderX, and why virtio-net-pci triggers a problem with it. CC Drew and Wei. Alright, I suspect what's going on. In the kvm_arm_set_device_addr() function, which is being run at machine-init-done time, and passes the devices to *one* of the KVM_SET_DEVICE_ATTR / KVM_ARM_SET_DEVICE_ADDR ioctl()s, we have the following logic: /* If the device control API is available and we have a device fd on the * KVMDevice struct, let's use the newer API */ if (kd->dev_fd >= 0) { uint64_t addr = kd->kda.addr; attr->addr = (uintptr_t)&addr; ret = kvm_device_ioctl(kd->dev_fd, KVM_SET_DEVICE_ATTR, attr); } else { ret = kvm_vm_ioctl(kvm_state, KVM_ARM_SET_DEVICE_ADDR, &kd->kda); } The KVM_SET_DEVICE_ATTR ioctl -- the more modern one -- propagates through the kernel like this: kvm_device_ioctl() [virt/kvm/kvm_main.c] kvm_arch_vcpu_ioctl [arch/arm/kvm/arm.c] kvm_arm_vcpu_set_attr() [arch/arm/kvm/arm.c] kvm_arm_vcpu_arch_set_attr() [arch/arm64/kvm/guest.c] The implementation of this function looks like: int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) { int ret; switch (attr->group) { case KVM_ARM_VCPU_PMU_V3_CTRL: ret = kvm_arm_pmu_v3_set_attr(vcpu, attr); break; default: ret = -ENXIO; break; } return ret; } Now let's look at the deprecated ioctl, KVM_ARM_SET_DEVICE_ADDR: kvm_arch_vm_ioctl() [arch/arm/kvm/arm.c] kvm_vm_ioctl_set_device_addr() [arch/arm/kvm/arm.c] And from the implementation: switch (dev_id) { case KVM_ARM_DEVICE_VGIC_V2: if (!vgic_present) return -ENXIO; return kvm_vgic_addr(kvm, type, &dev_addr->addr, true); default: return -ENODEV; } The "No such device" message corresponds to ENODEV (ENXIO would be "No such device or address"), therefore the failure comes from the deprecated ioctl. Summary: on the ThunderX, we must have a GICv3, but for some reason, we pass it to the host with KVM_ARM_SET_DEVICE_ADDR, but that ioctl() only supports GICv2. (Which is BTW documented in "Documentation/virtual/kvm/api.txt".) Sounds about right. And it's correct that there's a GICv3 on ThunderX (and all other future systems) but only a v2 on Seattle and X-Gene (HPE ProLiant m400). More digging: in kvm_arm_set_device_addr(), we choose between KVM_SET_DEVICE_ATTR / KVM_ARM_SET_DEVICE_ADDR based on (dev_fd >= 0) -- if that evals to true, we pick KVM_SET_DEVICE_ATTR. So, since KVM_ARM_SET_DEVICE_ADDR is taken (see comment 4), we know (dev_fd < 0). Where is dev_fd set? It is set in: (a) kvm_arm_gicv3_realize(), but that function doesn't actually succeed with (dev_fd < 0): /* Try to create the device via the device control API */ s->dev_fd = kvm_create_device(kvm_state, KVM_DEV_TYPE_ARM_VGIC_V3, false); if (s->dev_fd < 0) { error_setg_errno(errp, -s->dev_fd, "error creating in-kernel VGIC"); return; } (b) kvm_arm_gic_realize(): /* Try to create the device via the device control API */ s->dev_fd = -1; ret = kvm_create_device(kvm_state, KVM_DEV_TYPE_ARM_VGIC_V2, false); if (ret >= 0) { s->dev_fd = ret; [...] } else if (ret != -ENODEV && ret != -ENOTSUP) { error_setg_errno(errp, -ret, "error creating in-kernel VGIC"); return; } IOW, kvm_arm_gic_realize(), and only kvm_arm_gic_realize(), makes it possible to reach kvm_arm_set_device_addr() with (dev_fd < 0). So, this clearly means that we're trying to create a guest with GICv2 when the host has GICv3. I googled this a bit more, and found: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1566564 > ARM systems can only boot KVM guests that have the same GIC (Generic > Interrupt Controller) version as the host. GICv2 is the QEMU default but, > if your system does not have a GICv2, you need to know your GIC version > and pass that down to the QEMU command line (e.g. -M virt,gic_version=3). > If you (or your tools) don't know to do that, your guests will just fail > to boot on non-GICv2 hosts with the obscure error "Failed to set device > address: No such device". > [...] > Recent changes to QEMU and libvirt have improved this situation. QEMU now > exposes a "query-gic-capabilities" QMP interface that can let the caller > ask what GIC types are available for guests to use on that host. libvirt > can now make use of this QEMU interface, and expose that information to > users via the domcapabilities interface. Further, the user can specify a > gic version of "host" to have libvirt choose a detected GIC version, or > the user can omit the <gic> feature altogether, and libvirt will choose a > GIC version supported by the host and update the guest XML appropriately. > This allows tools like virt-install and nova to generate GIC-agnostic XML > that can boot on any arm64 host. This seems to be an accurate / educative account of the status quo. The minimal reproducer /usr/libexec/qemu-kvm \ -M virt,accel=kvm -cpu host \ -nographic -nodefaults -no-user-config \ -device virtio-net-pci,rombar=0 from comment 0 is simply invalid on the ThunderX, it should say -M virt,accel=kvm,gic_version=3 However, I think our libvirt *already* handles this (when virtio-net-pci is not used, anyway) -- after all "normally" libvirt guests do start on ThunderX, right? Which makes me believe that using virtio-net-pci somehow prevents libvirt from generating the ",gic_version=3" part. Andrea, when you encounter this problem, can you please capture the QEMU command line from under /var/log/libvirt/qemu, and also the QMP traffic? (For the latter I put log_filters="1:qemu_monitor_ 1:qemu_agent" log_outputs="1:file:/var/log/libvirt/libvirtd.log" in "/etc/libvirt/libvirtd.conf", although I obviously don't need to tell you that :)) Thanks. So, I reserved a ThunderX, and I cannot reproduce the issue. Andrea, can you please provide all version numbers and the full domain XML? Here's what I tried: - first I tried to plug virtio-net-pci directly into the PCI Express root bus (pcie.0), as an integrated device. Libvirt didn't accept that. - Then I tried to do the topology seen on Q35: dmi-to-pci bridge into pcie.0, then a pci-bridge into dmi-to-pci. This works on upstream QEMU, but not on downstream qemu, where the dmi-to-pci bridge is not built into the binary. - Finally I tried to plug a PCIe root port into pcie.0, and then place virtio-net-pci on the root port. This fails because the root port (== ioh3420) device cannot be initialized. The reason is this (checked it with gdb): ioh3420_initfn() [hw/pci-bridge/ioh3420.c] msi_init() [hw/pci/msi.c] if (!msi_nonbroken) { error_setg(errp, "MSI is not supported by interrupt controller"); return -ENOTSUP; } And "msi_nonbroken" is false here, because only "hw/intc/arm_gicv2m.c" sets it to true -- MSI is apparently known to be broken on GICv3. Summary: I cannot reproduce the problem, simply because I'm unable to create a PCI Express / PCI topology with downstream libvirt and QEMU where I could have *any* virtio-net-pci device. NB: the above error message ("MSI is not supported...") is upstream-only at the moment, but the logic is the same in downstream. See upstream commit 1108b2f8a939f "pci: Convert msi_init() to Error and fix callers to check it"; part of v2.7.0. (In reply to Laszlo Ersek from comment #6) > More digging: in kvm_arm_set_device_addr(), we choose between > KVM_SET_DEVICE_ATTR / KVM_ARM_SET_DEVICE_ADDR based on (dev_fd >= 0) -- if > that evals to true, we pick KVM_SET_DEVICE_ATTR. > > So, since KVM_ARM_SET_DEVICE_ADDR is taken (see comment 4), we know (dev_fd > < 0). Where is dev_fd set? > > It is set in: > > (a) kvm_arm_gicv3_realize(), but that function doesn't actually succeed with > (dev_fd < 0): > > /* Try to create the device via the device control API */ > s->dev_fd = kvm_create_device(kvm_state, KVM_DEV_TYPE_ARM_VGIC_V3, > false); > if (s->dev_fd < 0) { > error_setg_errno(errp, -s->dev_fd, "error creating in-kernel VGIC"); > return; > } > > (b) kvm_arm_gic_realize(): > > /* Try to create the device via the device control API */ > s->dev_fd = -1; > ret = kvm_create_device(kvm_state, KVM_DEV_TYPE_ARM_VGIC_V2, false); > if (ret >= 0) { > s->dev_fd = ret; > [...] > } else if (ret != -ENODEV && ret != -ENOTSUP) { > error_setg_errno(errp, -ret, "error creating in-kernel VGIC"); > return; > } > > IOW, kvm_arm_gic_realize(), and only kvm_arm_gic_realize(), makes it > possible to reach kvm_arm_set_device_addr() with (dev_fd < 0). > > So, this clearly means that we're trying to create a guest with GICv2 when > the host has GICv3. I googled this a bit more, and found: > > https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1566564 > > > ARM systems can only boot KVM guests that have the same GIC (Generic > > Interrupt Controller) version as the host. GICv2 is the QEMU default but, > > if your system does not have a GICv2, you need to know your GIC version > > and pass that down to the QEMU command line (e.g. -M virt,gic_version=3). > > If you (or your tools) don't know to do that, your guests will just fail > > to boot on non-GICv2 hosts with the obscure error "Failed to set device > > address: No such device". > > [...] > > Recent changes to QEMU and libvirt have improved this situation. QEMU now > > exposes a "query-gic-capabilities" QMP interface that can let the caller > > ask what GIC types are available for guests to use on that host. libvirt > > can now make use of this QEMU interface, and expose that information to > > users via the domcapabilities interface. Further, the user can specify a > > gic version of "host" to have libvirt choose a detected GIC version, or > > the user can omit the <gic> feature altogether, and libvirt will choose a > > GIC version supported by the host and update the guest XML appropriately. > > This allows tools like virt-install and nova to generate GIC-agnostic XML > > that can boot on any arm64 host. > > This seems to be an accurate / educative account of the status quo. It is, except for > > [...] the user can specify a > > gic version of "host" to have libvirt choose a detected GIC version because <gic version='host'/> is translated directly into the QEMU counterpart gic-version=host with no further detection (detective?) work performed by libvirt. It's a very minor detail anyway :) > The > minimal reproducer > > /usr/libexec/qemu-kvm \ > -M virt,accel=kvm -cpu host \ > -nographic -nodefaults -no-user-config \ > -device virtio-net-pci,rombar=0 > > from comment 0 is simply invalid on the ThunderX, it should say > > -M virt,accel=kvm,gic_version=3 Yup, you're absolutely right. > However, I think our libvirt *already* handles this (when virtio-net-pci is > not used, anyway) -- after all "normally" libvirt guests do start on > ThunderX, right? They do, and yes, libvirt will automatically add <gic version='3'/> to the guest configuration on ThunderX based on the output of QEMU's query-gic-capabilities QMP command. > Which makes me believe that using virtio-net-pci somehow prevents libvirt > from generating the ",gic_version=3" part. That's actually not the case: turns out that I simply had <gic version='2'/> in my guest configuration for some reason. How embarassing :S So I guess GICv2 emulation is working now? I'm pretty sure a few months back this was not the case, and guest configured to use GICv2 would fail to start at all. ... And that's still the case, actually. I changed one of my virtio-mmio guests to GICv2 and it fails to start with the expected Failed to set device address: No such device error. It's just that the virtio-net-pci error was being triggered earlier, and ended up shadowing this one. Now that I've changed my virtio-pci guest to use GICv3 instead of GICv2 I've run into another issue with ioh3420, but as usual you've been faster than me and you've already described it in Comment 8 :) So if MSI is known to be broken on GICv3, and ioh3420 requires it, we can conclude that virtio-pci guests are simply not possible on ThunderX, a known limitation we'll have to overcome for 7.4. > Andrea, when you encounter this problem, can you please capture the QEMU > command line from under /var/log/libvirt/qemu, and also the QMP traffic? > > (For the latter I put > > log_filters="1:qemu_monitor_ 1:qemu_agent" > log_outputs="1:file:/var/log/libvirt/libvirtd.log" > > in "/etc/libvirt/libvirtd.conf", although I obviously don't need to tell you > that :)) > > Thanks. (In reply to Andrea Bolognani from comment #9) > So if MSI is known to be broken on GICv3, and ioh3420 > requires it, we can conclude that virtio-pci guests are > simply not possible on ThunderX, a known limitation we'll > have to overcome for 7.4. I agree. Shall we repurpose this BZ for "fix MSI on GICv3"? That seems to be a serious enough gap in functionality that I suspect it must already be tracked somewhere. If we don't repurpose this bug, then we should close it as NOTABUG. Thanks! (In reply to Laszlo Ersek from comment #11) > (In reply to Andrea Bolognani from comment #9) > > > So if MSI is known to be broken on GICv3, and ioh3420 > > requires it, we can conclude that virtio-pci guests are > > simply not possible on ThunderX, a known limitation we'll > > have to overcome for 7.4. > > I agree. > > Shall we repurpose this BZ for "fix MSI on GICv3"? That seems to be a > serious enough gap in functionality that I suspect it must already be > tracked somewhere. Eric, do you know anything about this? What's the status of MSI-on-GICv3? Thank you. Thanks for the accurate, educative account of the status quo :-) As we're already tracking gicv3 ITS support with other bugs (assigned to Eric for 7.4), then we can close this one as not-a-bug. Status of vITS support is: [PATCH v7 0/8] vITS support https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg06134.html It should be close to upstream gate. for MSI support on ACPI guest, we also need the IORT generation linking root complex and ITS node. I prototyped this and it is now functional. I am currently waiting for Prem Malappa since he was the first to write with a minimalist IORT support in the context of his vSMMU series. Depending on him I may send an independent series with RC and ITS node only (no smmu). This work is complete for a while and suprisingly the needInfo has just started to nag me few days ago. Clear it. |