Bug 2032267
| Summary: | [wrb][qemu-kvm 6.2] qemu-kvm: vfio: Cannot reset device $vf_pci_address, depends on group 93 which is not owned. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Yanghang Liu <yanghliu> |
| Component: | qemu-kvm | Assignee: | Alex Williamson <alex.williamson> |
| qemu-kvm sub component: | Devices | QA Contact: | Yanghang Liu <yanghliu> |
| Status: | CLOSED DEFERRED | Docs Contact: | |
| Severity: | low | | |
| Priority: | low | CC: | ailan, alex.williamson, chayang, jinzhao, jusual, juzhang, kraxel, leiyang, pezhang, smitterl, virt-maint, xiagao, yanghliu, yuhuang |
| Version: | 8.6 | Keywords: | Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2034365 (view as bug list) | Environment: | |
| Last Closed: | 2023-05-23 22:13:14 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2034365 | | |
The test result differs between "qemu-kvm-15:6.2.0-1.rc2.scrmod+el8.6.0" and "qemu-kvm-6.1.0-5.module+el8.6.0":

The test result with qemu-kvm-15:6.2.0-1.rc2.scrmod+el8.6.0:
(1) The "echo 0 > /sys/bus/pci/devices/0000\:a3\:00.1/sriov_numvfs" command hangs until the VM is shut off
(2) qemu-kvm prints: "qemu-kvm: vfio: Cannot reset device 0000:a3:10.1, depends on group 93 which is not owned."

The test result with qemu-kvm-6.1.0-5.module+el8.6.0:
(1) The "echo 0 > /sys/bus/pci/devices/0000\:a3\:00.1/sriov_numvfs" command completes successfully with exit status 0
(2) qemu-kvm does not print any suspicious messages

Marking with the keyword "Regression".
NB: May need to fix/resolve for RHEL9 too.

What model SR-IOV device is being used here?

Hi Alex, I have used X540 and XXV710 to test this scenario. Both can reproduce this problem.

This can be reproduced on the 6.2 final build as well as upstream. The pc-q35-rhel8.5.0 machine type is based on the pc-q35-6.0 type, which is the key to reproducing the issue upstream. Bisecting this test scenario with the 6.0 machine type lands on the following commit:
commit d5daff7d312653b92f23c7a8e198090b32b8dae6
Author: Gerd Hoffmann <kraxel>
Date: Thu Nov 11 14:08:55 2021 +0100
pcie: implement slot power control for pcie root ports
With this patch hot-plugged pci devices will only be visible to the
guest if the guests hotplug driver has enabled slot power.
This should fix the hot-plug race which one can hit when hot-plugging
a pci device at boot, while the guest is in the middle of the pci bus
scan.
Signed-off-by: Gerd Hoffmann <kraxel>
Message-Id: <20211111130859.1171890-3-kraxel>
Reviewed-by: Michael S. Tsirkin <mst>
Signed-off-by: Michael S. Tsirkin <mst>
Digging a little further, it's specifically this option of the pc_compat_6_0 that triggers this issue:
{ "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" },
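For reference, this compat entry disables ACPI PCI hotplug for bridges on the ICH9 LPC device, which pushes hotplug onto the native PCIe slot-power path. Assuming the property name behind ACPI_PM_PROP_ACPI_PCIHP_BRIDGE is `acpi-pci-hotplug-with-bridge-support` (an assumption taken from the QEMU source, not stated in this bug), the same condition should be reproducible on a newer machine type with an explicit `-global` option; a sketch of such an invocation, with the device addresses borrowed from this report:

```sh
# Hypothetical reproduction sketch, not a verified command line:
# force the 6.0-era compat setting on a current q35 machine type.
qemu-system-x86_64 \
    -machine q35 \
    -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off \
    -device pcie-root-port,id=rp1 \
    -device vfio-pci,host=0000:a3:10.1,bus=rp1
```

With bridge ACPI hotplug off, unplug requests go through pcie_cap_slot_write_config() and the slot power-off path shown in the backtrace below.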
We get to the offending vfio-pci function via:
#0 vfio_pci_hot_reset_one (vdev=0x5619ca4e2590) at ../hw/vfio/pci.c:2403
#1 0x00005619c7352023 in vfio_pci_reset (dev=0x5619ca4e2590)
at ../hw/vfio/pci.c:3186
#2 0x00005619c740bc9e in device_legacy_reset (dev=0x5619ca4e2590)
at ../hw/core/qdev.c:896
#3 0x00005619c740a67e in qdev_reset_one (dev=0x5619ca4e2590, opaque=0x0)
at ../hw/core/qdev.c:254
#4 0x00005619c740ac45 in qdev_walk_children (dev=0x5619ca4e2590,
pre_devfn=0x5619c740a5f8 <qdev_prereset>,
pre_busfn=0x5619c740a62d <qbus_prereset>,
post_devfn=0x5619c740a662 <qdev_reset_one>,
post_busfn=0x5619c740a685 <qbus_reset_one>, opaque=0x0)
at ../hw/core/qdev.c:421
#5 0x00005619c740a740 in qdev_reset_all (dev=0x5619ca4e2590)
at ../hw/core/qdev.c:272
#6 0x00005619c70e7004 in pci_device_reset (dev=0x5619ca4e2590)
at ../hw/pci/pci.c:351
#7 0x00005619c70ed41a in pci_set_power (d=0x5619ca4e2590, state=false)
at ../hw/pci/pci.c:2873
#8 0x00005619c70f22c1 in pcie_set_power_device (bus=0x5619c9ac7260,
dev=0x5619ca4e2590, opaque=0x7f60ad383eb1) at ../hw/pci/pcie.c:373
#9 0x00005619c70ea2ed in pci_for_each_device_under_bus (bus=0x5619c9ac7260,
fn=0x5619c70f228d <pcie_set_power_device>, opaque=0x7f60ad383eb1)
at ../hw/pci/pci.c:1694
#10 0x00005619c70ea348 in pci_for_each_device (bus=0x5619c9ac7260, bus_num=6,
fn=0x5619c70f228d <pcie_set_power_device>, opaque=0x7f60ad383eb1)
at ../hw/pci/pci.c:1705
#11 0x00005619c70f2385 in pcie_cap_update_power (hotplug_dev=0x5619c9ac6900)
at ../hw/pci/pcie.c:388
#12 0x00005619c70f2f9a in pcie_cap_slot_write_config (dev=0x5619c9ac6900,
old_slt_ctl=753, old_slt_sta=64, addr=108, val=1777, len=2)
at ../hw/pci/pcie.c:734
A confusing issue at this point is that we have a VF, and all SR-IOV VFs must support FLR, so it seems we should never fall through to attempting a bus reset when FLR is available. There are two issues around that. First, post 8.5 we added kernel commit f42c35ea3b13 ("PCI/sysfs: Convert "reset" to static attribute"), which clears the pci_dev.reset_fn pointer in the kernel BEFORE we detach from the driver, so we no longer have access to trigger an FLR. Second, even if we were to correct that ordering issue, the device lock is held by the driver remove path, so we cannot grab the device lock to perform the reset. This reset is doomed to fail, which means we fall through to attempting a bus reset, which also can never work.
The new QEMU behavior that triggers this issue is that subordinate devices are reset when a slot is powered off, which I believe is part of the standard PCIe hotplug process when responding to an eject request. One possible solution might be to skip the reset associated with the power state transition if the device is being removed, ex:
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e5993c1ef5..f594da4107 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2869,7 +2869,7 @@ void pci_set_power(PCIDevice *d, bool state)
memory_region_set_enabled(&d->bus_master_enable_region,
(pci_get_word(d->config + PCI_COMMAND)
& PCI_COMMAND_MASTER) && d->has_power);
- if (!d->has_power) {
+ if (!d->has_power && !d->qdev.pending_deleted_event) {
pci_device_reset(d);
}
}
Note that the issue reported here is essentially just the new, scary-looking log message; the device removal does succeed.

My impression is that when ejecting a device we did not previously trigger a device reset, nor is there really any benefit to triggering that reset. I believe the reset is meant to more correctly emulate how a device that is not being removed would behave when power to the slot is removed.
Gerd, what do you think, is the above proposal universally valid?
> - if (!d->has_power) {
> + if (!d->has_power && !d->qdev.pending_deleted_event) {
>
> My impression is that when ejecting a device, we didn't previously trigger a
> device reset nor is there really any benefit to triggering that reset. I
> believe the reset is meant to more correctly emulate how a device that is
> not being removed would behave relative to removing power to the slot.

Correct. It makes the device invisible (resets BARs etc.) so the guest can't access it any more until power is turned back on and the device is re-initialized by the guest.

> Gerd, what do you think, is the above proposal universally valid?

Looks sane to me. Resetting a device which is going to be removed the next moment is rather pointless ...

Proposed upstream:
https://lists.gnu.org/archive/html/qemu-devel/2021-12/msg03425.html

(In reply to Alex Williamson from comment #5)
> This can be reproduced on the 6.2 final build as well as upstream. The
> pc-q35-rhel8.5.0 machine type is based on the pc-q35-6.0 type, which is the
> key to reproducing the issue upstream. Bisecting this test scenario with
> the 6.0 machine type lands on the following commit:
>
> commit d5daff7d312653b92f23c7a8e198090b32b8dae6
> pcie: implement slot power control for pcie root ports
I hit an issue similar to the commit description. If I hotplug a balloon device under a pcie-root-port before a Windows 2022 guest boots up, the balloon device is not shown in Device Manager, and the balloon can't function well after the guest boots up:

(qemu) device_add virtio-balloon-pci,id=balloon0,bus=pcie.0-root-port-5
(qemu) c
(qemu) info balloon
balloon: actual=8192
(qemu) balloon 4096
(qemu) info balloon
balloon: actual=8192

It's reproducible with qemu-kvm-6.2.0-1.module+el8.6.0+13725+61ae1949, and works well with qemu-kvm-6.1.0-5.module+el8.6.0+13430+8fdd5f85.

(In reply to Yumei Huang from comment #8)
> I hit an issue similar to the commit description.

File a separate bug; the issue here is a side-effect of the above commit, not whether the above commit is effective for its intended purpose.

(In reply to Alex Williamson from comment #7)
> Proposed upstream:
> https://lists.gnu.org/archive/html/qemu-devel/2021-12/msg03425.html

After upstream discussion, it's become clear that the QEMU behavioral change is reasonable and that simply trying to skip the reset on the expectation that the device is being removed is tricky.
QEMU has introduced a reset that vfio-pci cannot honor when the host kernel has the device locked. I think the best course of action is not to consider the new error message a regression, but instead to generate an error message that is easier to understand. The issue here is still a corner case where the system admin has chosen an operation that ultimately asks to remove an in-use device while certain operations are blocked. It's reasonable to report errors for those blocked operations regardless of whether they were reported previously.

New proposal:
https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg00654.html

(In reply to Alex Williamson from comment #9)
> File a separate bug, the issue here is a side-effect of the above commit,
> not whether the above commit is effective for its intended purpose.

Thanks. Reported bug 2042242 on RHEL 9 for the balloon device issue. We can clone to RHEL 8 if we plan to fix it on RHEL 8 too.
Alex - what happened to the patch for this bug and its RHEL 9 clone, bug 2034365? I don't see it merged in QEMU and the trail grows cold after Jan 2022.

Is this worth resurrecting for qemu-7.2 with a backport to RHEL 8, or should we just close the pair? Unclear to me, of course, whether the other machine type changes noted in bug 2042242 may affect the outcome.

(In reply to John Ferlan from comment #12)
> Alex - what happened to the patch for this bug and its RHEL 9 clone, bug
> 2034365? I don't see it merged in QEMU and the trail grows cold after
> Jan 2022.

There's currently no upstream fix. As outlined in comment 5, the kernel cannot do a device reset, due both to an ordering issue in clearing the device reset capabilities and, more importantly, to the fact that the device lock is held while the device is being actively removed. Upstream QEMU has rejected any path toward removing the device reset as part of the unplug path, and I self-NAK'd the proposal in comment 10 upstream, which simply tried to make reset-failure logging more accurate, because we rely on some degree of reset failures falling through to other reset mechanisms.

This bug is essentially just a new warning from QEMU: a device reset was not previously attempted on this path, and now that a reset is attempted, it cannot be satisfied. This in no way blocks the functionality of unplugging the device. It is also not expected to be a typical usage scenario; the host device needs to be in the process of being removed to trigger it, such as via removal of the VF (as originally reported here) or an unplug of a physical device. Reducing priority and severity, and adding the dev cond nak upstream. I don't have any good ideas for a quick fix here.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.
Description of problem:
Trying to change the PF's VF number to 0 while one VF is in use by the VM, qemu-kvm prints:
"qemu-kvm: vfio: Cannot reset device $vf_pci_address, depends on group 93 which is not owned"

Version-Release number of selected component (if applicable):
qemu-kvm-6.2.0-1.rc2.scrmod+el8.6.0+13458+219ac088.wrb211124.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VF
# echo 1 > /sys/bus/pci/devices/0000\:a3\:00.1/sriov_numvfs

2. Start a VM with the VF
The VF device xml:
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0xa3' slot='0x10' function='0x1'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</hostdev>
The VF qemu cmd line:
-device {"driver":"vfio-pci","host":"0000:a3:10.1","id":"hostdev0","bus":"pci.4","addr":"0x0"}

3. Change the PF's VF number to 0
# echo 0 > /sys/bus/pci/devices/0000\:a3\:00.1/sriov_numvfs

Actual results:
(1) The "echo 0 > /sys/bus/pci/devices/0000\:a3\:00.1/sriov_numvfs" command hangs until the VM is shut off
(2) 2021-12-14T09:21:20.405735Z qemu-kvm: vfio: Cannot reset device 0000:a3:10.1, depends on group 93 which is not owned.
Expected results:
(1) The "echo 0 > /sys/bus/pci/devices/0000\:a3\:00.1/sriov_numvfs" command completes successfully with exit status 0
(2) qemu-kvm does not print any suspicious messages

Additional info:
(1) The test device X540 info:
# virsh nodedev-dumpxml pci_0000_a3_00_1
<device>
  <name>pci_0000_a3_00_1</name>
  <path>/sys/devices/pci0000:a0/0000:a0:01.1/0000:a3:00.1</path>
  <parent>pci_0000_a0_01_1</parent>
  <driver>
    <name>ixgbe</name>
  </driver>
  <capability type='pci'>
    <class>0x020000</class>
    <domain>0</domain>
    <bus>163</bus>
    <slot>0</slot>
    <function>1</function>
    <product id='0x1528'>Ethernet Controller 10-Gigabit X540-AT2</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='virt_functions' maxCount='63'/>
    <capability type='vpd'>
      <name>X540 10GbE Controller</name>
      <fields access='readonly'>
        <manufacture_id>1028</manufacture_id>
        <part_number>G35362</part_number>
        <vendor_field index='0'>FFV17.5.10</vendor_field>
        <vendor_field index='1'>DSV1028VPDR.VER1.0</vendor_field>
        <vendor_field index='3'>DTINIC</vendor_field>
        <vendor_field index='4'>DCM10010087D521010087D5</vendor_field>
        <vendor_field index='5'>NPY2</vendor_field>
        <vendor_field index='6'>PMT1</vendor_field>
        <vendor_field index='7'>NMVIntel Corp</vendor_field>
      </fields>
    </capability>
    <iommuGroup number='94'>
      <address domain='0x0000' bus='0xa3' slot='0x00' function='0x1'/>
    </iommuGroup>
    <numa node='5'/>
    <pci-express>
      <link validity='cap' port='0' speed='5' width='8'/>
      <link validity='sta' speed='5' width='8'/>
    </pci-express>
  </capability>
</device>