Bug 2032267
| Summary: | [wrb][qemu-kvm 6.2] qemu-kvm: vfio: Cannot reset device $vf_pci_address, depends on group 93 which is not owned. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Yanghang Liu <yanghliu> |
| Component: | qemu-kvm | Assignee: | Alex Williamson <alex.williamson> |
| qemu-kvm sub component: | Devices | QA Contact: | Yanghang Liu <yanghliu> |
| Status: | CLOSED DEFERRED | Docs Contact: | |
| Severity: | low | | |
| Priority: | low | CC: | ailan, alex.williamson, chayang, jinzhao, jusual, juzhang, kraxel, leiyang, pezhang, smitterl, virt-maint, xiagao, yanghliu, yuhuang |
| Version: | 8.6 | Keywords: | Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2034365 (view as bug list) | Environment: | |
| Last Closed: | 2023-05-23 22:13:14 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2034365 | | |
The test result differs between "qemu-kvm-15:6.2.0-1.rc2.scrmod+el8.6.0" and "qemu-kvm-6.1.0-5.module+el8.6.0":

The test result with qemu-kvm-15:6.2.0-1.rc2.scrmod+el8.6.0:
(1) The "echo 0 > /sys/bus/pci/devices/0000\:a3\:00.1/sriov_numvfs" command hangs until the VM is shut off
(2) qemu-kvm prints: "qemu-kvm: vfio: Cannot reset device 0000:a3:10.1, depends on group 93 which is not owned."

The test result with qemu-kvm-6.1.0-5.module+el8.6.0:
(1) The "echo 0 > /sys/bus/pci/devices/0000\:a3\:00.1/sriov_numvfs" command completes successfully with exit status 0
(2) qemu-kvm does not print any suspicious messages

Marking with the keyword "Regression".
NB: May need to fix/resolve for RHEL9 too.

What model SR-IOV device is being used here?

Hi Alex, I have used X540 and XXV710 to test this scenario. Both can reproduce this problem.

This can be reproduced on the 6.2 final build as well as upstream. The pc-q35-rhel8.5.0 machine type is based on the pc-q35-6.0 type, which is the key to reproducing the issue upstream. Bisecting this test scenario with the 6.0 machine type lands on the following commit:
commit d5daff7d312653b92f23c7a8e198090b32b8dae6
Author: Gerd Hoffmann <kraxel>
Date: Thu Nov 11 14:08:55 2021 +0100
pcie: implement slot power control for pcie root ports
With this patch hot-plugged pci devices will only be visible to the
guest if the guests hotplug driver has enabled slot power.
This should fix the hot-plug race which one can hit when hot-plugging
a pci device at boot, while the guest is in the middle of the pci bus
scan.
Signed-off-by: Gerd Hoffmann <kraxel>
Message-Id: <20211111130859.1171890-3-kraxel>
Reviewed-by: Michael S. Tsirkin <mst>
Signed-off-by: Michael S. Tsirkin <mst>
Digging a little further, it's specifically this option of the pc_compat_6_0 that triggers this issue:
{ "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" },
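For reference, this compat entry disables ACPI PCI hotplug for bridges on the ICH9 LPC device, which pushes hotplug onto the native PCIe slot-power path. Assuming the property name behind ACPI_PM_PROP_ACPI_PCIHP_BRIDGE is `acpi-pci-hotplug-with-bridge-support` (an assumption taken from the QEMU source, not stated in this bug), the same condition should be reproducible on a newer machine type with an explicit `-global` option; a sketch of such an invocation, with the device addresses borrowed from this report:

```sh
# Hypothetical reproduction sketch, not a verified command line:
# force the 6.0-era compat setting on a current q35 machine type.
qemu-system-x86_64 \
    -machine q35 \
    -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off \
    -device pcie-root-port,id=rp1 \
    -device vfio-pci,host=0000:a3:10.1,bus=rp1
```

With bridge ACPI hotplug off, unplug requests go through pcie_cap_slot_write_config() and the slot power-off path shown in the backtrace below.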
We get to the offending vfio-pci function via:
#0 vfio_pci_hot_reset_one (vdev=0x5619ca4e2590) at ../hw/vfio/pci.c:2403
#1 0x00005619c7352023 in vfio_pci_reset (dev=0x5619ca4e2590)
at ../hw/vfio/pci.c:3186
#2 0x00005619c740bc9e in device_legacy_reset (dev=0x5619ca4e2590)
at ../hw/core/qdev.c:896
#3 0x00005619c740a67e in qdev_reset_one (dev=0x5619ca4e2590, opaque=0x0)
at ../hw/core/qdev.c:254
#4 0x00005619c740ac45 in qdev_walk_children (dev=0x5619ca4e2590,
pre_devfn=0x5619c740a5f8 <qdev_prereset>,
pre_busfn=0x5619c740a62d <qbus_prereset>,
post_devfn=0x5619c740a662 <qdev_reset_one>,
post_busfn=0x5619c740a685 <qbus_reset_one>, opaque=0x0)
at ../hw/core/qdev.c:421
#5 0x00005619c740a740 in qdev_reset_all (dev=0x5619ca4e2590)
at ../hw/core/qdev.c:272
#6 0x00005619c70e7004 in pci_device_reset (dev=0x5619ca4e2590)
at ../hw/pci/pci.c:351
#7 0x00005619c70ed41a in pci_set_power (d=0x5619ca4e2590, state=false)
at ../hw/pci/pci.c:2873
#8 0x00005619c70f22c1 in pcie_set_power_device (bus=0x5619c9ac7260,
dev=0x5619ca4e2590, opaque=0x7f60ad383eb1) at ../hw/pci/pcie.c:373
#9 0x00005619c70ea2ed in pci_for_each_device_under_bus (bus=0x5619c9ac7260,
fn=0x5619c70f228d <pcie_set_power_device>, opaque=0x7f60ad383eb1)
at ../hw/pci/pci.c:1694
#10 0x00005619c70ea348 in pci_for_each_device (bus=0x5619c9ac7260, bus_num=6,
fn=0x5619c70f228d <pcie_set_power_device>, opaque=0x7f60ad383eb1)
at ../hw/pci/pci.c:1705
#11 0x00005619c70f2385 in pcie_cap_update_power (hotplug_dev=0x5619c9ac6900)
at ../hw/pci/pcie.c:388
#12 0x00005619c70f2f9a in pcie_cap_slot_write_config (dev=0x5619c9ac6900,
old_slt_ctl=753, old_slt_sta=64, addr=108, val=1777, len=2)
at ../hw/pci/pcie.c:734
A confusing issue at this point is that we have a VF, and all SR-IOV VFs must support FLR, so it seems we should never fall through to attempting a bus reset when FLR is available. There are two issues around that. First, post 8.5 we added kernel commit f42c35ea3b13 ("PCI/sysfs: Convert "reset" to static attribute"), which clears the pci_dev.reset_fn pointer in the kernel BEFORE we detach from the driver, so we no longer have access to trigger an FLR. Second, even if we were to correct that ordering issue, the device lock is held by the driver remove path, so we cannot grab the device lock to perform the reset. This reset is doomed to fail, which means we fall through to attempting a bus reset, which also can never work.
The new QEMU behavior that triggers this issue is that subordinate devices are reset when a slot is powered off, which I believe is part of the standard PCIe hotplug process when responding to an eject request. One possible solution might be to skip the reset associated with the power state transition if the device is being removed, ex:
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e5993c1ef5..f594da4107 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2869,7 +2869,7 @@ void pci_set_power(PCIDevice *d, bool state)
memory_region_set_enabled(&d->bus_master_enable_region,
(pci_get_word(d->config + PCI_COMMAND)
& PCI_COMMAND_MASTER) && d->has_power);
- if (!d->has_power) {
+ if (!d->has_power && !d->qdev.pending_deleted_event) {
pci_device_reset(d);
}
}
Note that the issue reported here is essentially just the new, scary-looking log message; the device removal does succeed.

My impression is that when ejecting a device we did not previously trigger a device reset, nor is there really any benefit to triggering that reset. I believe the reset is meant to more correctly emulate how a device that is not being removed would behave when power to the slot is removed.
Gerd, what do you think, is the above proposal universally valid?
> - if (!d->has_power) {
> + if (!d->has_power && !d->qdev.pending_deleted_event) {
>
> My impression is that when ejecting a device, we didn't previously trigger a
> device reset nor is there really any benefit to triggering that reset. I
> believe the reset is meant to more correctly emulate how a device that is
> not being removed would behave relative to removing power to the slot.

Correct. It makes the device invisible (resets BARs etc.) so the guest can't access it any more until power is turned back on and the device is re-initialized by the guest.

> Gerd, what do you think, is the above proposal universally valid?

Looks sane to me. Resetting a device which is going to be removed the next moment is rather pointless ...

Proposed upstream:
https://lists.gnu.org/archive/html/qemu-devel/2021-12/msg03425.html

(In reply to Alex Williamson from comment #5)
> This can be reproduced on the 6.2 final build as well as upstream. The
> pc-q35-rhel8.5.0 machine type is based on the pc-q35-6.0 type, which is the
> key to reproducing the issue upstream. Bisecting this test scenario with
> the 6.0 machine type lands on the following commit:
>
> commit d5daff7d312653b92f23c7a8e198090b32b8dae6
> pcie: implement slot power control for pcie root ports
I hit an issue similar to the commit description. If I hotplug a balloon device under a pcie-root-port before a Windows 2022 guest boots up, the balloon device is not shown in Device Manager, and the balloon can't function well after the guest boots up:

(qemu) device_add virtio-balloon-pci,id=balloon0,bus=pcie.0-root-port-5
(qemu) c
(qemu) info balloon
balloon: actual=8192
(qemu) balloon 4096
(qemu) info balloon
balloon: actual=8192

It's reproducible with qemu-kvm-6.2.0-1.module+el8.6.0+13725+61ae1949, and works well with qemu-kvm-6.1.0-5.module+el8.6.0+13430+8fdd5f85.

(In reply to Yumei Huang from comment #8)
> I hit an issue similar to the commit description.

File a separate bug; the issue here is a side-effect of the above commit, not whether the above commit is effective for its intended purpose.

(In reply to Alex Williamson from comment #7)
> Proposed upstream:
> https://lists.gnu.org/archive/html/qemu-devel/2021-12/msg03425.html

After upstream discussion, it's become clear that the QEMU behavioral change is reasonable and that simply trying to skip the reset on the expectation that the device is being removed is tricky.
QEMU has introduced a reset that vfio-pci cannot honor when the host kernel has the device locked. I think the best course of action is not to consider the new error message a regression, but instead to generate an error message that is easier to understand. The issue here is still a corner case where the system admin has chosen an operation that ultimately asks to remove an in-use device while certain operations are blocked. It's reasonable to report errors for those blocked operations regardless of whether they were reported previously.

New proposal:
https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg00654.html

(In reply to Alex Williamson from comment #9)
> File a separate bug, the issue here is a side-effect of the above commit,
> not whether the above commit is effective for its intended purpose.

Thanks. Reported bug 2042242 on RHEL 9 for the balloon device issue. We can clone to RHEL 8 if we plan to fix it on RHEL 8 too.
Alex - what happened to the patch for this bug and its RHEL 9 clone, bug 2034365? I don't see it merged in QEMU and the trail grows cold after Jan 2022.

Is this worth resurrecting for qemu-7.2 with a backport to RHEL 8, or should we just close the pair? Unclear to me, of course, whether the other machine type changes noted in bug 2042242 may affect the outcome.

(In reply to John Ferlan from comment #12)
> Alex - what happened to the patch for this bug and its RHEL 9 clone, bug
> 2034365? I don't see it merged in QEMU and the trail grows cold after
> Jan 2022.

There's currently no upstream fix. As outlined in comment 5, the kernel cannot do a device reset, due both to an ordering issue in clearing the device reset capabilities and, more importantly, to the fact that the device lock is held while the device is being actively removed. Upstream QEMU has rejected any path toward removing the device reset as part of the unplug path, and I self-NAK'd the proposal in comment 10 upstream, which simply tried to make reset-failure logging more accurate, because we rely on some degree of reset failures falling through to other reset mechanisms.

This bug is essentially just a new warning from QEMU: a device reset was not previously attempted on this path, and now that a reset is attempted, it cannot be satisfied. This in no way blocks the functionality of unplugging the device. It is also not expected to be a typical usage scenario; the host device needs to be in the process of being removed to trigger it, such as via removal of the VF (as originally reported here) or an unplug of a physical device. Reducing priority and severity, and adding the dev cond nak upstream. I don't have any good ideas for a quick fix here.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.
Description of problem:
Trying to change the PF's VF number to 0 while one VF is in use by the VM, qemu-kvm prints:
"qemu-kvm: vfio: Cannot reset device $vf_pci_address, depends on group 93 which is not owned"

Version-Release number of selected component (if applicable):
qemu-kvm-6.2.0-1.rc2.scrmod+el8.6.0+13458+219ac088.wrb211124.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VF
# echo 1 > /sys/bus/pci/devices/0000\:a3\:00.1/sriov_numvfs

2. Start a VM with the VF
The VF device xml:
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0xa3' slot='0x10' function='0x1'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</hostdev>
The VF qemu cmd line:
-device {"driver":"vfio-pci","host":"0000:a3:10.1","id":"hostdev0","bus":"pci.4","addr":"0x0"}

3. Change the PF's VF number to 0
# echo 0 > /sys/bus/pci/devices/0000\:a3\:00.1/sriov_numvfs

Actual results:
(1) The "echo 0 > /sys/bus/pci/devices/0000\:a3\:00.1/sriov_numvfs" command hangs until the VM is shut off
(2) 2021-12-14T09:21:20.405735Z qemu-kvm: vfio: Cannot reset device 0000:a3:10.1, depends on group 93 which is not owned.
Expected results:
(1) The "echo 0 > /sys/bus/pci/devices/0000\:a3\:00.1/sriov_numvfs" command completes successfully with exit status 0
(2) qemu-kvm does not print any suspicious messages

Additional info:
(1) The test device X540 info:
# virsh nodedev-dumpxml pci_0000_a3_00_1
<device>
  <name>pci_0000_a3_00_1</name>
  <path>/sys/devices/pci0000:a0/0000:a0:01.1/0000:a3:00.1</path>
  <parent>pci_0000_a0_01_1</parent>
  <driver>
    <name>ixgbe</name>
  </driver>
  <capability type='pci'>
    <class>0x020000</class>
    <domain>0</domain>
    <bus>163</bus>
    <slot>0</slot>
    <function>1</function>
    <product id='0x1528'>Ethernet Controller 10-Gigabit X540-AT2</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='virt_functions' maxCount='63'/>
    <capability type='vpd'>
      <name>X540 10GbE Controller</name>
      <fields access='readonly'>
        <manufacture_id>1028</manufacture_id>
        <part_number>G35362</part_number>
        <vendor_field index='0'>FFV17.5.10</vendor_field>
        <vendor_field index='1'>DSV1028VPDR.VER1.0</vendor_field>
        <vendor_field index='3'>DTINIC</vendor_field>
        <vendor_field index='4'>DCM10010087D521010087D5</vendor_field>
        <vendor_field index='5'>NPY2</vendor_field>
        <vendor_field index='6'>PMT1</vendor_field>
        <vendor_field index='7'>NMVIntel Corp</vendor_field>
      </fields>
    </capability>
    <iommuGroup number='94'>
      <address domain='0x0000' bus='0xa3' slot='0x00' function='0x1'/>
    </iommuGroup>
    <numa node='5'/>
    <pci-express>
      <link validity='cap' port='0' speed='5' width='8'/>
      <link validity='sta' speed='5' width='8'/>
    </pci-express>
  </capability>
</device>