Bug 500213 - RHEL5.4 vt-d: libvirt should be able to reset a PCI function even if it causes other unused devices/functions to be reset
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libvirt
Version: 5.4
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 5.5
Assigned To: Daniel Veillard
QA Contact: Virtualization Bugs
Depends On:
Blocks: 516837 532386 533941
Reported: 2009-05-11 13:24 EDT by Mark McLoughlin
Modified: 2010-03-30 04:09 EDT
Doc Type: Bug Fix
Last Closed: 2010-03-30 04:09:15 EDT

Description Mark McLoughlin 2009-05-11 13:24:42 EDT
Clone of a Fedora 11 bug. This bug limits the usefulness of VT-d functionality. No fix has been implemented yet, but it will be somewhat invasive.

+++ This bug was initially created as a clone of Bug #499678 +++

Created an attachment (id=342865)
lspci -tv

I tested virt-manager's device assignment with the following steps:

  * Run virt-manager, open an existing guest and go to the details tab
  * Click "Add hardware", choose "Physical host device" and click "Forward"
  * Choose the appropriate device from the drop down list (this is the second LOM):
     "01:00.1 NetXtreme II BCM5716 Gigabit Ethernet"
  * Click "Forward" and "Finish"

The last step failed with:

Uncaught error adding device: Could not detach PCI device: this function is not supported by the hypervisor: No PCI reset capability available for 0000:01:00.1

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/addhardware.py", line 571, in finish
    self.add_hostdev()
  File "/usr/share/virt-manager/virtManager/addhardware.py", line 607, in add_hostdev
    self._dev.setup()
  File "/usr/lib/python2.6/site-packages/virtinst/VirtualHostDevice.py", line 208, in setup
    raise RuntimeError(_("Could not detach PCI device: %s" % str(e)))
RuntimeError: Could not detach PCI device: this function is not supported by the hypervisor: No PCI reset capability available for 0000:01:00.1

I will test this with another network card.

--- Additional comment from charles_rose@dell.com on 2009-05-07 12:03:27 EDT ---

I am using python-virtinst-0.400.3-8.fc11

--- Additional comment from markmc@redhat.com on 2009-05-07 12:36:33 EDT ---

charles: could you attach the output of "lspci -vvv" and "lspci -t -v"? Are there any libvirtd messages output to /var/log/messages at the time?

It looks like this is one function of a multi-function PCI device, and you're hitting this case:

/* Secondary Bus Reset is our sledgehammer - it resets all
 * devices behind a bus.
 */
static int
pciTrySecondaryBusReset(virConnectPtr conn, pciDevice *dev)
{
...
    /* For now, we just refuse to do a secondary bus reset
     * if there are other devices/functions behind the bus.
     * In future, we could allow it so long as those devices
     * are not in use by the host or other guests.
     */
    if (pciBusContainsOtherDevices(conn, dev)) {
        VIR_WARN("Other devices on bus with %s, not doing bus reset",
                 dev->name);
        return -1;
    }

or this case:

/* Power management reset attempts to reset a device using a
 * D-state transition from D3hot to D0. Note, in detect_pm_reset()
 * above we require the device supports a full internal reset.
 */
static int
pciTryPowerManagementReset(virConnectPtr conn, pciDevice *dev)
{
...
    /* For now, we just refuse to do a power management reset
     * if there are other functions on this device.
     * In future, we could allow it so long as those functions
     * are not in use by the host or other guests.
     */
    if (pciDeviceContainsOtherFunctions(conn, dev)) {
        VIR_WARN("%s contains other functions, not resetting", dev->name);
        return -1;
    }

We need to improve this code so that it will reset a function, even if the reset affects other functions/devices, so long as each affected device has been detached in the host and is not in use by any other guest.
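For illustration, whether two PCI addresses belong to the same multi-function device or sit on the same bus can be read off the addresses themselves. The sketch below is not libvirt code: pci_addr, pci_addr_parse, pci_same_device and pci_same_bus are hypothetical names, and it assumes the domain:bus:slot.function hex format seen in the error message (0000:01:00.1). It also ignores devices on subordinate buses behind further bridges, which a real bus reset would affect as well.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical address type for illustration; this is not libvirt's pciDevice. */
typedef struct {
    unsigned int domain;
    unsigned int bus;
    unsigned int slot;
    unsigned int function;
} pci_addr;

/* Parse "dddd:bb:ss.f" (all fields hexadecimal). Returns true on success. */
static bool
pci_addr_parse(const char *str, pci_addr *addr)
{
    char extra;

    /* Exactly four fields must convert; a fifth means trailing junk. */
    if (sscanf(str, "%x:%x:%x.%x%c", &addr->domain, &addr->bus,
               &addr->slot, &addr->function, &extra) != 4)
        return false;

    /* Reject values outside the ranges a PCI address can encode. */
    return addr->domain <= 0xffff && addr->bus <= 0xff &&
           addr->slot <= 0x1f && addr->function <= 7;
}

/* Functions of one multi-function device share domain, bus and slot. */
static bool
pci_same_device(const pci_addr *a, const pci_addr *b)
{
    return a->domain == b->domain && a->bus == b->bus && a->slot == b->slot;
}

/* Devices directly on the same bus share domain and bus. */
static bool
pci_same_bus(const pci_addr *a, const pci_addr *b)
{
    return a->domain == b->domain && a->bus == b->bus;
}
```

With these helpers, 0000:01:00.0 and 0000:01:00.1 compare as functions of the same device, which is the situation that makes both reset paths quoted above refuse to proceed.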

--- Additional comment from markmc@redhat.com on 2009-05-07 12:45:56 EDT ---

clarifying ...

before we can reset:

  1) a function on a multi-function device, using PM reset
  2) a device sharing a bus with other devices, using a bus reset

we must check that the other affected functions/devices:

  a) are detached in the host - i.e. bound to pci-stub
  b) are not in use by any other guest
  c) are not in use by this guest, in the case of hotplug
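The three conditions above could be checked along these lines. This is only a sketch under assumptions, not the eventual libvirt patch: the affected_dev struct, its fields and first_reset_blocker are hypothetical, and the caller is assumed to have already collected the other functions/devices that the PM reset or bus reset would hit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state for illustration; not a libvirt type. */
typedef struct {
    const char *name;          /* e.g. "0000:01:00.0" */
    bool bound_to_stub;        /* (a) detached in the host, bound to pci-stub */
    bool used_by_other_guest;  /* (b) assigned to some other guest */
    bool used_by_this_guest;   /* (c) already in this guest (hotplug case) */
} affected_dev;

/*
 * Return the first device that makes the reset unsafe, or NULL if every
 * affected device satisfies conditions (a), (b) and (c).
 */
static const affected_dev *
first_reset_blocker(const affected_dev *devs, size_t ndevs)
{
    for (size_t i = 0; i < ndevs; i++) {
        if (!devs[i].bound_to_stub ||
            devs[i].used_by_other_guest ||
            devs[i].used_by_this_guest)
            return &devs[i];
    }
    return NULL;
}
```

Returning the blocking device rather than a plain yes/no lets the caller name it in the warning or error message.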

--- Additional comment from charles_rose@dell.com on 2009-05-07 12:50:42 EDT ---

Created an attachment (id=342879)
lspci -vvv

--- Additional comment from charles_rose@dell.com on 2009-05-07 13:09:17 EDT ---

Created an attachment (id=342887)
Syslog with virsh nodedev-reset

--- Additional comment from charles_rose@dell.com on 2009-05-07 13:10:57 EDT ---

Created an attachment (id=342889)
Syslog with virt-manager PCI Assignment

--- Additional comment from markmc@redhat.com on 2009-05-07 13:17:19 EDT ---

Right, there we go - libvirt wants to do a Secondary Bus Reset, but it can't because it would also reset the other port on the Broadcom NIC:

  warning : Other devices on bus with 0000:01:00.1, not doing bus reset

By the way, even if we fix libvirt as I describe above, you will not be able to use one port in the guest while the other port is in use by the host or another guest.

--- Additional comment from berrange@redhat.com on 2009-05-07 13:26:34 EDT ---

Urgh, we really ought to get this kind of message back to the application.

The error reported is:

"Could not detach PCI device: this function is not supported by
the hypervisor: No PCI reset capability available for 0000:01:00.1"

It should really have said something more like:

"Could not detach PCI device: Other in-use devices on the same bus as device 0000:01:00.1, and FLR is not available"

--- Additional comment from markmc@redhat.com on 2009-05-07 13:41:25 EDT ---

(In reply to comment #8)

> It should really have said something more like
> 
> "Could not detach PCI device: Other in-use devices on the same bus as device
> 0000:01:00.1, and FLR is not available"  

Agreed, it should be better. All the possible reasons are:

  1) device doesn't implement FLR
  2) can't use bus reset because it affects other devices
  3) device doesn't implement PM reset
  4) can't use PM reset on multi-function device
  5) something went wrong doing FLR, SBR or PMR
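One way to carry these reasons back to the application is to record them as flags and build the message from whichever ones apply. The following is a sketch only: the flag names, the message wording and describe_reset_failure are hypothetical and do not reflect libvirt's actual error reporting API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flags, one per reason in the list above. */
enum {
    RESET_NO_FLR     = 1 << 0,  /* 1) device doesn't implement FLR */
    RESET_BUS_SHARED = 1 << 1,  /* 2) bus reset affects other devices */
    RESET_NO_PM      = 1 << 2,  /* 3) device doesn't implement PM reset */
    RESET_MULTI_FUNC = 1 << 3,  /* 4) PM reset on multi-function device */
    RESET_OP_FAILED  = 1 << 4   /* 5) FLR, SBR or PMR went wrong */
};

static const struct {
    unsigned int flag;
    const char *text;
} reasons[] = {
    { RESET_NO_FLR,     "device does not implement FLR" },
    { RESET_BUS_SHARED, "bus reset would affect other devices" },
    { RESET_NO_PM,      "device does not implement PM reset" },
    { RESET_MULTI_FUNC, "PM reset would affect other functions" },
    { RESET_OP_FAILED,  "FLR, SBR or PMR was attempted and failed" },
};

/* Write a message naming every reason set in flags; false if truncated. */
static bool
describe_reset_failure(char *buf, size_t len, const char *dev,
                       unsigned int flags)
{
    const char *sep = ": ";
    size_t off;
    int n;

    n = snprintf(buf, len, "Cannot reset PCI device %s", dev);
    if (n < 0 || (size_t)n >= len)
        return false;
    off = (size_t)n;

    for (size_t i = 0; i < sizeof(reasons) / sizeof(reasons[0]); i++) {
        if (!(flags & reasons[i].flag))
            continue;
        n = snprintf(buf + off, len - off, "%s%s", sep, reasons[i].text);
        if (n < 0 || (size_t)n >= len - off)
            return false;
        off += (size_t)n;
        sep = "; ";
    }
    return true;
}
```

Flags are used instead of a single error code because several reasons usually hold at once, as in the suggested message that mentions both the shared bus and the missing FLR.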

--- Additional comment from paniraja_km@dell.com on 2009-05-08 02:18:23 EDT ---

Additional info on the issue:

I updated the packages again today and added an Intel dual port NIC. When I tried adding this add-on NIC to a guest, I got the same error as before.

I also tried resetting the NIC from the virtual shell and got the same error. (No PCI reset capability available for 0000:84:00.0)

I am attaching the lspci output for my configuration.

--- Additional comment from paniraja_km@dell.com on 2009-05-08 02:20:11 EDT ---

Created an attachment (id=343051)
lspci -t -v output

--- Additional comment from paniraja_km@dell.com on 2009-05-08 02:21:10 EDT ---

Created an attachment (id=343052)
lspci -vvv output
Comment 4 Mark McLoughlin 2009-11-30 10:29:15 EST
The extra args to pciResetDevice() are added by the first patch, aren't they?

In addition to the two patches I listed, it might make sense to include these two first:

  http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=64a6682b93
  http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=ebea341856

They're needed fixes, and they should make the backporting more straightforward.
Comment 5 Daniel Veillard 2009-12-10 06:00:07 EST
libvirt-0.6.3-24.el5 has been built in dist-5E-qu-candidate with the fixes

Daniel
Comment 7 Alex Jia 2009-12-30 01:28:41 EST
This bug has been verified as fixed with libvirt 0.6.3-24.el5 on RHEL-5.5.
Setting status to VERIFIED.

Steps to Reproduce:
  * Run virt-manager, open an existing guest and go to the Hardware tab
  * Click "Add hardware", choose "Physical host device" and click "Forward"
  * Choose the appropriate device from the drop down list:
     "00:19.0 Interface eth0 (82566DM-2 Gigabit Network Connection)"
  * Click "Forward" and "Finish"

  The operation succeeds, and the previous error message no longer appears.

Version-Release number of selected component (if applicable):
[root@dhcp-66-70-62 ~]# uname -a
Linux dhcp-66-70-62.nay.redhat.com 2.6.18-183.el5 #1 SMP Mon Dec 21 18:37:42 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

[root@dhcp-66-70-62 ~]# lsmod|grep kvm
kvm_intel              86664  0 
kvm                   223648  2 ksm,kvm_intel

[root@dhcp-66-70-62 ~]# rpm -qa|grep libvirt
libvirt-python-0.6.3-24.el5
libvirt-0.6.3-24.el5
libvirt-debuginfo-0.6.3-24.el5

[root@dhcp-66-70-62 ~]# rpm -q virt-manager kvm
virt-manager-0.6.1-11.el5
kvm-83-140.el5
Comment 9 zhanghaiyan 2010-01-12 01:04:44 EST
Verified: this bug passes with libvirt-0.6.3-29.el5 on RHEL-5.5-Server-x86_64-xen.
Comment 11 errata-xmlrpc 2010-03-30 04:09:15 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0205.html
