Bug 733587 - Reattaching a PCI device to the host while it is in use by a guest sometimes outputs wrong info
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
6.2
x86_64 Linux
Priority: medium Severity: medium
: rc
: ---
Assigned To: Osier Yang
Virtualization Bugs
Depends On:
Blocks: 773650 773651 773677 773696
Reported: 2011-08-26 02:30 EDT by weizhang
Modified: 2012-06-20 02:30 EDT (History)
9 users

See Also:
Fixed In Version: libvirt-0.9.9-1.el6
Doc Type: Bug Fix
Doc Text:
Cause: If a domain fails to start, the host device(s) for the domain are reattached to the host regardless of whether the device(s) are in use by another domain. Consequence: A device is reattached to the host even if it is still being used by another domain. Fix: Improve the underlying code so that it does not reattach a device that is being used by another domain. Result: More stable hostdev hotplug behavior.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-20 02:30:16 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2012:0748 normal SHIPPED_LIVE Low: libvirt security, bug fix, and enhancement update 2012-06-19 15:31:38 EDT

Description weizhang 2011-08-26 02:30:46 EDT
Description of problem:
Reattaching a PCI device to the host while it is in use by a guest sometimes reports success, after trying to start another guest with the same assigned PCI device

Version-Release number of selected component (if applicable):
kernel-2.6.32-191.el6.x86_64
libvirt-0.9.4-5.el6.x86_64
qemu-kvm-0.12.1.2-2.184.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. on machine with 82576 nic, do
rmmod kvm_intel
rmmod kvm
modprobe kvm allow_unsafe_assigned_interrupts=1
modprobe kvm_intel

2. # lspci |grep -i eth
00:19.0 Ethernet controller: Intel Corporation 82567LM-3 Gigabit Network Connection (rev 02)

3. install 2 guest and shutdown both, attach xml
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x19' function='0x0'/>
      </source>
    </hostdev>
on both guests

4. detach pci device from host
# virsh nodedev-dettach pci_0000_00_19_0
Device pci_0000_00_19_0 dettached

5. start 1 guest and then start another
when starting the second guest, it reports an error as expected
error: Failed to start domain guest2
error: internal error Not reattaching active device 0000:00:19.0

6. reattach pci device to host
# virsh nodedev-reattach pci_0000_00_19_0
Device pci_0000_00_19_0 re-attached

but checking the driver
# readlink /sys/bus/pci/devices/0000\:00\:19.0/driver
produces no output

also checking with
# virsh nodedev-list --tree
# ifconfig -a

the PCI device is not back
  
Actual results:
nodedev-reattach reports success but in fact fails

Expected results:
nodedev-reattach reports error like 
error: Failed to re-attach device pci_0000_00_19_0
error: internal error Not reattaching active device 0000:00:19.0

Additional info:
Comment 2 Osier Yang 2011-08-29 23:11:31 EDT
I'd think the device is unbound from the pci-stub driver successfully; however, reprobing the driver for the device fails (or is not even attempted). Could you check whether
"remove_id" is available for the pci-stub driver? E.g.

# ls /sys/bus/pci/devices/0000\:00\:19.0/driver/remove_id 

If it exists, please test if the reprobing works fine.

# echo 0000\:00\:19.0 >  /sys/bus/pci/drivers_probe

I guess we have some problem reprobing the driver for the device here.
Comment 3 weizhang 2011-08-29 23:37:04 EDT
(In reply to comment #2)
> I'd think the device is unbound from the pci-stub driver successfully, however,
> it fails on reprobing (or even don't do) the driver for the device. Could you
> check if
> "remove_id" is available for pci-stub driver? E.g.
> 
> # ls /sys/bus/pci/devices/0000\:00\:19.0/driver/remove_id 
> 

after nodedev-reattach, remove_id does not exist
Comment 4 Osier Yang 2011-08-30 02:23:03 EDT
How about before?
Comment 5 weizhang 2011-08-30 03:02:49 EDT
(In reply to comment #4)
> How about before?

before reattach, remove_id exists, and
# echo 0000\:00\:19.0 >  /sys/bus/pci/drivers_probe
reports no error
Comment 6 Alex Jia 2011-08-30 03:27:47 EDT
The following is my debugging information; it should be helpful for you:

# virsh nodedev-dettach pci_0000_00_19_0
Device pci_0000_00_19_0 dettached

# readlink /sys/bus/pci/devices/0000\:00\:19.0/driver -f
/sys/bus/pci/drivers/pci-stub

# virsh start vr-rhel6u1-x86_64-kvm
Domain vr-rhel6u1-x86_64-kvm started

# readlink /sys/bus/pci/devices/0000\:00\:19.0/driver -f
/sys/bus/pci/drivers/pci-stub

# virsh start vr-rhel6-x86_64-kvm
error: Failed to start domain vr-rhel6-x86_64-kvm
error: internal error Not reattaching active device 0000:00:19.0

# readlink /sys/bus/pci/devices/0000\:00\:19.0/driver -f
/sys/bus/pci/drivers/pci-stub

# virsh start vr-rhel6-x86_64-kvm
error: Failed to start domain vr-rhel6-x86_64-kvm
error: internal error Process exited while reading console log output: char device redirected to /dev/pts/2
Failed to assign device "hostdev0" : Device or resource busy
qemu-kvm: -device pci-assign,host=00:19.0,id=hostdev0,configfd=25,bus=pci.0,addr=0x7: Device 'pci-assign' could not be initialized

Note: this error is different from the first attempt to start the guest again.


# virsh nodedev-reattach pci_0000_00_19_0
Device pci_0000_00_19_0 re-attached

Note: the PCI device is active, so this should fail with an error like the one seen when starting the second guest; however, it succeeds. It seems some variable's value changes when the second guest is started, because if I only start one guest and then reattach the PCI device assigned to that guest, I do see the "...Not reattaching active device..." error.

In addition, dmesg display as follows:
...
e1000e 0000:00:19.0: BAR 0: can't reserve mem region [0xfe9e0000-0xfe9fffff]
e1000e: probe of 0000:00:19.0 failed with error -16
...

Moreover, the messages log catches the same error:

# tail -f /var/log/messages
......
Aug 26 16:54:43 localhost kernel: e1000e 0000:00:19.0: BAR 0: can't reserve mem region [0xfe9e0000-0xfe9fffff]
Aug 26 16:54:43 localhost kernel: e1000e: probe of 0000:00:19.0 failed with error -16

This should be a kernel issue, right?

# readlink /sys/bus/pci/devices/0000\:00\:19.0/driver -f
/sys/devices/pci0000:00/0000:00:19.0/driver

Note: the PCI driver isn't right.

# ll /sys/devices/pci0000:00/0000:00:19.0
total 0
-rw-r--r--. 1 root root   4096 Aug 26 15:26 broken_parity_status
-r--r--r--. 1 root root   4096 Aug 26 15:07 class
-rw-r--r--. 1 root root    256 Aug 26 15:07 config
-r--r--r--. 1 root root   4096 Aug 26 15:07 device
-rw-------. 1 root root   4096 Aug 26 15:26 enable
-r--r--r--. 1 root root   4096 Aug 26 15:07 irq
-r--r--r--. 1 root root   4096 Aug 26 15:26 local_cpulist
-r--r--r--. 1 root root   4096 Aug 26 15:07 local_cpus
-r--r--r--. 1 root root   4096 Aug 26 15:26 modalias
-rw-r--r--. 1 root root   4096 Aug 26 15:26 msi_bus
-r--r--r--. 1 root root   4096 Aug 26 15:26 numa_node
drwxr-xr-x. 2 root root      0 Aug 26 15:26 power
--w--w----. 1 root root   4096 Aug 26 15:21 remove
--w--w----. 1 root root   4096 Aug 26 15:47 rescan
--w-------. 1 root root   4096 Aug 26 15:07 reset
-r--r--r--. 1 root root   4096 Aug 26 15:07 resource
-rw-------. 1 root root 131072 Aug 26 15:07 resource0
-rw-------. 1 root root   4096 Aug 26 15:07 resource1
-rw-------. 1 root root     32 Aug 26 15:07 resource2
lrwxrwxrwx. 1 root root      0 Aug 26 15:07 subsystem -> ../../../bus/pci
-r--r--r--. 1 root root   4096 Aug 26 15:07 subsystem_device
-r--r--r--. 1 root root   4096 Aug 26 15:07 subsystem_vendor
-rw-r--r--. 1 root root   4096 Aug 26 15:07 uevent
-r--r--r--. 1 root root   4096 Aug 26 15:07 vendor


I tried to trace the above issues; the issue may be introduced by the following code:

int
pciReAttachDevice(pciDevice *dev, pciDeviceList *activeDevs)
{
......
    if (activeDevs && pciDeviceListFind(activeDevs, dev)) {
        pciReportError(VIR_ERR_INTERNAL_ERROR,
                       _("Not reattaching active device %s"), dev->name);
        return -1;
    }
......
}

When starting the second guest and then reattaching the device, pciDeviceListFind returns NULL, which means the PCI device isn't considered active, so the reattach succeeds. Furthermore, list->count is 0 in pciDeviceListFind; that value isn't right, since list->count should be 1, not 0, so there may be a counting issue. If I have time, I will debug it again; I hope this is useful for you.


Alex
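The failure mode described in comment 6 can be sketched in isolation. This is a toy model, not libvirt's actual code; all names here (`pcidev_list`, `failed_start_cleanup`, and so on) are hypothetical. It shows how a cleanup path that unconditionally removes a device from the active list when a guest fails to start leaves the list count at 0, so a later lookup finds nothing and the "Not reattaching active device" guard in pciReAttachDevice is bypassed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of an active-PCI-device list. */
typedef struct { char name[16]; } pcidev;
typedef struct { pcidev devs[8]; size_t count; } pcidev_list;

/* Return the index of the named device, or -1 if not in the list. */
static int pcidev_list_find(const pcidev_list *l, const char *name)
{
    for (size_t i = 0; i < l->count; i++)
        if (strcmp(l->devs[i].name, name) == 0)
            return (int)i;
    return -1;
}

/* Buggy cleanup path: on a failed guest start, the device is removed
 * ("stolen") from the active list even though another guest still
 * uses it.  After this, the reattach guard no longer triggers. */
static void failed_start_cleanup(pcidev_list *l, const char *name)
{
    int i = pcidev_list_find(l, name);
    if (i >= 0)
        l->devs[i] = l->devs[--l->count];  /* swap-remove the entry */
}
```

With guest 1 holding the device, a failed start of guest 2 empties the list, matching the list->count == 0 observation above.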
Comment 7 Osier Yang 2011-09-22 09:31:34 EDT
The problem here is that the hostdev is not managed, and we don't check whether the device is in the active list if it's not managed. So the code falls through and steals the device from the active PCI list.
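A minimal sketch of the kind of guard the fix needs, assuming a simplified device list (the names `reattach` and `list_find` are hypothetical; libvirt's real entry point is the pciReAttachDevice function quoted in comment 6): the active-list check must be performed regardless of whether the hostdev is managed, so an in-use device is never rebound to the host driver.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of libvirt's active device list. */
typedef struct { char name[16]; } pci_dev;
typedef struct { pci_dev devs[8]; size_t count; } pci_dev_list;

static const pci_dev *list_find(const pci_dev_list *l, const char *name)
{
    for (size_t i = 0; i < l->count; i++)
        if (strcmp(l->devs[i].name, name) == 0)
            return &l->devs[i];
    return NULL;
}

/* Returns 0 on success, -1 if the device is still active.  The key
 * point of the fix: this check runs for both managed and unmanaged
 * hostdevs, instead of being skipped for unmanaged ones. */
static int reattach(const pci_dev_list *active, const char *name)
{
    if (active && list_find(active, name))
        return -1;   /* "Not reattaching active device ..." */
    return 0;        /* safe to rebind to the host driver */
}
```

With this guard in place, reattaching 0000:00:19.0 while a guest holds it fails, which is the expected behavior described in the bug report.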
Comment 8 Osier Yang 2011-09-27 02:23:03 EDT
patch sent to upstream
https://www.redhat.com/archives/libvir-list/2011-September/msg01019.html
Comment 13 yanbing du 2011-10-19 03:41:53 EDT
Test with:
libvirt-0.9.4-18.el6.x86_64
qemu-kvm-0.12.1.2-2.199.el6.x86_64
kernel-2.6.32-211.el6.x86_64

Following the reproduction steps in the bug description, the bug is still not fixed.
When reattaching the PCI device to the host while it is in use by a guest:
# virsh nodedev-reattach pci_0000_00_19_0
Device pci_0000_00_19_0 re-attached

In fact, the PCI device didn't come back; it should report an error that the PCI device is in use by a guest and can't be reattached.
Comment 19 Osier Yang 2011-11-29 04:55:34 EST
patch posted to upstream:

https://www.redhat.com/archives/libvir-list/2011-November/msg01590.html
Comment 20 Osier Yang 2011-12-14 21:20:16 EST
Patch committed to upstream.
Comment 21 Daniel Veillard 2012-01-09 03:24:31 EST
Upstream commit 3f29d6c91f56857719fc500f02d55cee72684f36

Daniel
Comment 22 weizhang 2012-01-10 05:39:58 EST
Verify pass on
libvirt-0.9.9-1.el6.x86_64
kernel-2.6.32-225.el6.x86_64
qemu-kvm-0.12.1.2-2.213.el6.x86_64

After the second guest fails to start and the device is then reattached, it reports an error:
# virsh nodedev-reattach pci_0000_00_19_0
error: Failed to re-attach device pci_0000_00_19_0
error: internal error Not reattaching active device 0000:00:19.0

and the driver is still bound to pci-stub:
# readlink /sys/bus/pci/devices/0000\:00\:19.0/driver -f
/sys/bus/pci/drivers/pci-stub
Comment 23 Osier Yang 2012-05-04 05:27:32 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: If a domain fails to start, the host device(s) for the domain are reattached to the host regardless of whether the device(s) are in use by another domain.
Consequence: A device is reattached to the host even if it is still being used by another domain.
Fix: Improve the underlying code so that it does not reattach a device that is being used by another domain.
Result: More stable hostdev hotplug behavior.
Comment 25 errata-xmlrpc 2012-06-20 02:30:16 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0748.html
