Bug 723535

Summary: pci-stub driver is lost after an attempt to boot a VM with a PCI device assigned to other VM
Product: Red Hat Enterprise Linux 6 Reporter: Eduard Benes <ebenes>
Component: libvirt    Assignee: Osier Yang <jyang>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: urgent    
Version: 6.1    CC: ajia, chayang, dallan, ddutile, dyuan, iboverma, juzhang, mkenneth, mvadkert, mzhan, rwu, sgrubb, smueller, tburke, veillard, virt-maint, yupzhang
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-0.9.4-2.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-12-06 11:17:10 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 584498, 846801, 846802    

Description Eduard Benes 2011-07-20 13:51:52 UTC
Description of problem:
The pci-stub device driver binding disappears after an attempt to boot a guest domain with a PCI device that is already in use by another guest domain. This "breaks" the device for the first guest domain already using it: the device is still visible in the guest, but it no longer functions correctly, and dmesg reports issues with the PCI device because the link to the pci-stub device driver is lost. The PCI device assigned to the guest is a PCIe Ethernet card that works correctly with the VT-d extensions enabled. Assigning to the qemu-kvm component for now.

Version-Release number of selected component (if applicable):
libvirt-0.8.7-18.el6.x86_64
qemu-kvm-0.12.1.2-2.160.el6.x86_64
kernel-2.6.32-131.0.15.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. Configure the system for PCI device assignment by enabling the VT-d extensions [1] (see also the short sketch at the end of this description)
2. Add the PCI device to a virt guest (eth0 at 0000:03:00.0 in this case)
3. Prepare the test1 guest domain with PCI device 03:00.0 attached:
# virsh nodedev-dumpxml pci_0000_03_00_0
<device>
  <name>pci_0000_03_00_0</name>
  <parent>pci_0000_00_01_0</parent>
  <driver>
    <name>tg3</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>3</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x1681'>NetXtreme BCM5761 Gigabit Ethernet PCIe</product>
    <vendor id='0x14e4'>Broadcom Corporation</vendor>
  </capability>
</device>
# virsh start test1
# vmpid=`pgrep qemu-kvm`
# regions=`lspci -vvv -s 03:00.0 | sed -ne 's/Region.*Memory at \([a-f0-9]\{8\}\) .*/-e \1/p'`
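(The regions variable now holds grep options of the form "-e <BAR base address>", one per memory BAR of 03:00.0; it is used below to locate those BARs mapped into the qemu-kvm process.)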
# virsh dumpxml test1
... 
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </hostdev>
...

# grep $regions /proc/$vmpid/maps
7f3138138000-7f3138148000 rw-s f7110000 00:00 7138                       /sys/devices/pci0000:00/0000:00:01.0/0000:03:00.0/resource2
7f313836f000-7f313837f000 rw-s f7100000 00:00 7137                       /sys/devices/pci0000:00/0000:00:01.0/0000:03:00.0/resource0
# lspci -vvv -s 03:00.0
...
	Capabilities: [160] Device Serial Number 00-10-18-ff-fe-61-35-9a
	Capabilities: [16c] Power Budgeting <?>
	Kernel driver in use: pci-stub
	Kernel modules: tg3

# readlink /sys/bus/pci/devices/0000\:03\:00.0/driver
../../../../bus/pci/drivers/pci-stub
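(For reference, the <hostdev> element shown in the dumpxml output above was added to test1's configuration beforehand. One minimal way to do that, as an assumption about how the guest was prepared rather than an exact record of it, is to run "virsh edit test1" and add under <devices>:
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
With managed='yes', libvirt binds the device to pci-stub automatically when the domain starts.)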

4. Make sure the device works correctly for the test1 guest.
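One way to check this (a sketch, assuming the assigned NIC shows up inside the guest as eth0 and the guest console is reachable):
# virsh console test1
guest# ethtool eth0 | grep 'Link detected'
guest# ping -c 3 <an address on the NIC's physical network>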

----
So far we are good. Let's try to start the test2 guest domain, which will attempt to attach the same PCI device used by test1.
---- 

5. Start the test2 guest domain, which attempts to attach the same PCI device used for test1 (expected to fail to start), and check the driver bound to the PCI device again.

# virsh dumpxml test2
... 
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </hostdev>
...
# virsh start test2
error: Failed to start domain test2
error: internal error Not detaching active device 0000:03:00.0

# readlink /sys/bus/pci/devices/0000\:03\:00.0/driver
<nothing here, because there is no such file>
# ls /sys/bus/pci/devices/0000\:03\:00.0/driver
ls: cannot access /sys/bus/pci/devices/0000:03:00.0/driver: No such file or directory
# lspci -vvv -s 03:00.0
...
	Capabilities: [16c] Power Budgeting <?>
	Kernel modules: tg3
The good news is that the device is at least still mapped into the memory of the qemu-kvm process for the test1 domain:
# grep $regions /proc/$vmpid/maps 
7ff1d23bd000-7ff1d23cd000 rw-s f7110000 00:00 7138                       /sys/devices/pci0000:00/0000:00:01.0/0000:03:00.0/resource2
7ff1d37f5000-7ff1d3805000 rw-s f7100000 00:00 7137                       /sys/devices/pci0000:00/0000:00:01.0/0000:03:00.0/resource0

6. Checking whether the device still works for the test1 guest shows that it does not :(

7. Another attempt to start test2 guest domain (expected to fail)
# virsh start test2
error: Failed to start domain test2
error: internal error process exited while connecting to monitor: char device redirected to /dev/pts/9
Using CPU model "cpu64-rhel6"
Failed to assign device "hostdev0" : Device or resource busy
qemu-kvm: -device pci-assign,host=03:00.0,id=hostdev0,configfd=32,bus=pci.0,addr=0x7: Device 'pci-assign' could not be initialized


Actual results:
The device becomes unusable for the guest using it because the driver binding is lost.

Expected results:
The assigned PCI device should continue working properly for the guest already using it.

Additional info:
Testing this with a live-assignment of the PCI device to test2 is not currently possible due to bug 720972.

Some relevant dmesg output from the host if it helps:
 ...
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
tg3 0000:03:00.0: PME# enabled
tg3 0000:03:00.0: PCI INT A disabled
pci-stub 0000:03:00.0: claimed by stub
device vnet0 entered promiscuous mode
virbr0: topology change detected, propagating
virbr0: port 2(vnet0) entering forwarding state
pci-stub 0000:03:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
pci-stub 0000:03:00.0: restoring config space at offset 0xc (was 0x0, writing 0xf7400000)
pci-stub 0000:03:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100002)
assign device: host bdf = 3:0:0
pci-stub 0000:03:00.0: irq 51 for MSI/MSI-X
pci-stub 0000:03:00.0: Invalid ROM contents
pci-stub 0000:03:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100402)
pci-stub 0000:03:00.0: irq 51 for MSI/MSI-X
pci-stub 0000:03:00.0: irq 51 for MSI/MSI-X
vnet0: no IPv6 routers present
kvm: 8524: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xabcd
pci-stub 0000:03:00.0: irq 51 for MSI/MSI-X
pci-stub 0000:03:00.0: irq 51 for MSI/MSI-X
pci-stub 0000:03:00.0: irq 51 for MSI/MSI-X
pci-stub 0000:03:00.0: irq 51 for MSI/MSI-X
pci-stub 0000:03:00.0: irq 51 for MSI/MSI-X
pci-stub 0000:03:00.0: irq 51 for MSI/MSI-X
tg3 0000:03:00.0: BAR 0: can't reserve mem region [0xf7100000-0xf710ffff]
tg3 0000:03:00.0: Cannot obtain PCI resources, aborting
tg3: probe of 0000:03:00.0 failed with error -16


----
1. http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Virtualization/index.html#chap-Virtualization-PCI_passthrough
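For reference, enabling the VT-d extensions from step 1 typically means turning VT-d on in the BIOS, adding intel_iommu=on to the kernel line in /boot/grub/grub.conf, rebooting, and checking that the IOMMU is active. A sketch of the usual RHEL 6 procedure, not an exact record of what was done on this host:
# grep intel_iommu=on /boot/grub/grub.conf
# dmesg | grep -i -e dmar -e iommu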

Comment 1 Dor Laor 2011-07-25 14:05:13 UTC
Why do you think this is a bug we should fix?
It is exactly like running a new VM that uses the virtual disk of the first one and being surprised that things break.

Comment 5 Eduard Benes 2011-07-27 15:01:36 UTC
Finally got the required HW, and here are the requested results for 82576.
Tested on fresh RHEL 6.1 with no additional updates. 
Summary: It does not work with the 82576 (igb driver) either. Please let me know if you need more info.
 
[root@hp-dl180g6-01 iommu]# lspci -s 0000:07:00.1
07:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)

[root@hp-dl180g6-01 iommu]# virsh create guest1.xml 
Domain guest1 created from guest1.xml

[root@hp-dl180g6-01 iommu]# pgrep qemu-kvm
11728
[root@hp-dl180g6-01 iommu]# guest1=11728
[root@hp-dl180g6-01 iommu]# readlink /sys/bus/pci/devices/0000:07:00.1/driver
../../../../bus/pci/drivers/pci-stub
[root@hp-dl180g6-01 iommu]# grep -e fbee0000 -e fbec0000 -e fbebc000 /proc/$guest1/maps
7f8268bee000-7f8268bf2000 rw-s fbebc000 00:00 7054                       /sys/devices/pci0000:00/0000:00:01.0/0000:07:00.1/resource3
7f8268bf2000-7f8268c12000 rw-s fbec0000 00:00 7052                       /sys/devices/pci0000:00/0000:00:01.0/0000:07:00.1/resource1
7f8268c12000-7f8268c32000 rw-s fbee0000 00:00 7051                       /sys/devices/pci0000:00/0000:00:01.0/0000:07:00.1/resource0

[root@hp-dl180g6-01 iommu]# virsh create guest2.xml 
error: Failed to create domain from guest2.xml
error: internal error Not detaching active device 0000:07:00.1

[root@hp-dl180g6-01 iommu]# pgrep qemu-kvm
11728
[root@hp-dl180g6-01 iommu]# grep -e fbee0000 -e fbec0000 -e fbebc000 /proc/$guest1/maps
7f8268bee000-7f8268bf2000 rw-s fbebc000 00:00 7054                       /sys/devices/pci0000:00/0000:00:01.0/0000:07:00.1/resource3
7f8268bf2000-7f8268c12000 rw-s fbec0000 00:00 7052                       /sys/devices/pci0000:00/0000:00:01.0/0000:07:00.1/resource1
7f8268c12000-7f8268c32000 rw-s fbee0000 00:00 7051                       /sys/devices/pci0000:00/0000:00:01.0/0000:07:00.1/resource0
[root@hp-dl180g6-01 iommu]# readlink /sys/bus/pci/devices/0000:07:00.1/driver
[root@hp-dl180g6-01 iommu]#
[root@hp-dl180g6-01 iommu]# ls /sys/bus/pci/devices/0000:07:00.1/driver
ls: cannot access /sys/bus/pci/devices/0000:07:00.1/driver: No such file or directory

Comment 7 Stephan Mueller 2011-08-01 07:40:36 UTC
Concerning comment 2: For the CC evaluation, I think we do not need to have the issue fixed. However, I consider this a usability issue that is on par with the partition mapping problem we reported a few months ago. An administrator is allowed to make mistakes, and when the system clearly knows that there is a mistake, it should ensure that already running VMs are not affected by the error.

Hence, I suggest you fix this issue. Note that I have not looked into the problem closely, but if the issue is located in the pci-stub driver, a rogue VM may be able to trigger it as well, not only libvirtd.

Comment 9 juzhang 2011-08-10 07:06:27 UTC
According to comment 8, after step 4, booting a second guest with the same device does not affect the first VM or its device, so this should be a libvirt issue.
Per comment 8 and comment 3, changing the component to libvirt. If further testing by KVM QE is needed, please let us know.

Comment 11 dyuan 2011-08-10 09:20:44 UTC
Cannot reproduce this bug with libvirt-0.9.4-2.el6 and qemu-kvm-0.12.1.2-2.177.el6. Will try it with the old version soon.

I get some results that differ from the bug description.

1. The error reported is "reattaching" when starting test2.
# virsh start test2
error: Failed to start domain test2
error: internal error Not **reattaching** active device 0000:03:00.0

2. After test2 fails to start, the pci-stub binding can still be read.
# readlink /sys/bus/pci/devices/0000\:03\:00.0/driver
../../../../bus/pci/drivers/pci-stub

3. The PCI device is still working well in test1.

Comment 12 dyuan 2011-08-10 09:35:48 UTC
Reproduced this bug with libvirt-0.8.7-18.el6 and qemu-kvm-0.12.1.2-2.177.el6. 
I get the same error and behaviour as in the bug description.

Similar to bug 603039, which was verified recently.

Comment 15 Osier Yang 2011-08-17 12:45:36 UTC
commit 53a1db4dfcae87ee42e8f7bbf5f746f0547da9ae
Author: Chris Lalancette <clalance>
Date:   Mon Jun 14 17:12:35 2010 -0400

    Check for active PCI devices when doing nodedevice operations.
    
    In the current libvirt PCI code, there is no checking whether
    a PCI device is in use by a guest when doing node device
    detach or reattach.  This causes problems when a device is
    assigned to a guest, and the administrator starts issuing
    nodedevice commands.  Make it so that we check the list
    of active devices when trying to detach/reattach, and only
    allow the operation if the device is not assigned to a guest.
    
    Signed-off-by: Chris Lalancette <clalance>

Libvirt fixed this upstream a year ago. The patch prevents both attaching the device to another guest and the nodedev-detach/reattach operations while the device is in use by a guest.
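For illustration, with that check in place an explicit node-device operation on a device assigned to a running guest is refused as well, along the lines of (reusing the error strings already quoted in this report; exact wording can differ between builds):
# virsh nodedev-reattach pci_0000_03_00_0
error: internal error Not reattaching active device 0000:03:00.0
The detach direction is refused analogously with "Not detaching active device ...", as seen in step 5 of the description.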

Move this to POST.

Comment 16 Daniel Veillard 2011-08-19 01:08:07 UTC
Based on comment #11 and the following comments, this is fixed in current builds; flagging it as such.

Daniel

Comment 17 Miroslav Vadkerti 2011-08-19 11:16:15 UTC
I also confirm that I cannot reproduce the problem with the packages below. For reproducing it, I used the same machine as the reporter.

# rpm -q libvirt qemu-kvm kernel
libvirt-0.9.4-4.el6.x86_64
qemu-kvm-0.12.1.2-2.183.el6.x86_64
kernel-2.6.32-131.6.1.el6.x86_64

When trying to launch the second virtual machine while the first one is running, I get the expected error:
Error starting domain: internal error Process exited while reading console log output: char device redirected to /dev/pts/0
Failed to assign device "hostdev0" : Device or resource busy
qemu-kvm: -device pci-assign,host=07:00.1,id=hostdev0,configfd=24,bus=pci.0,addr=0x5: Device 'pci-assign' could not be initialized

After this, the driver is still bound for the first virtual machine:
# readlink /sys/bus/pci/devices/0000\:07\:00.1/driver
../../../../bus/pci/drivers/pci-stub

Comment 19 dyuan 2011-08-22 02:49:05 UTC
Move it to VERIFIED according to comment 17.

Comment 21 errata-xmlrpc 2011-12-06 11:17:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html