Bug 614811

Summary: Reattaching a PCI device that is in use by a guest to the host causes the host to restart
Product: Red Hat Enterprise Linux 6 Reporter: dyuan
Component: libvirt Assignee: Osier Yang <jyang>
Status: CLOSED WONTFIX QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: low    
Version: 6.0 CC: dallan, eblake, jyang, llim, xen-maint, yoyzhang, yuzhang
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-06-17 18:10:25 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 667609    
Bug Blocks:    

Description dyuan 2010-07-15 10:19:54 UTC
Description of problem:

Hotplug or pass through a VF to a guest, then reattach the VF to the host while the guest is still running: the host restarts.

Version-Release number of selected component (if applicable):
libvirt-0.8.1-15.el6
kernel-2.6.32-49.el6


How reproducible:
always

Steps to Reproduce:
1. Hotplug or pass through a VF to the guest:

<hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
       <address bus='0' slot='16' function='0'/>
    </source>
</hostdev>

2. While the guest is running, reattach the VF to the host:
# virsh nodedev-reattach pci_0000_03_10_0
Device pci_0000_03_10_0 re-attached
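
For reference, the node device name passed to virsh in step 2 maps directly onto the PCI address used under /sys/bus/pci/devices/. The helper below is only a sketch of that mapping; the function name nodedev_to_bdf is hypothetical and is not part of libvirt or virsh.

```shell
# Hypothetical helper (not a libvirt or virsh command).
# Converts a libvirt PCI node device name of the form pci_DDDD_BB_SS_F
# into the sysfs-style address DDDD:BB:SS.F.
# Input that does not match the pattern is printed unchanged.
nodedev_to_bdf() {
    printf '%s\n' "$1" | sed -E \
        's/^pci_([0-9a-fA-F]{4})_([0-9a-fA-F]{2})_([0-9a-fA-F]{2})_([0-7])$/\1:\2:\3.\4/'
}
```

For example, `nodedev_to_bdf pci_0000_03_10_0` gives the address that appears as a directory name under /sys/bus/pci/devices/.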

  
Actual results:
The host restarts.

Expected results:
The command should fail with a warning such as 'device is in use'.

Additional info:

Comment 2 RHEL Program Management 2010-07-15 14:17:47 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 3 Dave Allan 2010-08-19 19:18:02 UTC
I'm removing SR-IOV from the summary as this behavior is not specific to SR-IOV devices.

Comment 4 Osier Yang 2010-12-08 02:30:02 UTC
Hi dyuan

Could you check whether there is anything useful in /var/log/messages?

- Osier

Comment 5 dyuan 2010-12-13 08:55:22 UTC
libvirt-0.8.1-27.el6
qemu-kvm-0.12.1.2-2.113.el6
kernel-2.6.32-71.el6

Tested with a normal PCI device: the host did not reboot. I'll retest with a VF.

# cat /var/log/messages
Dec 13 16:52:23 dhcp-93-197 kernel: e1000e 0000:00:19.0: BAR 0: can't reserve mem region [0xfc300000-0xfc31ffff]
Dec 13 16:52:23 dhcp-93-197 kernel: e1000e: probe of 0000:00:19.0 failed with error -16

Comment 6 dyuan 2010-12-13 10:58:43 UTC
Tested with a VF: the host rebooted, and nothing was written to /var/log/messages.

Before reattaching a PCI device, libvirt needs to confirm whether the device is in use by a guest.

-----------------------------------
Additional info for comment 5:

Dec 13 18:54:12 dhcp-93-197 kernel: e1000e 0000:00:19.0: BAR 0: can't reserve mem region [0xfc300000-0xfc31ffff]
Dec 13 18:54:12 dhcp-93-197 kernel: e1000e: probe of 0000:00:19.0 failed with error -16
Dec 13 18:54:13 dhcp-93-197 kernel: Uhhuh. NMI received for unknow reason b1 on CPU 0.
Dec 13 18:54:13 dhcp-93-197 kernel: You have some hardware problem, likely on the PCI bus.
Dec 13 18:54:13 dhcp-93-197 kernel: Dazed and confused, but trying to contunue
Dec 13 18:54:13 dhcp-93-197 kernel: DRHD: handling fault status reg 3
Dec 13 18:54:13 dhcp-93-197 kernel: DMAR:[DMA Write] Request device [00:19.0] fault addr 34b16000
Dec 13 18:54:13 dhcp-93-197 kernel: DMAR:[fault reason 02] Present bit in context entry is clear

Also, the 'driver' link disappeared from the directory /sys/bus/pci/devices/0000\:00\:19.0/
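
A quick way to check which driver, if any, a PCI device is currently bound to is to read the 'driver' symlink in its sysfs directory. The function below is a minimal sketch; its name and the optional sysfs-root argument are hypothetical and exist only to make the check easy to try out.

```shell
# Hypothetical helper: print the name of the driver bound to a PCI device,
# or "none" if the device directory has no 'driver' symlink.
# $1: PCI address, e.g. 0000:00:19.0
# $2: optional sysfs root (defaults to /sys); an assumption added for testing
pci_bound_driver() {
    dev_dir="${2:-/sys}/bus/pci/devices/$1"
    if [ -L "$dev_dir/driver" ]; then
        # The link target ends in the driver name, e.g. .../drivers/pci-stub
        basename "$(readlink "$dev_dir/driver")"
    else
        echo "none"
    fi
}
```

Running `pci_bound_driver 0000:00:19.0` on a host in the state described above would be expected to print "none", since the link is gone.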

Comment 7 Osier Yang 2011-01-06 08:27:18 UTC
From the libvirt side, we can't determine whether a PCI device is in use.

Even when the device is bound to a driver such as pci-stub, it may not actually be in use. For example: 1) attach the device to a guest, 2) start the guest, 3) destroy the guest without detaching the device. The device is then still bound to the pci-stub driver but is not in use. Note that libvirt binds the PCI device to pci-stub (for qemu-kvm) or pciback (for Xen) when detaching the device from the host. For example:

1) Before (driver igbvf) and after (driver pci-stub) detaching the device from the host:
[root@dhcp-92-51 ~]# ls /sys/bus/pci/devices/0000\:03\:10.0/ -l
total 0
-rw-r--r--. 1 root root  4096 2011-01-06 02:37 broken_parity_status
-r--r--r--. 1 root root  4096 2011-01-06 02:30 class
-rw-r--r--. 1 root root  4096 2011-01-06 02:30 config
-r--r--r--. 1 root root  4096 2011-01-06 02:30 device
lrwxrwxrwx. 1 root root     0 2011-01-06 03:12 driver -> ../../../../bus/pci/drivers/igbvf

[root@dhcp-92-51 ~]# ls /sys/bus/pci/devices/0000\:03\:10.0/ -l
total 0
-rw-r--r--. 1 root root  4096 2011-01-06 02:37 broken_parity_status
-r--r--r--. 1 root root  4096 2011-01-06 02:30 class
-rw-r--r--. 1 root root  4096 2011-01-06 02:30 config
-r--r--r--. 1 root root  4096 2011-01-06 02:30 device
lrwxrwxrwx. 1 root root     0 2011-01-06 03:14 driver -> ../../../../bus/pci/drivers/pci-stub


2) After destroying the guest without detaching the device:

[root@dhcp-92-51 ~]# virsh attach-device x86_64 osier/vf.xml
Device attached successfully

[root@dhcp-92-51 ~]# ls /sys/bus/pci/devices/0000\:03\:10.0/ -l
total 0
-rw-r--r--. 1 root root  4096 2011-01-06 02:37 broken_parity_status
-r--r--r--. 1 root root  4096 2011-01-06 02:30 class
-rw-r--r--. 1 qemu qemu  4096 2011-01-06 02:30 config
-r--r--r--. 1 root root  4096 2011-01-06 02:30 device
lrwxrwxrwx. 1 root root     0 2011-01-06 03:14 driver -> ../../../../bus/pci/drivers/pci-stub

[root@dhcp-92-51 ~]# virsh destroy x86_64
Domain x86_64 destroyed

[root@dhcp-92-51 ~]# ls /sys/bus/pci/devices/0000\:03\:10.0/ -l
total 0
-rw-r--r--. 1 root root  4096 2011-01-06 02:37 broken_parity_status
-r--r--r--. 1 root root  4096 2011-01-06 02:30 class
-rw-r--r--. 1 root root  4096 2011-01-06 02:30 config
-r--r--r--. 1 root root  4096 2011-01-06 02:30 device
lrwxrwxrwx. 1 root root     0 2011-01-06 03:14 driver -> ../../../../bus/pci/drivers/pci-stub

So we can't determine whether the device is in use by checking "driver". Nor can we determine it from "enable", which is just a counter of how many times the device has been enabled. I can't find any other method.

This is unlike Xen, where we can look up whether a PCI device is in use by some domain via xenstore (http://www.redhat.com/archives/libvir-list/2011-January/msg00131.html; this patch is not ready yet, but it at least demonstrates the method). We can't get any useful information from KVM or sysfs, so there is nothing more we can do on the libvirt side to resolve this problem.

We need something like a "using" attribute in sysfs for each PCI device to indicate whether the device is in use.

So, reassigning to kernel.
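
Until the kernel exposes such an indicator, an administrator could approximate the check by hand for guests that libvirt itself manages, by searching the XML of each running domain for the device's host address. The sketch below is only an illustration, not something libvirt does here: the function name is hypothetical, and it assumes the domain XML writes the hostdev source address as domain/bus/slot/function attributes in that order with 0x-prefixed hex values. It says nothing about devices used outside libvirt.

```shell
# Hypothetical helper: succeeds if the domain XML read from stdin contains
# a host PCI source address matching $1 (format DDDD:BB:SS.F).
# ASSUMPTION: the address is formatted exactly as
#   <address domain='0xDDDD' bus='0xBB' slot='0xSS' function='0xF'/>
xml_has_hostdev() {
    bdf=$1
    domain=${bdf%%:*}; rest=${bdf#*:}   # split off the PCI domain
    bus=${rest%%:*};   rest=${rest#*:}  # split off the bus
    slot=${rest%%.*};  func=${rest#*.}  # split slot and function
    grep -q "<address domain='0x$domain' bus='0x$bus' slot='0x$slot' function='0x$func'/>"
}

# Possible usage before a reattach (assumes a virsh that supports
# 'list --name', which older releases may not):
#   for dom in $(virsh list --name); do
#       virsh dumpxml "$dom" | xml_has_hostdev 0000:03:10.0 \
#           && echo "0000:03:10.0 appears in the XML of $dom"
#   done
```

A plain text match like this is fragile and is meant only as a manual safeguard before running nodedev-reattach.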

Comment 8 Osier Yang 2011-01-06 08:36:00 UTC
bug created against kernel: https://bugzilla.redhat.com/show_bug.cgi?id=667609