Bug 614811
Summary: | Reattach PCI device which is in use by guest to host will cause host restart | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | dyuan |
Component: | libvirt | Assignee: | Osier Yang <jyang> |
Status: | CLOSED WONTFIX | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 6.0 | CC: | dallan, eblake, jyang, llim, xen-maint, yoyzhang, yuzhang |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2011-06-17 18:10:25 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 667609 | ||
Description
dyuan
2010-07-15 10:19:54 UTC
This issue has been proposed when we are only considering blocker issues in the current Red Hat Enterprise Linux release. It has been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current release, ask your support representative to file as a blocker on your behalf. Otherwise ask that it be considered for the next Red Hat Enterprise Linux release. **

I'm removing SR-IOV from the summary as this behavior is not specific to SR-IOV devices.

Hi dyuan,

Could you check whether there are any useful logs in /var/log/messages?

- Osier

libvirt-0.8.1-27.el6
qemu-kvm-0.12.1.2-2.113.el6
kernel-2.6.32-71.el6

Tested with a normal PCI device: the host didn't reboot. I'll re-test with a VF.

    # cat /var/log/messages
    Dec 13 16:52:23 dhcp-93-197 kernel: e1000e 0000:00:19.0: BAR 0: can't reserve mem region [0xfc300000-0xfc31ffff]
    Dec 13 16:52:23 dhcp-93-197 kernel: e1000e: probe of 0000:00:19.0 failed with error -16

Tested with a VF: the host rebooted and there was no output in /var/log/messages. Before reattaching the PCI device, make sure the device is in use by the guest.

-----------------------------------

Additional info for comment 5:

    Dec 13 18:54:12 dhcp-93-197 kernel: e1000e 0000:00:19.0: BAR 0: can't reserve mem region [0xfc300000-0xfc31ffff]
    Dec 13 18:54:12 dhcp-93-197 kernel: e1000e: probe of 0000:00:19.0 failed with error -16
    Dec 13 18:54:13 dhcp-93-197 kernel: Uhhuh. NMI received for unknow reason b1 on CPU 0.
    Dec 13 18:54:13 dhcp-93-197 kernel: You have some hardware problem, likely on the PCI bus.
    Dec 13 18:54:13 dhcp-93-197 kernel: Dazed and confused, but trying to contunue
    Dec 13 18:54:13 dhcp-93-197 kernel: DRHD: handling fault status reg 3
    Dec 13 18:54:13 dhcp-93-197 kernel: DMAR:[DMA Write] Request device [00:19.0] fault addr 34b16000
    Dec 13 18:54:13 dhcp-93-197 kernel: DMAR:[fault reason 02] Present bit in context entry is clear

and the 'driver' entry disappeared from the directory /sys/bus/pci/devices/0000\:00\:19.0/

We can't determine from the libvirt side whether a PCI device is in use. Even when the device is bound to a driver, e.g. pcistub, it is not necessarily in use. For example: 1) attach the device to a guest, 2) start the guest, 3) destroy the guest without detaching the device. The device is then still bound to the pcistub driver, but nothing is actually using it.

In fact, libvirt binds the PCI device to pcistub (for qemu-kvm) or pciback (for xen) when detaching the device from the host, e.g.

1) after detaching the device from the host.

    [root@dhcp-92-51 ~]# ls /sys/bus/pci/devices/0000\:03\:10.0/ -l
    total 0
    -rw-r--r--. 1 root root 4096 2011-01-06 02:37 broken_parity_status
    -r--r--r--. 1 root root 4096 2011-01-06 02:30 class
    -rw-r--r--. 1 root root 4096 2011-01-06 02:30 config
    -r--r--r--. 1 root root 4096 2011-01-06 02:30 device
    lrwxrwxrwx. 1 root root 0 2011-01-06 03:12 driver -> ../../../../bus/pci/drivers/igbvf

    [root@dhcp-92-51 ~]# ls /sys/bus/pci/devices/0000\:03\:10.0/ -l
    total 0
    -rw-r--r--. 1 root root 4096 2011-01-06 02:37 broken_parity_status
    -r--r--r--. 1 root root 4096 2011-01-06 02:30 class
    -rw-r--r--. 1 root root 4096 2011-01-06 02:30 config
    -r--r--r--. 1 root root 4096 2011-01-06 02:30 device
    lrwxrwxrwx. 1 root root 0 2011-01-06 03:14 driver -> ../../../../bus/pci/drivers/pci-stub

2) after destroying the guest without detaching the device

    [root@dhcp-92-51 ~]# virsh attach-device x86_64 osier/vf.xml
    Device attached successfully

    [root@dhcp-92-51 ~]# ls /sys/bus/pci/devices/0000\:03\:10.0/ -l
    total 0
    -rw-r--r--. 1 root root 4096 2011-01-06 02:37 broken_parity_status
    -r--r--r--. 1 root root 4096 2011-01-06 02:30 class
    -rw-r--r--. 1 qemu qemu 4096 2011-01-06 02:30 config
    -r--r--r--. 1 root root 4096 2011-01-06 02:30 device
    lrwxrwxrwx. 1 root root 0 2011-01-06 03:14 driver -> ../../../../bus/pci/drivers/pci-stub

    [root@dhcp-92-51 ~]# virsh destroy x86_64
    Domain x86_64 destroyed

    [root@dhcp-92-51 ~]# ls /sys/bus/pci/devices/0000\:03\:10.0/ -l
    total 0
    -rw-r--r--. 1 root root 4096 2011-01-06 02:37 broken_parity_status
    -r--r--r--. 1 root root 4096 2011-01-06 02:30 class
    -rw-r--r--. 1 root root 4096 2011-01-06 02:30 config
    -r--r--r--. 1 root root 4096 2011-01-06 02:30 device
    lrwxrwxrwx. 1 root root 0 2011-01-06 03:14 driver -> ../../../../bus/pci/drivers/pci-stub

So we can't determine whether the device is in use by checking the "driver" entry: it points to pci-stub both while the guest is using the device and after the guest has been destroyed. We also can't determine it from "enable", which is just a counter of how many times the device has been enabled. I can't find any other method.

This is unlike xen, where we can look up whether a PCI device is in use by some domain via xenstore (http://www.redhat.com/archives/libvir-list/2011-January/msg00131.html; the patch is not ready yet, but it at least demonstrates the method). We can't get any useful information from KVM/sysfs, so there is nothing more we can do from the libvirt side to resolve this problem.

We need something like a "using" attribute in sysfs for PCI devices to indicate whether the device is in use. So, reassigning to kernel.

Bug created against kernel: https://bugzilla.redhat.com/show_bug.cgi?id=667609
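To illustrate the point about sysfs, here is a minimal Python sketch that reads the two attributes discussed in this bug, the "driver" link and the "enable" counter, for a given PCI address. The function name `pci_sysfs_state` and the `sysfs_root` parameter are hypothetical and exist only for this example; this is not how libvirt itself inspects devices.

```python
import os


def pci_sysfs_state(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return (driver_name, enable_count) for the PCI device at address bdf.

    driver_name is None when no 'driver' link exists (no driver bound);
    enable_count is None when the 'enable' file cannot be read or parsed.
    Hypothetical helper for illustration only.
    """
    dev_dir = os.path.join(sysfs_root, bdf)

    # 'driver' is a symlink such as ../../../../bus/pci/drivers/pci-stub
    driver_link = os.path.join(dev_dir, "driver")
    driver = None
    if os.path.islink(driver_link):
        driver = os.path.basename(os.readlink(driver_link))

    # 'enable' holds a counter, not an in-use flag
    enable = None
    try:
        with open(os.path.join(dev_dir, "enable")) as f:
            enable = int(f.read().strip())
    except (OSError, ValueError):
        pass

    return driver, enable


if __name__ == "__main__":
    # Address taken from the listings in this bug; adjust for your host.
    print(pci_sysfs_state("0000:03:10.0"))
```

Going by the listings in this bug, such a check would report pci-stub as the driver both while the guest holds the device and after the guest is destroyed, so neither value can tell libvirt whether a reattach is safe.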