Bug 733587
Summary: | Reattaching a PCI device to the host while it is in use by a guest sometimes outputs wrong info | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | weizhang <weizhan> |
Component: | libvirt | Assignee: | Osier Yang <jyang> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 6.2 | CC: | ajia, dallan, dyuan, eblake, jyang, mzhan, rwu, veillard, ydu |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | libvirt-0.9.9-1.el6 | Doc Type: | Bug Fix |
Doc Text: |
Cause: If a domain fails to start, the host device(s) assigned to it are reattached to the host regardless of whether they are still used by another domain.
Consequence: A device can be reattached to the host even while another domain is still using it.
Fix: The underlying code was improved so that a device that is being used by another domain is not reattached.
Result: More stable PCI device hotplug behavior.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2012-06-20 06:30:16 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 773650, 773651, 773677, 773696 |
Description
weizhang
2011-08-26 06:30:46 UTC
I'd think the device is unbound from the pci-stub driver successfully; however, reprobing the driver for the device fails (or is not even attempted). Could you check whether "remove_id" is available for the pci-stub driver? E.g.

# ls /sys/bus/pci/devices/0000\:00\:19.0/driver/remove_id

If it exists, please test whether reprobing works:

# echo 0000\:00\:19.0 > /sys/bus/pci/drivers_probe

I suspect we have a problem reprobing the driver for the device here.

(In reply to comment #2)
> I'd think the device is unbound from the pci-stub driver successfully; however,
> reprobing the driver for the device fails (or is not even attempted). Could you
> check whether "remove_id" is available for the pci-stub driver? E.g.
>
> # ls /sys/bus/pci/devices/0000\:00\:19.0/driver/remove_id

After nodedev-reattach, remove_id does not exist.

How about before?

(In reply to comment #4)
> How about before?

Before reattach, remove_id exists, and running

# echo 0000\:00\:19.0 > /sys/bus/pci/drivers_probe

gives no error.

The following is my debug information; it should be helpful for you:

# virsh nodedev-dettach pci_0000_00_19_0
Device pci_0000_00_19_0 dettached
# readlink /sys/bus/pci/devices/0000\:00\:19.0/driver -f
/sys/bus/pci/drivers/pci-stub
# virsh start vr-rhel6u1-x86_64-kvm
Domain vr-rhel6u1-x86_64-kvm started
# readlink /sys/bus/pci/devices/0000\:00\:19.0/driver -f
/sys/bus/pci/drivers/pci-stub
# virsh start vr-rhel6-x86_64-kvm
error: Failed to start domain vr-rhel6-x86_64-kvm
error: internal error Not reattaching active device 0000:00:19.0
# readlink /sys/bus/pci/devices/0000\:00\:19.0/driver -f
/sys/bus/pci/drivers/pci-stub
# virsh start vr-rhel6-x86_64-kvm
error: Failed to start domain vr-rhel6-x86_64-kvm
error: internal error Process exited while reading console log output: char device redirected to /dev/pts/2
Failed to assign device "hostdev0" : Device or resource busy
qemu-kvm: -device pci-assign,host=00:19.0,id=hostdev0,configfd=25,bus=pci.0,addr=0x7: Device 'pci-assign' could not be initialized
Note: this error differs from the one reported on the first attempt to start this guest.

# virsh nodedev-reattach pci_0000_00_19_0
Device pci_0000_00_19_0 re-attached

Note: the PCI device is still active, so this should fail with an error like the one seen when starting the second guest. However, it succeeds. It seems some variable's initial value is changed when the second guest is started. If I start only one guest and then reattach the PCI device assigned to it, I do see the "...Not reattaching active device..." error.

In addition, dmesg shows the following:
...
e1000e 0000:00:19.0: BAR 0: can't reserve mem region [0xfe9e0000-0xfe9fffff]
e1000e: probe of 0000:00:19.0 failed with error -16
...

The messages log catches the same error:

# tail -f /var/log/messages
......
Aug 26 16:54:43 localhost kernel: e1000e 0000:00:19.0: BAR 0: can't reserve mem region [0xfe9e0000-0xfe9fffff]
Aug 26 16:54:43 localhost kernel: e1000e: probe of 0000:00:19.0 failed with error -16

Is this a kernel issue? (Error -16 is -EBUSY, the same "Device or resource busy" that qemu-kvm reported above.)

# readlink /sys/bus/pci/devices/0000\:00\:19.0/driver -f
/sys/devices/pci0000:00/0000:00:19.0/driver

Note: the PCI driver isn't right. No driver is bound to the device any more; there is no "driver" entry in the listing below.

# ll /sys/devices/pci0000:00/0000:00:19.0
total 0
-rw-r--r--. 1 root root   4096 Aug 26 15:26 broken_parity_status
-r--r--r--. 1 root root   4096 Aug 26 15:07 class
-rw-r--r--. 1 root root    256 Aug 26 15:07 config
-r--r--r--. 1 root root   4096 Aug 26 15:07 device
-rw-------. 1 root root   4096 Aug 26 15:26 enable
-r--r--r--. 1 root root   4096 Aug 26 15:07 irq
-r--r--r--. 1 root root   4096 Aug 26 15:26 local_cpulist
-r--r--r--. 1 root root   4096 Aug 26 15:07 local_cpus
-r--r--r--. 1 root root   4096 Aug 26 15:26 modalias
-rw-r--r--. 1 root root   4096 Aug 26 15:26 msi_bus
-r--r--r--. 1 root root   4096 Aug 26 15:26 numa_node
drwxr-xr-x. 2 root root      0 Aug 26 15:26 power
--w--w----. 1 root root   4096 Aug 26 15:21 remove
--w--w----. 1 root root   4096 Aug 26 15:47 rescan
--w-------. 1 root root   4096 Aug 26 15:07 reset
-r--r--r--. 1 root root   4096 Aug 26 15:07 resource
-rw-------. 1 root root 131072 Aug 26 15:07 resource0
-rw-------. 1 root root   4096 Aug 26 15:07 resource1
-rw-------. 1 root root     32 Aug 26 15:07 resource2
lrwxrwxrwx. 1 root root      0 Aug 26 15:07 subsystem -> ../../../bus/pci
-r--r--r--. 1 root root   4096 Aug 26 15:07 subsystem_device
-r--r--r--. 1 root root   4096 Aug 26 15:07 subsystem_vendor
-rw-r--r--. 1 root root   4096 Aug 26 15:07 uevent
-r--r--r--. 1 root root   4096 Aug 26 15:07 vendor

I tried to trace the above issues. They may be introduced by the following code slice:

    int pciReAttachDevice(pciDevice *dev, pciDeviceList *activeDevs)
    {
        ......
        if (activeDevs && pciDeviceListFind(activeDevs, dev)) {
            pciReportError(VIR_ERR_INTERNAL_ERROR,
                           _("Not reattaching active device %s"),
                           dev->name);
            return -1;
        }
        ......
    }

When the second guest is started and the device is then reattached, pciDeviceListFind returns NULL. That means the PCI device is treated as inactive, so the reattach succeeds. Furthermore, list->count is 0 inside pciDeviceListFind. That value isn't right; it should be 1, not 0, so there may be a counting issue. If I have time, I will debug it again. I hope this is useful for you.

Alex

The problem here is that the hostdev is not managed, and we don't check whether the device is in the active list if it's not managed. So the code falls through and steals the device from the active PCI list.

Patch sent upstream:
https://www.redhat.com/archives/libvir-list/2011-September/msg01019.html

Tested with:
libvirt-0.9.4-18.el6.x86_64
qemu-kvm-0.12.1.2-2.199.el6.x86_64
kernel-2.6.32-211.el6.x86_64

Following the reproduction steps in the bug description, the bug is still not fixed. Reattaching a PCI device that is in use by a guest to the host gives:

# virsh nodedev-reattach pci_0000_00_19_0
Device pci_0000_00_19_0 re-attached

In fact, the PCI device did not come back. The command should report an error saying the device is in use by a guest and cannot be reattached.
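The analysis above can be pictured with a toy model. Two facts need explaining: an unmanaged hostdev is not checked against the active list, so the failed start steals the entry, and list->count drops to 0. The C sketch below is hypothetical and heavily simplified; it is not libvirt code. The names struct dev_list, used_by, cleanup_buggy and cleanup_fixed are invented for illustration. cleanup_fixed only mirrors the intent stated in the Doc Text (do not reattach a device another domain is using), not the upstream patch. Only the "refuse while active" guard corresponds directly to the quoted pciReAttachDevice excerpt.

```c
#include <stddef.h>
#include <string.h>

/*
 * Toy model (hypothetical, not libvirt code) of the active PCI device list.
 * The "used_by" field is an assumption made for this sketch: it records
 * which domain a device is currently assigned to.
 */
#define MAX_DEVS 8

struct active_dev {
    const char *name;     /* PCI address, e.g. "0000:00:19.0" */
    const char *used_by;  /* domain the device is assigned to */
};

struct dev_list {
    struct active_dev devs[MAX_DEVS];
    size_t count;
};

static int list_add(struct dev_list *l, const char *name, const char *dom)
{
    if (l->count >= MAX_DEVS)
        return -1;
    l->devs[l->count].name = name;
    l->devs[l->count].used_by = dom;
    l->count++;
    return 0;
}

/* Index of the device, or -1; plays the role of pciDeviceListFind. */
static int list_find(const struct dev_list *l, const char *name)
{
    for (size_t i = 0; i < l->count; i++)
        if (strcmp(l->devs[i].name, name) == 0)
            return (int)i;
    return -1;
}

/* Remove entry i, shifting the rest down ("steal" it from the list). */
static void list_remove_at(struct dev_list *l, size_t i)
{
    memmove(&l->devs[i], &l->devs[i + 1],
            (l->count - i - 1) * sizeof(l->devs[0]));
    l->count--;
}

/* The guard quoted from pciReAttachDevice: refuse while still active. */
static int reattach(const struct dev_list *active, const char *name)
{
    return list_find(active, name) >= 0 ? -1 : 0;
}

/* Buggy cleanup after a failed start: drops every device named in the
 * failed domain's config, even one another domain is still using. */
static void cleanup_buggy(struct dev_list *active,
                          const char *const *hostdevs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int idx = list_find(active, hostdevs[i]);
        if (idx >= 0)
            list_remove_at(active, (size_t)idx);
    }
}

/* Owner-aware cleanup: only releases devices the failed domain owns. */
static void cleanup_fixed(struct dev_list *active, const char *dom,
                          const char *const *hostdevs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int idx = list_find(active, hostdevs[i]);
        if (idx >= 0 && strcmp(active->devs[idx].used_by, dom) == 0)
            list_remove_at(active, (size_t)idx);
    }
}
```

Consider a list in which vr-rhel6u1-x86_64-kvm owns 0000:00:19.0 and vr-rhel6-x86_64-kvm then fails to start. The buggy path leaves the list empty, so reattach() succeeds, which matches the wrong "re-attached" output. The owner-aware path keeps the entry, so reattach() is refused, which is the behavior the fix aims for.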
Patch posted upstream:
https://www.redhat.com/archives/libvir-list/2011-November/msg01590.html

Patch committed upstream.
Upstream commit 3f29d6c91f56857719fc500f02d55cee72684f36

Daniel

Verified as passing with:
libvirt-0.9.9-1.el6.x86_64
kernel-2.6.32-225.el6.x86_64
qemu-kvm-0.12.1.2-2.213.el6.x86_64

After the second guest fails to start, reattaching the device now reports an error:

# virsh nodedev-reattach pci_0000_00_19_0
error: Failed to re-attach device pci_0000_00_19_0
error: internal error Not reattaching active device 0000:00:19.0

and the device is still bound to pci-stub:

# readlink /sys/bus/pci/devices/0000\:00\:19.0/driver -f
/sys/bus/pci/drivers/pci-stub

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2012-0748.html
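The verification above, like the earlier debug sessions, confirms the binding with readlink on the device's sysfs "driver" link. Below is a minimal C sketch of that same check. It assumes the standard Linux sysfs layout, in which the link's last path component is the driver name and an unbound device has no "driver" link at all. pci_bound_driver and driver_name_from_target are hypothetical helper names, not libvirt or kernel APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SYSFS_PATH_LEN 4096

/* Hypothetical helper: return the driver name encoded in a sysfs "driver"
 * link target, i.e. its last path component:
 * "../../../bus/pci/drivers/pci-stub" -> "pci-stub".
 * Returns 0 on success, -1 if out is too small. */
static int driver_name_from_target(const char *target, char *out,
                                   size_t outlen)
{
    const char *base = strrchr(target, '/');
    base = base ? base + 1 : target;
    if (strlen(base) >= outlen)
        return -1;
    strcpy(out, base);
    return 0;
}

/* Hypothetical helper mirroring "readlink <dev>/driver".
 * dev_dir is the device's sysfs directory, for example
 * "/sys/bus/pci/devices/0000:00:19.0". Returns 0 and fills out with the
 * bound driver's name, or -1 if there is no "driver" link (no driver
 * bound) or the path does not fit. */
static int pci_bound_driver(const char *dev_dir, char *out, size_t outlen)
{
    char link_path[SYSFS_PATH_LEN];
    char target[SYSFS_PATH_LEN];
    ssize_t n;

    if (snprintf(link_path, sizeof(link_path), "%s/driver", dev_dir)
        >= (int)sizeof(link_path))
        return -1;                      /* path would be truncated */

    n = readlink(link_path, target, sizeof(target) - 1);
    if (n < 0)
        return -1;                      /* no "driver" link: unbound */
    target[n] = '\0';                   /* readlink() does not NUL-terminate */

    return driver_name_from_target(target, out, outlen);
}
```

With the fixed libvirt, calling pci_bound_driver("/sys/bus/pci/devices/0000:00:19.0", buf, sizeof buf) while the first guest runs would be expected to yield "pci-stub". In the broken state shown earlier, where no driver link existed, it would return -1.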