Description of problem:
When a PCI device is in use by a guest and you try to reattach it to the host, libvirt should detect that the device is in use by the guest, report an error, and do nothing. In fact, it reports no error and performs the operation, after which the PCI device no longer works in the guest and is lost on the host.

Version-Release number of selected component (if applicable):
On a rhel5.4-server-x86_64-kvm system:
    libvirt-0.6.3-30.el5
    libvirt-python-0.6.3-30.el5
    kmod-kvm-83-149.el5
    etherboot-zroms-kvm-5.4.4-13.el5
    kvm-83-149.el5
    kvm-qemu-img-83-149.el5

How reproducible:
Always

Steps to Reproduce:
1. Select a network PCI device on the host
    # virsh nodedev-dumpxml pci_8086_10c9_0
    <device>
      <name>pci_8086_10c9_0</name>
      <parent>pci_8086_340a</parent>
      <capability type='pci'>
        <domain>0</domain>
        <bus>66</bus>
        <slot>0</slot>
        <function>0</function>
        <product id='0x10c9'>82576 Gigabit Network Connection</product>
        <vendor id='0x8086'>Intel Corporation</vendor>
      </capability>
    </device>

2. Detach the network PCI device from the host
    # virsh nodedev-dettach pci_8086_10c9_0
    Device pci_8086_10c9_0 dettached
    # virsh nodedev-reset pci_8086_10c9_0
    Device pci_8086_10c9_0 reset

3. Add this PCI device to the guest XML config file
    <hostdev mode='subsystem' type='pci'>
      <source>
        <address bus='66' slot='0' function='0'/>
      </source>
    </hostdev>

4. Run the guest
    # virsh define rhel5u4_x86_64_kvm.xml
    Domain rhel5u4_x86_64_kvm defined from rhel5u4_x86_64_kvm.xml
    # virsh start rhel5u4_x86_64_kvm
    Domain rhel5u4_x86_64_kvm started
   The PCI device works well in the guest.

5. On the host, try to reattach the assigned PCI device
    # virsh nodedev-reattach pci_8086_10c9_0
    Device pci_8086_10c9_0 re-attached

6. Check which driver the device is bound to on the host
    # readlink /sys/bus/pci/devices/0000\:42\:00.0/driver

Actual results:
After step 5, the PCI device no longer works in the guest.
After step 6, the output is empty. Is the PCI device lost on the host?

Expected results:
After step 5, a warning such as 'device is in use' is shown and nothing is done.
After step 6, the output is ../../../../bus/pci/drivers/pci-stub

Additional info:
After step 4, if the following command is run first on the host:
    # virsh nodedev-reset pci_8086_10c9_0
    error: Failed to reset device pci_8086_10c9_0
    error: this function is not supported by the hypervisor: Unable to reset PCI device 0000:42:00.0: device is in use
    # readlink /sys/bus/pci/devices/0000\:42\:00.0/driver
    ../../../../bus/pci/drivers/pci-stub
I think nodedev-reset gives the expected result.
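The report gives the bus as decimal 66 in the nodedev XML but uses 0000:42:00.0 in the sysfs path; the two agree because sysfs prints the address in hexadecimal. The sketch below shows that conversion and the driver check from step 6. The function names sysfs_pci_address and bound_driver are hypothetical helpers made up for this illustration, not part of libvirt or virsh.

```python
import os


def sysfs_pci_address(domain: int, bus: int, slot: int, function: int) -> str:
    """Format a PCI address the way sysfs names it (hexadecimal fields)."""
    return f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}"


def bound_driver(pci_address: str, sysfs_root: str = "/sys/bus/pci/devices"):
    """Return the name of the driver bound to the device, or None.

    Mirrors `readlink .../driver` from step 6: when no driver is bound,
    the 'driver' symlink does not exist and readlink prints nothing.
    """
    link = os.path.join(sysfs_root, pci_address, "driver")
    try:
        return os.path.basename(os.readlink(link))
    except OSError:
        return None


if __name__ == "__main__":
    # Values taken from the nodedev-dumpxml output in step 1.
    addr = sysfs_pci_address(domain=0, bus=66, slot=0, function=0)
    print(addr, bound_driver(addr))
```

With the device from this report, a healthy assigned device would be expected to show pci-stub as the driver, while the failure case corresponds to no driver link at all.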
In http://libvirt.org/html/libvirt-libvirt.html#virNodeDeviceReAttach, we can find the following description:

virNodeDeviceReAttach
    int virNodeDeviceReAttach (virNodeDevicePtr dev)
    Re-attach a previously dettached node device to the node so that it may be used by the node again. Depending on the hypervisor, this may involve operations such as resetting the device, unbinding it from a dummy device driver and binding it to its appropriate driver. If the device is currently in use by a guest, this method may fail.
    dev: pointer to the node device
    Returns: 0 in case of success, -1 in case of failure.

So if the device is currently in use by a guest, this method may fail.
I believe this bug could also be fixed by the test packages found at http://people.redhat.com/clalance/bz500217
Could you retest it with those packages?
This bug is not fixed by the test packages at http://people.redhat.com/clalance/bz500217
Okay, the current status is that it's not critical and we don't have a fix yet, so this is being retargeted for Update 6.
Daniel
OK, I actually see what is going on here now. What is happening is that nodedev-dettach and nodedev-reattach don't take into account PCI devices that are already assigned to guests. So if you run either of these commands against a device that is assigned to a guest, they will blindly disconnect it. This causes problems and faults in the kernel DMAR code, and essentially causes the device to disappear. I guess if a device is assigned to a guest, *and* that guest is running, these commands should just fail and do nothing. This is still a problem in RHEL-6 and upstream, as well.
Chris Lalancette
I've sent a couple of patches upstream to basically disallow nodedev-detach and nodedev-reattach while a device is assigned to a guest. Once they are integrated, I'll do a backport for RHEL-5.
Chris Lalancette
Fixed in libvirt-0.8.2-1.el5
Verified as passed in the environment below:
RHEL5.6-Server-x86_64_KVM
    kvm-qemu-img-83-205.el5
    kernel-2.6.18-228.el5
    libvirt-0.8.2-8.el5
With the xen kernel, however, I filed a new bug, 646749, to track the following: when the PCI device is working well in the guest and is re-attached on the host at the same time, the host reboots immediately.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHEA-2011-0060.html