Description of problem:
This is a KVM pass-through stress issue. It happens with the Nehalem-EP on-board NIC (code name: Kawela) after a guest is continuously created and destroyed. The NIC device (BDF 01:00.0) changes its IRQ number, the new IRQ is shared with another device, and the NIC can no longer be assigned.

[root@localhost root]# qemu-kvm -m 512 -net none -pcidevice host=01:00.0 -hda ./rhel5u4-32e.img
Failed to assign irq for "01:00.0": Invalid argument
Perhaps you are assigning a device that shares an IRQ with another device?
Failed to initialize assigned device host=01:00.0

Dmesg:
PM: Writing back config space on device 0000:01:00.0 at offset 4 (was 0, writing fbaa0000)
PM: Writing back config space on device 0000:01:00.0 at offset 1 (was 100000, writing 100400)
assign device: host bdf = 1:0:0
PCI: 0000:01:00.0: Can't enable MSI. Device already has MSI-X vectors assigned
deassign device: host bdf = 1:0:0
ACPI: PCI interrupt for device 0000:01:00.0 disabled
PCI: Enabling device 0000:01:00.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 177

Version-Release number of selected component (if applicable):
Host OS (ia32/ia32e/IA64): ia32e
Guest OS (ia32/ia32e/IA64): ia32e
Guest OS Type (Linux/Windows): kernel: 2.6.18-160.el5
Host Kernel Version: rhel5u4-snap5
Hardware: NHM-EP
KVM: kvm-83-101.el5, kvm-qemu-img-83-101.el5

How reproducible:
Each time

Steps to Reproduce:
The issue takes several create/destroy cycles to reproduce. The following script helps:

#!/bin/bash
cnt=0
while true
do
    cnt=$(expr $cnt + 1)
    qemu-kvm -m 512 -net none -pcidevice host=00:19.0 -hda ./ia32e_rhel5u3.img &
    sleep 120
    kill -9 $!
done

Actual results:

Expected results:

Additional info:
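To confirm that the IRQ reported in dmesg really is shared, the row for that IRQ in /proc/interrupts can be inspected. The following is only a rough sketch: the function name irq_handlers and the INTERRUPTS_FILE variable are made up for illustration, and it assumes that handlers sharing a line appear comma-separated at the end of the row (a handler name that itself contains a comma would be miscounted).

```shell
#!/bin/bash
# Sketch: count the handlers registered on one IRQ line.
# INTERRUPTS_FILE is a hypothetical override so the function can be
# tried on a saved copy of /proc/interrupts.
INTERRUPTS_FILE="${INTERRUPTS_FILE:-/proc/interrupts}"

irq_handlers() {
    local irq="$1"
    # The first field of each row is "<irq>:". Handlers that share the
    # line are listed comma-separated, so handlers = commas + 1.
    # Prints nothing if the IRQ has no row in the file.
    awk -v irq="$irq:" '$1 == irq { print gsub(/,/, ",") + 1 }' "$INTERRUPTS_FILE"
}
```

For example, `irq_handlers 177` printing a value greater than 1 would suggest that IRQ 177 is shared with another device.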
Does this bug still exist on RHEL6?
I think this problem is either fixed in 5.7 (due to bug 657149) or is not reproducible with libvirt. bz657149 added a reset handler for assigned devices that will use the pci sysfs reset interface. Assuming the kill -9 allows this to get called, the device should be put into a state where hotplugs can continue. Barring that, libvirt will also reset devices prior to assigning them to the guest. This would hopefully clear up any interrupt issue that may be caused by terminating qemu with such prejudice. While it's a good measure of robustness, I don't expect many actual users kill -9 their guests on a regular basis. If this is still a problem on RHEL6, please open a new bug there.
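For anyone who wants to try the reset by hand before reassigning the device, a minimal sketch of using the PCI sysfs reset interface follows. It assumes the running kernel exposes a reset file for the device under /sys/bus/pci/devices and that the script runs as root. The function name reset_pci_device and the SYSFS_PCI variable are made up for illustration.

```shell
#!/bin/bash
# Sketch: reset a PCI device through sysfs before reassigning it.
# SYSFS_PCI is a hypothetical override so the function can be tried
# against a mock directory tree.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

reset_pci_device() {
    local bdf="$1"                      # full address, e.g. 0000:01:00.0
    local reset_file="$SYSFS_PCI/$bdf/reset"
    # The reset file is only present when the kernel has a reset
    # method for this device.
    if [ ! -e "$reset_file" ]; then
        echo "no reset file for $bdf" >&2
        return 1
    fi
    # Writing 1 asks the kernel to reset the device.
    echo 1 > "$reset_file"
}
```

Usage would be `reset_pci_device 0000:01:00.0` after the guest has exited and before the next qemu-kvm run. Whether this clears the IRQ state described in this bug has not been verified here.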