Bug 516478

Summary: Cannot pass through MSI-X NIC when its IRQ is shared
Product: Red Hat Enterprise Linux 5 Reporter: Shaohui <shaohui.zheng>
Component: kernel Assignee: Alex Williamson <alex.williamson>
Status: CLOSED WONTFIX QA Contact: Lawrence Lim <llim>
Severity: low Docs Contact:
Priority: low    
Version: 5.4 CC: ddutile, donald.d.dugger, haicheng.li, Jes.Sorensen, qzhang, qzhou, tools-bugs, virt-maint, ykaul
Target Milestone: rc Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-07-27 21:18:56 UTC Type: ---
Bug Depends On:    
Bug Blocks: 580948    

Description Shaohui 2009-08-10 06:28:26 UTC
Description of problem:

This is a KVM pass-through stress issue. It happens on the Nehalem-EP on-board NIC (code name: Kawela) after a guest is repeatedly created and destroyed. The NIC device (BDF 01:00.0) changes its IRQ number, and the new IRQ is shared with another device, so the NIC cannot be assigned again.

[root@localhost root]# qemu-kvm -m 512 -net none -pcidevice host=01:00.0 -hda ./rhel5u4-32e.img
Failed to assign irq for "01:00.0": Invalid argument
Perhaps you are assigning a device that shares an IRQ with another device?
Failed to initialize assigned device host=01:00.0

Dmesg:
PM: Writing back config space on device 0000:01:00.0 at offset 4 (was 0, writing fbaa0000)
PM: Writing back config space on device 0000:01:00.0 at offset 1 (was 100000, writing 100400)
assign device: host bdf = 1:0:0
PCI: 0000:01:00.0: Can't enable MSI.  Device already has MSI-X vectors assigned
deassign device: host bdf = 1:0:0
ACPI: PCI interrupt for device 0000:01:00.0 disabled
PCI: Enabling device 0000:01:00.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 177
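
One way to check whether the IRQ reported in the log above is shared is to count the handlers listed for it in /proc/interrupts, where shared handlers appear as a comma-separated list at the end of the line. The following is only a minimal sketch and not part of the original report; the function name irq_handler_count and its file argument are hypothetical, chosen so the helper can also be run against a saved copy of the file.

```shell
#!/bin/bash
# irq_handler_count FILE IRQ
# Print how many handlers are registered on IRQ according to FILE,
# which is expected to be in /proc/interrupts format.
# Prints 0 when the IRQ does not appear in FILE at all.
# (Hypothetical helper, for illustration only.)
irq_handler_count() {
    local file=$1 irq=$2
    awk -v irq="$irq:" '
        # The first field of each IRQ line is the number followed by a colon.
        $1 == irq {
            # Handlers sharing the line are separated by ", ", so the
            # handler count is the number of separators plus one.
            print gsub(/, /, "&") + 1
            found = 1
        }
        END { if (!found) print 0 }
    ' "$file"
}
```

For the device above this might be called as `irq_handler_count /proc/interrupts 177`; a result greater than 1 would indicate that the line is shared with another device.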


Version-Release number of selected component (if applicable):
Host OS (ia32/ia32e/IA64):ia32e
Guest OS (ia32/ia32e/IA64):ia32e
Guest OS Type (Linux/Windows):
kernel: 2.6.18-160.el5
Host Kernel Version:rhel5u4-snap5
Hardware:NHM-EP
KVM: kvm-83-101.el5,kvm-qemu-img-83-101.el5


How reproducible:
Each time


Steps to Reproduce:
The failure does not appear on the first run; the guest has to be created and destroyed several times. The script below automates this.
  
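#!/bin/bash
# Repeatedly start a guest with the assigned NIC, then kill it hard.
cnt=0
while true
do
        cnt=$((cnt + 1))
        # Boot the guest in the background with the host device assigned.
        qemu-kvm -m 512 -net none -pcidevice host=00:19.0 -hda ./ia32e_rhel5u3.img &
        sleep 120
        # Terminate qemu without giving it a chance to clean up.
        kill -9 $!
done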


Actual results:


Expected results:


Additional info:

Comment 9 Alex Williamson 2011-03-02 22:11:41 UTC
Does this bug still exist on RHEL6?

Comment 10 Alex Williamson 2011-07-27 21:18:56 UTC
I think this problem is either fixed in 5.7 (due to bug 657149) or is not reproducible with libvirt.  bz657149 added a reset handler for assigned devices that will use the pci sysfs reset interface.  Assuming the kill -9 allows this to get called, the device should be put into a state where hotplugs can continue.  Barring that, libvirt will also reset devices prior to assigning them to the guest.  This would hopefully clear up any interrupt issue that may be caused by terminating qemu with such prejudice.  While it's a good measure of robustness, I don't expect many actual users kill -9 their guests on a regular basis.  If this is still a problem on RHEL6, please open a new bug there.
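
To illustrate the pci sysfs reset interface mentioned in the comment above, here is a minimal sketch of resetting a device by hand before assigning it again. It assumes the kernel exposes a reset attribute at /sys/bus/pci/devices/<domain:bus:dev.fn>/reset for the device; the function name pci_reset_device and its optional sysfs root argument are hypothetical, and this is not the code added by bug 657149.

```shell
#!/bin/bash
# pci_reset_device BDF [SYSFS_ROOT]
# Ask the kernel to reset the PCI device BDF (e.g. 0000:01:00.0) by
# writing 1 to its sysfs "reset" attribute. SYSFS_ROOT defaults to /sys
# and exists only so the helper can be tried against a fake tree.
# (Hypothetical helper, for illustration only.)
pci_reset_device() {
    local bdf=$1 root=${2:-/sys}
    local reset_file="$root/bus/pci/devices/$bdf/reset"

    # The attribute is absent when the kernel has no reset method
    # for the device, or when the device does not exist.
    if [ ! -w "$reset_file" ]; then
        echo "no reset attribute available for $bdf" >&2
        return 1
    fi

    echo 1 > "$reset_file"
}
```

Run as root, `pci_reset_device 0000:01:00.0` would request a reset of the NIC from this report. Whether that clears the stale interrupt state described here is not confirmed by the report.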