Bug 611379
Description
Qianfeng Zhang
2010-07-05 03:33:27 UTC
Created attachment 429455 [details]
dmesg information which includes the call trace
This file includes the kernel Call Trace
Created attachment 429457 [details]
output of "lspci -v" after detaching the VF from the host
The PCI device used for the pci-passthrough test is an "Intel Corporation 82599EB 10-Gigabit Network Connection" network interface; the driver is "ixgbe".
Please provide the output of 'virsh dumpxml $GUESTNAME' and the /var/log/libvirt/qemu/$GUEST.log file.
Created attachment 430103 [details]
The guest configuration
The guest configuration (or output of #> virsh dumpxml rhel6-g1 )
Created attachment 430104 [details]
log information when starting the guest
Please also pay attention to the dmesg.txt where the kernel call trace is very clear
The logs contain this error message showing a device assignment failure in QEMU, which wasn't reported back to libvirt:
Failed to assign irq for "hostdev0": Input/output error
Perhaps you are assigning a device that shares an IRQ with another device?
Failed to assign irq for "hostdev0": Input/output error
Perhaps you are assigning a device that shares an IRQ with another device?
If you upgrade to the qemu-kvm RPM version from this BZ you should get proper error reporting for this condition from libvirt:
https://bugzilla.redhat.com/show_bug.cgi?id=596279
Yes, additionally bz585310 fixed an issue with failure to exit on irq setup which could be contributing:
https://bugzilla.redhat.com/show_bug.cgi?id=585310
This was fixed in qemu-kvm-0.12.1.2-2.71.el6. Please retest with the latest bits.
We really need some help from the submitter to debug this one. Please update to latest bits and retest. If it still fails, please provide the output of (from the host):
lspci -vvv -s 0000:02:10.0
and
setpci -s 0000:02:10.0 INTERRUPT_PIN
(replace the PCI device above with the virtual function used if different)
The only code path that seems like it could cause this error would be trying to set up a host INTx, but since this is a VF, by definition it shouldn't have an INTx, and the INTERRUPT_PIN should return 0.
Hi Alex,
According to my customer, after upgrading to qemu-kvm-0.12.1.2-2.71.el6, the qemu-kvm process exits very quickly, but the kernel failure and call trace information are still there.
I can force the same backtrace if I allow the code to try to register an INTx interrupt for a virtual function device. At this point, it really looks like a hardware issue. Please provide the data requested in comment12.
I'd also like to see the output of:
sudo xxd /sys/bus/pci/devices/0000\:02\:10.0/config
(xxd is part of vim-common)
If we can see the interrupt pin is not zero, there's some kind of hardware issue and we need to understand if it's a class problem with this device and we need a workaround, or some kind of point issue or bios defect.
A colleague also just pointed out this in the provided dmesg:
pci-stub 0000:02:10.0: claimed by stub
Machine check events logged
^^^^^^^^^^^^^^^^^^^^^^^^^^^
pci-stub 0000:02:10.0: claimed by stub
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky <maxk>
device vnet0 entered promiscuous mode
br0: port 2(vnet0) entering forwarding state
assign device: host bdf = 2:10:0
IRQ handler type mismatch for IRQ 0
current handler: timer
Pid: 2014, comm: qemu-kvm Tainted: G M 2.6.32-37.el6.x86_64 #1
Call Trace:
[<ffffffff810d8e06>] __setup_irq+0x376/0x3b0
Please get an mcelog from the system so we can figure out if this is related.
Another thought is that the MCE listed in comment15 occurs suspiciously between two bindings of the same device to the pci-stub driver. This is caused by the below:
(In reply to comment #0)
> Steps to reproduce :
> #> lspci -v
> #> virsh nodedev-list
> #> virsh nodedev-dettach pci_0000_02_10_0
This one unbinds the device from ixgbevf and binds it to pci-stub
> #> lspci -v // see the attachment for output
> #> virsh start <domid>
Because the domain xml contains a host device, the start will unbind the device from its current driver (pci-stub) and bind it to the pci-stub driver again. This redundancy should not be a problem, but given the location of the MCE, let's try removing it. After recording /sys/bus/pci/devices/0000:02:10.0/config and the mcelog, reboot the system to make sure the device is back to a working state, then simply try
#> virsh start <domid>
without first doing the nodedev-dettach.
Created attachment 432374 [details]
dmesg including the kernel call trace again
This also includes the booting message in the kernel
Created attachment 432375 [details]
/proc/interrupts
Created attachment 432376 [details]
lspci -vvv
Created attachment 432377 [details]
lspci -vvv for pci_0000_02_10_0
Created attachment 432380 [details]
lspci -vv output after the failure
It looks like the PCI information of the VFs is lost after the failure
Created attachment 432381 [details]
lspci -s 000:02:10.0 INTERRUPT_PIN output
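For reference, the interrupt pin asked about earlier in this thread can also be read straight out of a raw config space dump. The following is only a minimal sketch: it assumes the standard PCI header layout, where the Interrupt Pin register is the byte at offset 0x3D, and the helper names (interrupt_pin, describe_pin, read_config) are made up for illustration, not taken from any tool mentioned here.

```python
from pathlib import Path

# Standard PCI configuration header: Interrupt Pin register is one byte at 0x3D.
INTERRUPT_PIN_OFFSET = 0x3D


def interrupt_pin(config: bytes) -> int:
    """Return the raw Interrupt Pin byte from a config space dump."""
    if len(config) <= INTERRUPT_PIN_OFFSET:
        raise ValueError("config space dump is shorter than the 64-byte header")
    return config[INTERRUPT_PIN_OFFSET]


def describe_pin(pin: int) -> str:
    """Map the register value to a name: 0 means no INTx, 1..4 mean INTA..INTD."""
    if pin == 0:
        return "no INTx pin"
    if 1 <= pin <= 4:
        return "INT" + "ABCD"[pin - 1]
    return f"invalid value 0x{pin:02x}"


def read_config(bdf: str) -> bytes:
    """Hypothetical helper: read the sysfs config file for e.g. '0000:02:10.0'."""
    return Path(f"/sys/bus/pci/devices/{bdf}/config").read_bytes()
```

On the host this could be used as describe_pin(interrupt_pin(read_config("0000:02:10.0"))). For a virtual function the expected answer is "no INTx pin"; anything else would support the hardware or BIOS defect theory discussed above.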
Created attachment 432382 [details]
mce log
Yes. This mce error is reported by "#>mcelog" every time just after rebooting the machine. It occurs even when we don't test "pci-passthrough".
I am not sure whether this "mce" has something to do with the device assignment failure. Can you point me to a testing environment that has the same "ixgbe" interface and can show "pci-passthrough" succeeding on RHEL6 Beta 2?
Alex,
Do you think the mce log I attached is related to the failure and the kernel call trace?
I don't have access to a system with an 82599EB and none are available in beaker. The mce seems to be indicating a memory parity error on a DIMM; however, since it didn't occur in the latest dmesg, I can't correlate the mce with the card failure. You confirm with the post-failure lspci output that the device has gone into a bad state. This explains why we take a very unexpected code path. By the point that happens, we've done very little with the card; it hasn't even been handed over to the guest yet. What happens if you 'modprobe -r ixgbevf' before assigning the device to a guest? Also, with ixgbevf loaded, are you able to configure the vf devices in the host and do they work? (note the physical function ethX devices likely need to be up for the virtual functions to receive packets)
I think we're either dealing with a hardware problem or possibly the ixgbevf driver isn't cleanly unbinding from the device.
Created attachment 433021 [details]
function level reset script
I'm attaching a script that will do the same type of PCI function level reset that happens when a device is assigned to a guest. With the system in a working state and lspci showing valid data for all devices, run this on the virtual function that you're attempting to assign to the guest, ex:
# flr.sh 02:10.0
You'll need to be root to run the script. This should print out the lspci info for the device, perform an FLR reset, then print the resulting state of the device. If this can generate the same type of error state with the device, then I think we can close this as a hardware defect. If not, we probably need access to the system to debug further.
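As background on what a function level reset depends on, here is a rough sketch of how FLR support can be detected from a config space dump. It is not the attached flr.sh; it only assumes the standard layout: the capability list flag is bit 4 of the Status register (offset 0x06), the first capability pointer is at 0x34, the PCI Express capability has ID 0x10, and FLR capability is bit 28 of its Device Capabilities register (capability offset + 4). The function names are illustrative.

```python
import struct
from typing import Optional

STATUS_OFFSET = 0x06         # Status register (16 bit)
STATUS_CAP_LIST = 1 << 4     # set when a capability list is present
CAP_PTR_OFFSET = 0x34        # pointer to the first capability
PCIE_CAP_ID = 0x10           # PCI Express capability ID
DEVCAP_OFFSET = 0x04         # Device Capabilities, relative to the PCIe capability
DEVCAP_FLR = 1 << 28         # Function Level Reset Capability bit


def find_capability(config: bytes, cap_id: int) -> Optional[int]:
    """Walk the capability list and return the offset of cap_id, or None."""
    status = struct.unpack_from("<H", config, STATUS_OFFSET)[0]
    if not status & STATUS_CAP_LIST:
        return None
    pos = config[CAP_PTR_OFFSET] & 0xFC
    seen = set()  # guards against a looping list on a misbehaving device
    while pos >= 0x40 and pos + 1 < len(config) and pos not in seen:
        seen.add(pos)
        if config[pos] == cap_id:
            return pos
        pos = config[pos + 1] & 0xFC
    return None


def supports_flr(config: bytes) -> bool:
    """True if the PCIe Device Capabilities register advertises FLR."""
    pos = find_capability(config, PCIE_CAP_ID)
    if pos is None or pos + DEVCAP_OFFSET + 4 > len(config):
        return False
    devcap = struct.unpack_from("<I", config, pos + DEVCAP_OFFSET)[0]
    return bool(devcap & DEVCAP_FLR)
```

Note that unprivileged reads of the sysfs config file usually return only the first 64 bytes, so the capability walk needs a dump taken as root.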
Hi Alex,
I just collected some information using your script.
[root@kvm22 test]# ./flr.sh 02:10.0
Before...
02:10.0 Ethernet controller: Intel Corporation 82559 Ethernet Controller Virtual Function (rev 01)
    Subsystem: Intel Corporation Device 000c
    Flags: bus master, fast devsel, latency 0
    [virtual] Memory at c0000000 (64-bit, non-prefetchable) [size=16K]
    [virtual] Memory at c0100000 (64-bit, non-prefetchable) [size=16K]
    Capabilities: [70] MSI-X: Enable+ Count=3 Masked-
    Capabilities: [a0] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
    Kernel driver in use: ixgbevf
    Kernel modules: ixgbevf
Device does not support FLR
Created attachment 433167 [details]
lspci -vvv
#> modprobe -r ixgbevf was executed first. It looks like detaching 02:10.0 leads to all the VFs being claimed by pci-stub
Created attachment 433168 [details]
dmesg including the kernel call trace
dmesg information is almost the same as the previous one
Created attachment 433169 [details]
lspci -vv after the failure
almost the same as previous one
Created attachment 433170 [details]
the guest's XML
As it is
Created attachment 433171 [details]
Qemu log
As it is
Alex,
If you need more information, let me know.
Created attachment 433241 [details]
flr.sh
New flr.sh that detects 82599 and uses same reset method as linux kernel
(In reply to comment #33)
> Alex
> If need more information, let me know
Sorry, the original flr.sh didn't do what we wanted because the 82599 makes use of a device specific reset. The new version detects this card and emulates the same thing the kernel does on reset. You should see both:
Before...
<lspci output>
and
After...
<lspci output>
Please retest with this new version. Thanks.
Well, it looks like the rhel6 kernel doesn't include the 82599 device specific reset that was added to upstream. That could be causing us to do much nastier resets, which could be causing this problem. I've added the necessary patches to a test build, please try it here:
https://brewweb.devel.redhat.com/taskinfo?taskID=2612906
Follow the x86_64 and noarch links to get the rpms you need, install, reboot, and let us know if it resolves the problem. Thanks.
Created attachment 433701 [details]
#>flr_new.sh 02:10.0 output
It looks like the MSI-X capability of the device changed from "Enable+" to "Enable-".
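To make the "Enable+" / "Enable-" observation concrete, the sketch below decodes the MSI-X Message Control word the way lspci summarizes it. It assumes the standard MSI-X capability layout (capability ID 0x11, Message Control as the 16-bit word at capability offset + 2, bit 15 = MSI-X Enable, bit 14 = Function Mask, bits 10:0 = table size minus one). The function names are invented for illustration, and the register values in the usage note are synthetic, not read from the reporter's system.

```python
MSIX_ENABLE = 1 << 15          # MSI-X Enable bit
MSIX_FUNCTION_MASK = 1 << 14   # Function Mask bit
MSIX_TABLE_SIZE_MASK = 0x07FF  # table size field, encoded as N - 1


def decode_msix_control(control: int) -> dict:
    """Split the 16-bit MSI-X Message Control word into its fields."""
    return {
        "enabled": bool(control & MSIX_ENABLE),
        "masked": bool(control & MSIX_FUNCTION_MASK),
        "count": (control & MSIX_TABLE_SIZE_MASK) + 1,
    }


def format_like_lspci(control: int) -> str:
    """Render the fields in the same style as the lspci summary line."""
    fields = decode_msix_control(control)
    return "MSI-X: Enable%s Count=%d Masked%s" % (
        "+" if fields["enabled"] else "-",
        fields["count"],
        "+" if fields["masked"] else "-",
    )
```

With these assumptions, a control word of 0x8002 renders as "MSI-X: Enable+ Count=3 Masked-" and 0x0002 as "MSI-X: Enable- Count=3 Masked-". A reset that clears the enable bit is expected to produce exactly this kind of change, so on its own it does not indicate a fault.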
With the kernel you provided, the failure is still there; the kernel call trace is the same as the old one.
Created attachment 433702 [details]
dmesg including kernel call trace
Collected on 2.6.32-51test.
Can we get access to this system? We're not making any progress on debugging this.
We've found a system in beaker that can reproduce, no need for access at this point.
*** This bug has been marked as a duplicate of bug 617116 ***