Hide Forgot
Description of problem: Repeatedly hot plug/unplug 82599 PF to/from windows 2008_64 guest about 30 times, "assigned_dev_update_msix_mmio: No such device or address" started to pop up, around 300 times, "MSI-X entry number is zero!" popped up, too. In guest, 82599 PF driver works well, but couldn't acquire an IP. The script to repeatedly hot plug/unplug PF: echo "pci_add pci_addr=auto host host=28:00.0" | nc -U /mnt/chayang.unix sleep 10 echo "pci_add pci_addr=auto host host=28:00.1" | nc -U /mnt/chayang.unix sleep 10 echo "pci_del pci_addr=0:5" | nc -U /mnt/chayang.unix sleep 10 echo "pci_del pci_addr=0:6" | nc -U /mnt/chayang.unix sleep 10 Version-Release number of selected component (if applicable): # rpm -q kvm;uname -r kvm-83-246.el5 2.6.18-300.el5 How reproducible: 100% Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: Will have a try with rhel5.8 guest and update here.
Same issue happened with RHEL5.8 x86_64 guest under same host environment and caused qemu-kvm aborted. Seems it is relevant to glibc, here is the log: assigned_dev_update_msix_mmio: No such device or address MSI-X entry number is zero! assigned_dev_update_msix_mmio: No such device or address MSI-X entry number is zero! assigned_dev_update_msix_mmio: No such device or address *** glibc detected *** /usr/libexec/qemu-kvm: malloc(): smallbin double linked list corrupted: 0x0000000017656720 *** ======= Backtrace: ========= /lib64/libc.so.6[0x3b1ca73b3c] /lib64/libc.so.6(__libc_malloc+0x6e)[0x3b1ca74e6e] /usr/libexec/qemu-kvm[0x462459] /usr/libexec/qemu-kvm[0x4106fb] /usr/libexec/qemu-kvm[0x4658a2] /usr/libexec/qemu-kvm[0x40eedf] /usr/libexec/qemu-kvm[0x470eac] /usr/libexec/qemu-kvm[0x409452] /usr/libexec/qemu-kvm[0x50139a] /usr/libexec/qemu-kvm[0x40e737] /lib64/libc.so.6(__libc_start_main+0xf4)[0x3b1ca1d9b4] /usr/libexec/qemu-kvm[0x406da9] ======= Memory map: ======== 00400000-005b6000 r-xp 00000000 fd:00 99291105 /usr/libexec/qemu-kvm 007b6000-007bd000 rw-p 001b6000 fd:00 99291105 /usr/libexec/qemu-kvm 007bd000-00bad000 rw-p 007bd000 00:00 0 00dbc000-00dc1000 rw-p 001bc000 fd:00 99291105 /usr/libexec/qemu-kvm 17217000-17d44000 rw-p 17217000 00:00 0 [heap] 40d53000-40d54000 ---p 40d53000 00:00 0 40d54000-41754000 rw-p 40d54000 00:00 0 41a66000-41a67000 ---p 41a66000 00:00 0 41a67000-42467000 rw-p 41a67000 00:00 0 42467000-42468000 ---p 42467000 00:00 0 42468000-42e68000 rw-p 42468000 00:00 0 42e68000-42e69000 ---p 42e68000 00:00 0 42e69000-43869000 rw-p 42e69000 00:00 0 43869000-4386a000 ---p 43869000 00:00 0 4386a000-4426a000 rw-p 4386a000 00:00 0 3b1c600000-3b1c61c000 r-xp 00000000 fd:00 53215541 /lib64/ld-2.5.so 3b1c81c000-3b1c81d000 r--p 0001c000 fd:00 53215541 /lib64/ld-2.5.so 3b1c81d000-3b1c81e000 rw-p 0001d000 fd:00 53215541 /lib64/ld-2.5.so 3b1ca00000-3b1cb4f000 r-xp 00000000 fd:00 53215542 /lib64/libc-2.5.so 3b1cb4f000-3b1cd4e000 ---p 0014f000 fd:00 53215542 /lib64/libc-2.5.so 3b1cd4e000-3b1cd52000 r--p 0014e000 fd:00 53215542 /lib64/libc-2.5.so 3b1cd52000-3b1cd53000 rw-p 00152000 fd:00 53215542 /lib64/libc-2.5.so 3b1cd53000-3b1cd58000 rw-p 3b1cd53000 00:00 0 3b1ce00000-3b1ce82000 r-xp 00000000 fd:00 53215552 /lib64/libm-2.5.so 3b1ce82000-3b1d081000 ---p 00082000 fd:00 53215552 /lib64/libm-2.5.so 3b1d081000-3b1d082000 r--p 00081000 fd:00 53215552 /lib64/libm-2.5.so 3b1d082000-3b1d083000 rw-p 00082000 fd:00 53215552 /lib64/libm-2.5.so 3b1d200000-3b1d214000 r-xp 00000000 fd:00 53215283 /lib64/libz.so.1.2.3 3b1d214000-3b1d413000 ---p 00014000 fd:00 53215283 /lib64/libz.so.1.2.3 3b1d413000-3b1d414000 rw-p 00013000 fd:00 53215283 /lib64/libz.so.1.2.3 3b1d600000-3b1d602000 r-xp 00000000 fd:00 53215543 /lib64/libdl-2.5.so 3b1d602000-3b1d802000 ---p 00002000 fd:00 53215543 /lib64/libdl-2.5.so 3b1d802000-3b1d803000 r--p 00002000 fd:00 53215543 /lib64/libdl-2.5.so 3b1d803000-3b1d804000 rw-p 00003000 fd:00 53215543 /lib64/libdl-2.5.so 3b1da00000-3b1da16000 r-xp 00000000 fd:00 53215547 /lib64/libpthread-2.5.so 3b1da16000-3b1dc15000 ---p 00016000 fd:00 53215547 /lib64/libpthread-2.5.so 3b1dc15000-3b1dc16000 r--p 00015000 fd:00 53215547 /lib64/libpthread-2.5.so 3b1dc16000-3b1dc17000 rw-p 00016000 fd:00 53215547 /lib64/libpthread-2.5.so 3b1dc17000-3b1dc1b000 rw-p 3b1dc17000 00:00 0 3b1de00000-3b1de07000 r-xp 00000000 fd:00 53215548 /lib64/librt-2.5.so 3b1de07000-3b1e007000 ---p 00007000 fd:00 53215548 /lib64/librt-2.5.so 3b1e007000-3b1e008000 r--p 00007000 fd:00 53215548 /lib64/librt-2.5.so 3b1e008000-3b1e009000 rw-p 00008000 fd:00 53215548 /lib64/librt-2.5.so 3b1e200000-3b1e23b000 r-xp 00000000 fd:00 53215553 /lib64/libsepol.so.1 3b1e23b000-3b1e43b000 ---p 0003b000 fd:00 53215553 /lib64/libsepol.so.1 3b1e43b000-3b1e43c000 rw-p 0003b000 fd:00 53215553 /lib64/libsepol.so.1Aborted
(gdb) bt #0 0x0000003b1ca30285 in raise () from /lib64/libc.so.6 #1 0x0000003b1ca31d30 in abort () from /lib64/libc.so.6 #2 0x0000003b1ca6a9db in __libc_message () from /lib64/libc.so.6 #3 0x0000003b1ca72595 in _int_free () from /lib64/libc.so.6 #4 0x0000003b1ca728fb in free () from /lib64/libc.so.6 #5 0x0000000000467d6c in qemu_free_irqs (s=0x1435590) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/hw/irq.c:61 #6 0x000000000041323f in pci_unregister_device (pci_dev=0xf44300) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/hw/pci.c:366 #7 0x0000000000427518 in free_assigned_device (adev=0x161a4d0) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/hw/device-assignment.c:585 #8 0x00000000004261fd in qemu_pci_hot_deassign_device (pcibus=0, slot=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/hw/pci-hotplug.c:171 #9 pci_device_hot_remove_success (pcibus=0, slot=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/hw/pci-hotplug.c:263 #10 0x0000000000500d6e in kvm_outl (opaque=<value optimized out>, addr=5634, data=6) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:698 #11 0x000000000052c559 in handle_io (kvm=0xdc17e0, run=0x2aaaaaab5000, vcpu=0) at libkvm.c:746 #12 0x000000000052cdd9 in kvm_run (kvm=0xdc17e0, vcpu=0, env=0xe4bf30) at libkvm.c:967 #13 0x00000000005014f9 in kvm_cpu_exec (env=0x6) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:206 #14 0x0000000000501783 in kvm_main_loop_cpu (_env=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:402 #15 ap_main_loop (_env=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:443 #16 0x0000003b1da0677d in start_thread () from /lib64/libpthread.so.0 #17 0x0000003b1cad495d in clone () from /lib64/libc.so.6
There are four shell command in the reproduce script(except sleep), when did the error occur? (after added 28:00.0/28:00.1? after removed 0:5/0:6?) How do you get the bt in comment #2(free)? it seems different from comment #1(malloc). Does this problem exsit in the first hotplug after rebooting the guest? I don't have a 82599 nic, could you help to reproduce this bug? and I can debug remotely.
This bug is already fixed in qemu-kvm upstream, this patch exists in qemu-kvm-rhel6, we need to backport it to kvm-userspace-rhel5. commit 8ea45650455de769872647900964a406876db405 Author: Alex Williamson <alex.williamson> Date: Tue May 19 14:48:56 2009 -0600 Use a bitmap for tracking used GSIs We're currently using a counter to track the most recent GSI we've handed out. This quickly hits KVM_MAX_IRQ_ROUTES when using device assignment with a driver that regularly toggles the MSI enable bit (such as Linux kernels 2.6.21-26). This can mean only a few minutes of usable run time. Instead, track used GSIs in a bitmap. Signed-off-by: Alex Williamson <alex.williamson> Signed-off-by: Avi Kivity <avi>
Created attachment 547884 [details] patch to fix this bz(backport from qemu-kvm-rhel6, tested in localhost)
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Reproduced this bug with the following packages: kernel-2.6.18-308.el5 kvm-83-249.el5 Steps: 0. unbind two PF from host (a 82576 two-ports nic) 1. Boot a guest /usr/libexec/qemu-kvm -no-hpet -no-kvm-pit-reinjection -usbdevice tablet -rtc-td-hack -startdate now -name win -smp 2,cores=2 -k en-us -m 2G -boot c -net nic,vlan=1,macaddr=00:1a:4a:42:49:10,model=virtio -net tap,vlan=1,ifname=virtio_10_1,script=/etc/qemu-ifup,downscript=no -drive file=/root/win2008-64-virtio.qcow2,media=disk,if=virtio,cache=off,boot=on,format=qcow2,werror=stop -cpu qemu64,+sse2 -M rhel5.6.0 -notify all -balloon none -vnc :10 -monitor unix:'/tmp/qzhang-monitor',server,nowait 2. repeatedly hot plug and unplug PFs in a loop. # for i in `seq 1 512`; do sh hotplug.sh; echo "the $i time"; done # cat hotplug.sh echo "pci_add pci_addr=auto host host=03:00.0" | nc -U /tmp/qzhang-monitor sleep 10 echo "pci_add pci_addr=auto host host=03:00.1" | nc -U /tmp/qzhang-monitor sleep 10 echo "pci_del pci_addr=0:5" | nc -U /tmp/qzhang-monitor sleep 10 echo "pci_del pci_addr=0:6" | nc -U /tmp/qzhang-monitor sleep 10 Result: in the unfixed kvm-83-249.el5, there's following error displayed after about 40 rounds hotplug&unplug. pcidevice argument parse error; please check the help text for usage Failed to assign device "03:00.0" : File exists Failed to assign device "03:00.1" : File exists assigned_dev_update_msix: deassign irq: No such device assigned_dev_update_msi: deassign irq: No such device Failed to deassign device "03:00.0" : Invalid argument assigned_dev_update_msix: deassign irq: No such device assigned_dev_update_msi: deassign irq: No such device Failed to deassign device "03:00.1" : Invalid argument ============================= Verified with the following packages with the same steps as above: kernel-2.6.18-324.el5 kvm-83-256.el5 Result: The error prompt does not display after 512 rounds hot plug and unplug. Both guest and host work well. So this bug is verified pass.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-0007.html