Bug 761350 - "assigned_dev_update_msix_mmio: No such device or address" when repeatedly hot plug/unplug PF
Summary: "assigned_dev_update_msix_mmio: No such device or address" when repeatedly ho...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.8
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: rc
: ---
Assignee: Amos Kong
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 818024
TreeView+ depends on / blocked
 
Reported: 2011-12-08 06:59 UTC by Chao Yang
Modified: 2015-09-28 02:21 UTC (History)
12 users (show)

Fixed In Version: kvm-83-254.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 818024 (view as bug list)
Environment:
Last Closed: 2013-01-08 06:59:17 UTC
Target Upstream Version:


Attachments (Terms of Use)
patch to fix this bz(backport from qemu-kvm-rhel6, tested in localhost) (5.10 KB, patch)
2011-12-16 17:32 UTC, Amos Kong
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2013:0007 0 normal SHIPPED_LIVE kvm bug fix update 2013-01-07 15:30:17 UTC

Description Chao Yang 2011-12-08 06:59:14 UTC
Description of problem:
Repeatedly hot plug/unplug 82599 PF to/from windows 2008_64 guest about 30 times,
"assigned_dev_update_msix_mmio: No such device or address" started to pop up, around 300 times, "MSI-X entry number is zero!" popped up, too. In guest, 82599 PF driver works well, but couldn't acquire an IP.

The script to repeatedly hot plug/unplug PF:
echo "pci_add pci_addr=auto host host=28:00.0" | nc -U /mnt/chayang.unix
sleep 10
echo "pci_add pci_addr=auto host host=28:00.1" | nc -U /mnt/chayang.unix
sleep 10
echo "pci_del pci_addr=0:5" | nc -U /mnt/chayang.unix
sleep 10
echo "pci_del pci_addr=0:6" | nc -U /mnt/chayang.unix
sleep 10

Version-Release number of selected component (if applicable):
# rpm -q kvm;uname -r
kvm-83-246.el5
2.6.18-300.el5

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
Will have a try with rhel5.8 guest and update here.

Comment 1 Chao Yang 2011-12-09 02:38:01 UTC
Same issue happened with RHEL5.8 x86_64 guest under same host environment and caused qemu-kvm aborted. Seems it is relevant to glibc, here is the log:

assigned_dev_update_msix_mmio: No such device or address
MSI-X entry number is zero!
assigned_dev_update_msix_mmio: No such device or address
MSI-X entry number is zero!
assigned_dev_update_msix_mmio: No such device or address
*** glibc detected *** /usr/libexec/qemu-kvm: malloc(): smallbin double linked list corrupted: 0x0000000017656720 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3b1ca73b3c]
/lib64/libc.so.6(__libc_malloc+0x6e)[0x3b1ca74e6e]
/usr/libexec/qemu-kvm[0x462459]
/usr/libexec/qemu-kvm[0x4106fb]
/usr/libexec/qemu-kvm[0x4658a2]
/usr/libexec/qemu-kvm[0x40eedf]
/usr/libexec/qemu-kvm[0x470eac]
/usr/libexec/qemu-kvm[0x409452]
/usr/libexec/qemu-kvm[0x50139a]
/usr/libexec/qemu-kvm[0x40e737]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3b1ca1d9b4]
/usr/libexec/qemu-kvm[0x406da9]
======= Memory map: ========
00400000-005b6000 r-xp 00000000 fd:00 99291105                           /usr/libexec/qemu-kvm
007b6000-007bd000 rw-p 001b6000 fd:00 99291105                           /usr/libexec/qemu-kvm
007bd000-00bad000 rw-p 007bd000 00:00 0 
00dbc000-00dc1000 rw-p 001bc000 fd:00 99291105                           /usr/libexec/qemu-kvm
17217000-17d44000 rw-p 17217000 00:00 0                                  [heap]
40d53000-40d54000 ---p 40d53000 00:00 0 
40d54000-41754000 rw-p 40d54000 00:00 0 
41a66000-41a67000 ---p 41a66000 00:00 0 
41a67000-42467000 rw-p 41a67000 00:00 0 
42467000-42468000 ---p 42467000 00:00 0 
42468000-42e68000 rw-p 42468000 00:00 0 
42e68000-42e69000 ---p 42e68000 00:00 0 
42e69000-43869000 rw-p 42e69000 00:00 0 
43869000-4386a000 ---p 43869000 00:00 0 
4386a000-4426a000 rw-p 4386a000 00:00 0 
3b1c600000-3b1c61c000 r-xp 00000000 fd:00 53215541                       /lib64/ld-2.5.so
3b1c81c000-3b1c81d000 r--p 0001c000 fd:00 53215541                       /lib64/ld-2.5.so
3b1c81d000-3b1c81e000 rw-p 0001d000 fd:00 53215541                       /lib64/ld-2.5.so
3b1ca00000-3b1cb4f000 r-xp 00000000 fd:00 53215542                       /lib64/libc-2.5.so
3b1cb4f000-3b1cd4e000 ---p 0014f000 fd:00 53215542                       /lib64/libc-2.5.so
3b1cd4e000-3b1cd52000 r--p 0014e000 fd:00 53215542                       /lib64/libc-2.5.so
3b1cd52000-3b1cd53000 rw-p 00152000 fd:00 53215542                       /lib64/libc-2.5.so
3b1cd53000-3b1cd58000 rw-p 3b1cd53000 00:00 0 
3b1ce00000-3b1ce82000 r-xp 00000000 fd:00 53215552                       /lib64/libm-2.5.so
3b1ce82000-3b1d081000 ---p 00082000 fd:00 53215552                       /lib64/libm-2.5.so
3b1d081000-3b1d082000 r--p 00081000 fd:00 53215552                       /lib64/libm-2.5.so
3b1d082000-3b1d083000 rw-p 00082000 fd:00 53215552                       /lib64/libm-2.5.so
3b1d200000-3b1d214000 r-xp 00000000 fd:00 53215283                       /lib64/libz.so.1.2.3
3b1d214000-3b1d413000 ---p 00014000 fd:00 53215283                       /lib64/libz.so.1.2.3
3b1d413000-3b1d414000 rw-p 00013000 fd:00 53215283                       /lib64/libz.so.1.2.3
3b1d600000-3b1d602000 r-xp 00000000 fd:00 53215543                       /lib64/libdl-2.5.so
3b1d602000-3b1d802000 ---p 00002000 fd:00 53215543                       /lib64/libdl-2.5.so
3b1d802000-3b1d803000 r--p 00002000 fd:00 53215543                       /lib64/libdl-2.5.so
3b1d803000-3b1d804000 rw-p 00003000 fd:00 53215543                       /lib64/libdl-2.5.so
3b1da00000-3b1da16000 r-xp 00000000 fd:00 53215547                       /lib64/libpthread-2.5.so
3b1da16000-3b1dc15000 ---p 00016000 fd:00 53215547                       /lib64/libpthread-2.5.so
3b1dc15000-3b1dc16000 r--p 00015000 fd:00 53215547                       /lib64/libpthread-2.5.so
3b1dc16000-3b1dc17000 rw-p 00016000 fd:00 53215547                       /lib64/libpthread-2.5.so
3b1dc17000-3b1dc1b000 rw-p 3b1dc17000 00:00 0 
3b1de00000-3b1de07000 r-xp 00000000 fd:00 53215548                       /lib64/librt-2.5.so
3b1de07000-3b1e007000 ---p 00007000 fd:00 53215548                       /lib64/librt-2.5.so
3b1e007000-3b1e008000 r--p 00007000 fd:00 53215548                       /lib64/librt-2.5.so
3b1e008000-3b1e009000 rw-p 00008000 fd:00 53215548                       /lib64/librt-2.5.so
3b1e200000-3b1e23b000 r-xp 00000000 fd:00 53215553                       /lib64/libsepol.so.1
3b1e23b000-3b1e43b000 ---p 0003b000 fd:00 53215553                       /lib64/libsepol.so.1
3b1e43b000-3b1e43c000 rw-p 0003b000 fd:00 53215553                       /lib64/libsepol.so.1Aborted

Comment 2 Chao Yang 2011-12-12 02:26:38 UTC
(gdb) bt
#0  0x0000003b1ca30285 in raise () from /lib64/libc.so.6
#1  0x0000003b1ca31d30 in abort () from /lib64/libc.so.6
#2  0x0000003b1ca6a9db in __libc_message () from /lib64/libc.so.6
#3  0x0000003b1ca72595 in _int_free () from /lib64/libc.so.6
#4  0x0000003b1ca728fb in free () from /lib64/libc.so.6
#5  0x0000000000467d6c in qemu_free_irqs (s=0x1435590) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/hw/irq.c:61
#6  0x000000000041323f in pci_unregister_device (pci_dev=0xf44300) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/hw/pci.c:366
#7  0x0000000000427518 in free_assigned_device (adev=0x161a4d0) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/hw/device-assignment.c:585
#8  0x00000000004261fd in qemu_pci_hot_deassign_device (pcibus=0, slot=<value optimized out>)
    at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/hw/pci-hotplug.c:171
#9  pci_device_hot_remove_success (pcibus=0, slot=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/hw/pci-hotplug.c:263
#10 0x0000000000500d6e in kvm_outl (opaque=<value optimized out>, addr=5634, data=6)
    at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:698
#11 0x000000000052c559 in handle_io (kvm=0xdc17e0, run=0x2aaaaaab5000, vcpu=0) at libkvm.c:746
#12 0x000000000052cdd9 in kvm_run (kvm=0xdc17e0, vcpu=0, env=0xe4bf30) at libkvm.c:967
#13 0x00000000005014f9 in kvm_cpu_exec (env=0x6) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:206
#14 0x0000000000501783 in kvm_main_loop_cpu (_env=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:402
#15 ap_main_loop (_env=<value optimized out>) at /usr/src/debug/kvm-83-maint-snapshot-20090205/qemu/qemu-kvm.c:443
#16 0x0000003b1da0677d in start_thread () from /lib64/libpthread.so.0
#17 0x0000003b1cad495d in clone () from /lib64/libc.so.6

Comment 3 Amos Kong 2011-12-15 01:42:23 UTC
There are four shell command in the reproduce script(except sleep), when did the error occur?  (after added 28:00.0/28:00.1? after removed 0:5/0:6?)

How do you get the bt in comment #2(free)? it seems different from comment #1(malloc).
Does this problem exsit in the first hotplug after rebooting the guest?

I don't have a 82599 nic, could you help to reproduce this bug? and I can debug remotely.

Comment 4 Amos Kong 2011-12-16 17:26:45 UTC
This bug is already fixed in qemu-kvm upstream, this patch exists in qemu-kvm-rhel6, we need to backport it to kvm-userspace-rhel5.

commit 8ea45650455de769872647900964a406876db405
Author: Alex Williamson <alex.williamson>
Date:   Tue May 19 14:48:56 2009 -0600

    Use a bitmap for tracking used GSIs
    
    We're currently using a counter to track the most recent GSI we've
    handed out.  This quickly hits KVM_MAX_IRQ_ROUTES when using device
    assignment with a driver that regularly toggles the MSI enable bit
    (such as Linux kernels 2.6.21-26).  This can mean only a few minutes
    of usable run time.  Instead, track used GSIs in a bitmap.
    
    Signed-off-by: Alex Williamson <alex.williamson>
    Signed-off-by: Avi Kivity <avi>

Comment 5 Amos Kong 2011-12-16 17:32:04 UTC
Created attachment 547884 [details]
patch to fix this bz(backport from qemu-kvm-rhel6, tested in localhost)

Comment 11 RHEL Program Management 2012-04-02 10:19:06 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 18 Qunfang Zhang 2012-07-12 04:52:04 UTC
Reproduced this bug with the following packages:
kernel-2.6.18-308.el5
kvm-83-249.el5

Steps:
0. unbind two PF from host (a 82576 two-ports nic)

1. Boot a guest
/usr/libexec/qemu-kvm -no-hpet -no-kvm-pit-reinjection -usbdevice tablet -rtc-td-hack -startdate now -name win -smp 2,cores=2 -k en-us -m 2G -boot c -net nic,vlan=1,macaddr=00:1a:4a:42:49:10,model=virtio -net tap,vlan=1,ifname=virtio_10_1,script=/etc/qemu-ifup,downscript=no -drive file=/root/win2008-64-virtio.qcow2,media=disk,if=virtio,cache=off,boot=on,format=qcow2,werror=stop -cpu qemu64,+sse2 -M rhel5.6.0 -notify all -balloon none -vnc :10 -monitor unix:'/tmp/qzhang-monitor',server,nowait

2. repeatedly hot plug and unplug PFs in a loop.
# for i in `seq 1 512`; do sh hotplug.sh; echo "the $i time"; done
# cat hotplug.sh 
echo "pci_add pci_addr=auto host host=03:00.0" | nc -U /tmp/qzhang-monitor
sleep 10
echo "pci_add pci_addr=auto host host=03:00.1" | nc -U /tmp/qzhang-monitor
sleep 10
echo "pci_del pci_addr=0:5" | nc -U /tmp/qzhang-monitor
sleep 10
echo "pci_del pci_addr=0:6" | nc -U /tmp/qzhang-monitor
sleep 10

Result: in the unfixed kvm-83-249.el5, there's following error displayed after about 40 rounds hotplug&unplug.

pcidevice argument parse error; please check the help text for usage
Failed to assign device "03:00.0" : File exists
Failed to assign device "03:00.1" : File exists
assigned_dev_update_msix: deassign irq: No such device
assigned_dev_update_msi: deassign irq: No such device
Failed to deassign device "03:00.0" : Invalid argument
assigned_dev_update_msix: deassign irq: No such device
assigned_dev_update_msi: deassign irq: No such device
Failed to deassign device "03:00.1" : Invalid argument

=============================

Verified with the following packages with the same steps as above:
kernel-2.6.18-324.el5
kvm-83-256.el5

Result: The error prompt does not display after 512 rounds hot plug and unplug. Both guest and host work well. 

So this bug is verified pass.

Comment 20 errata-xmlrpc 2013-01-08 06:59:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0007.html


Note You need to log in before you can comment on or make changes to this bug.