This service will be undergoing maintenance at 00:00 UTC, 2017-10-23 It is expected to last about 30 minutes
Bug 1029743 - qemu-kvm core dump after hot plug/unplug 82576 PF about 100 times
qemu-kvm core dump after hot plug/unplug 82576 PF about 100 times
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm (Show other bugs)
7.0
x86_64 Linux
medium Severity high
: rc
: ---
Assigned To: Alex Williamson
Virtualization Bugs
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-11-13 00:24 EST by Xu Han
Modified: 2014-06-17 23:42 EDT (History)
11 users (show)

See Also:
Fixed In Version: qemu-kvm-1.5.3-22.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-13 06:11:56 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Xu Han 2013-11-13 00:24:25 EST
Description of problem:
Test three times.
The first time occurred after 115th hot plug.
The second time occurred after 112th hot plug.
The third time occurred after 98th hot plug.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-1.5.3-18.el7.x86_64
kernel-3.10.0-48.el7.x86_64

How reproducible:
3/3

Steps to Reproduce:
1. boot guest.
# /usr/libexec/qemu-kvm -nodefaults -M pc -m 4G -cpu Opteron_G3 -smp 4,cores=2,threads=2,sockets=1 -boot menu=on -monitor stdio -rtc base=localtime,clock=host,driftfix=slew -qmp tcp:0:5555,server,nowait -vga qxl -drive file=/home/win2008r2.qcow2_v3,format=qcow2,id=guest-img,if=none,werror=stop,rerror=stop -device virtio-blk-pci,scsi=off,drive=guest-img,id=os-disk -spice disable-ticketing,port=5931 -device virtio-balloon,id=balloon -netdev tap,id=hostnet0,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown -device e1000,netdev=hostnet0,id=net0,mac=00:1a:4a:42:0b:00 -monitor unix:/tmp/guest-sock,server,nowait

2. do hot plug/unplug PF in a loop via hmp
# cat start.sh
for i in $(seq 500)
do
    echo "device_add vfio-pci,host=01:00.0,id=pf0" | nc -U /tmp/guest-sock
    sleep 10
    echo "device_del pf0" | nc -U /tmp/guest-sock
    sleep 10
done

Actual results:
after , qemu-kvm core dump.
qemu-kvm: vfio: Error: event_notifier_init failed
qemu-kvm: /builddir/build/BUILD/qemu-1.5.3/kvm-all.c:967: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.

Expected results:
can do repeatedly hot plug/unplug with no error.

Additional info:
(gdb) bt
#0  0x00007ffff30db999 in raise () from /lib64/libc.so.6
#1  0x00007ffff30dd0a8 in abort () from /lib64/libc.so.6
#2  0x00007ffff30d4906 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff30d49b2 in __assert_fail () from /lib64/libc.so.6
#4  0x00005555557c0c20 in kvm_irqchip_commit_routes (s=<optimized out>)
    at /usr/src/debug/qemu-1.5.3/kvm-all.c:967
#5  0x00005555557c0ee1 in kvm_irqchip_add_msi_route (s=0x555556502460, msg=...)
    at /usr/src/debug/qemu-1.5.3/kvm-all.c:1178
#6  0x00005555557ac5e9 in vfio_msix_vector_do_use (pdev=0x555556755c00, nr=5, 
    msg=msg@entry=0x7fffeab06a00, 
    handler=handler@entry=0x5555557ab0b0 <vfio_msi_interrupt>)
    at /usr/src/debug/qemu-1.5.3/hw/misc/vfio.c:660
#7  0x00005555557ac78e in vfio_msix_vector_use (pdev=<optimized out>, 
    nr=<optimized out>, msg=...)
    at /usr/src/debug/qemu-1.5.3/hw/misc/vfio.c:715
#8  0x00005555556ba27a in msix_set_notifier_for_vector (vector=5, 
    dev=0x555556755c00) at hw/pci/msix.c:508
#9  msix_set_vector_notifiers (dev=dev@entry=0x555556755c00, 
    use_notifier=use_notifier@entry=0x5555557ac760 <vfio_msix_vector_use>, 
    release_notifier=release_notifier@entry=0x5555557ab350 <vfio_msix_vector_release>, poll_notifier=poll_notifier@entry=0x0) at hw/pci/msix.c:535
#10 0x00005555557aff91 in vfio_enable_msix (vdev=0x555556755c00)
    at /usr/src/debug/qemu-1.5.3/hw/misc/vfio.c:792
#11 vfio_pci_write_config (pdev=0x555556755c00, addr=114, val=32777, len=2)
    at /usr/src/debug/qemu-1.5.3/hw/misc/vfio.c:2022
#12 0x00005555557c4142 in access_with_adjusted_size (addr=addr@entry=2, 
    value=value@entry=0x7fffeab06b58, size=2, access_size_min=<optimized out>, 
    access_size_max=<optimized out>, 
    access=access@entry=0x5555557c4700 <memory_region_write_accessor>, 
    opaque=opaque@entry=0x5555566b6888)
    at /usr/src/debug/qemu-1.5.3/memory.c:364
#13 0x00005555557c5617 in memory_region_iorange_write (
    iorange=<optimized out>, offset=2, width=2, data=32777)
    at /usr/src/debug/qemu-1.5.3/memory.c:439
#14 0x00005555557c2ef5 in kvm_handle_io (count=1, size=2, direction=1, 
    data=<optimized out>, port=3326)
    at /usr/src/debug/qemu-1.5.3/kvm-all.c:1500
#15 kvm_cpu_exec (env=env@entry=0x55555669a250)
    at /usr/src/debug/qemu-1.5.3/kvm-all.c:1649
#16 0x000055555576e6f5 in qemu_kvm_cpu_thread_fn (arg=0x55555669a250)
    at /usr/src/debug/qemu-1.5.3/cpus.c:793
#17 0x00007ffff6055de3 in start_thread () from /lib64/libpthread.so.0
#18 0x00007ffff319c1ad in clone () from /lib64/libc.so.6
Comment 1 Xu Han 2013-11-13 00:27:31 EST
Found call trace in host dmesg.

[11150.023832] ------------[ cut here ]------------
[11150.023843] WARNING: at net/core/dev.c:5018 rollback_registered_many+0x1e2/0x210()
[11150.023845] Modules linked in: tun igbvf vfio_pci vfio_iommu_type1 vfio xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables openvswitch vxlan ip_tunnel gre sg ipmi_devintf dcdbas kvm_amd kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd microcode serio_raw pcspkr amd64_edac_mod edac_mce_amd fam15h_power k10temp edac_core sp5100_tco i2c_piix4 igb
[11150.023888]  ipmi_si ptp bnx2 pps_core ipmi_msghandler dca shpchp acpi_power_meter mperf nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod crc_t10dif crct10dif_common sr_mod cdrom ata_generic pata_acpi mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm ahci drm mptsas pata_atiixp libahci scsi_transport_sas crc32c_intel mptscsih libata i2c_core mptbase dm_mirror dm_region_hash dm_log dm_mod
[11150.023918] CPU: 1 PID: 4947 Comm: qemu-kvm Not tainted 3.10.0-48.el7.x86_64 #1
[11150.023920] Hardware name: Dell Inc. PowerEdge R415/0YFWR8, BIOS 1.8.6 12/06/2011
[11150.023921]  0000000000000009 ffff8807616d3b38 ffffffff815abfff ffff8807616d3b70
[11150.023926]  ffffffff810589d1 ffff8808226b4000 ffff8807616d3bb8 ffff8807616d3bb8
[11150.023930]  ffff8808287006c0 0000000000000001 ffff8807616d3b80 ffffffff81058aaa
[11150.023933] Call Trace:
[11150.023940]  [<ffffffff815abfff>] dump_stack+0x19/0x1b
[11150.023946]  [<ffffffff810589d1>] warn_slowpath_common+0x61/0x80
[11150.023949]  [<ffffffff81058aaa>] warn_slowpath_null+0x1a/0x20
[11150.023951]  [<ffffffff8149eaf2>] rollback_registered_many+0x1e2/0x210
[11150.023954]  [<ffffffff8149eb51>] rollback_registered+0x31/0x40
[11150.023956]  [<ffffffff8149fc08>] unregister_netdevice_queue+0x48/0x90
[11150.023961]  [<ffffffffa065a31e>] __tun_detach+0x10e/0x2a0 [tun]
[11150.023964]  [<ffffffffa065a4dd>] tun_chr_close+0x2d/0x50 [tun]
[11150.023968]  [<ffffffff8119f3a9>] __fput+0xe9/0x270
[11150.023971]  [<ffffffff8119f65e>] ____fput+0xe/0x10
[11150.023987]  [<ffffffff8107b544>] task_work_run+0xc4/0xe0
[11150.024004]  [<ffffffff8105e68b>] do_exit+0x2bb/0xa30
[11150.024017]  [<ffffffff8105ee7f>] do_group_exit+0x3f/0xa0
[11150.024028]  [<ffffffff8106dab0>] get_signal_to_deliver+0x1d0/0x6e0
[11150.024033]  [<ffffffff81012408>] do_signal+0x48/0x5e0
[11150.024037]  [<ffffffff81012a11>] do_notify_resume+0x71/0xc0
[11150.024040]  [<ffffffff815bba52>] int_signal+0x12/0x17
[11150.024043] ---[ end trace c68788361608cdef ]---
Comment 3 Xu Han 2013-11-13 00:45:38 EST
Call trace in Comment 1 caused by openvswitch.
Comment 4 Xu Han 2013-11-13 20:56:27 EST
Test with 82576 VF.

Result:
After 333th hot plug, qemu-kvm core dump.
Comment 5 Alex Williamson 2013-11-13 21:38:24 EST
I don't think comment 1 is related to the problem described in comment 0.  Have you seen comment 1 consistently in relation to the core dump?  My guess is that we're leaking irq routes somewhere in qemu-kvm and after enough hotplugs we ask for too many, kvm says no, and due to the spectacular error handling in qemu-kvm, we give up with an assert.  Please confirm whether or not the host dmesg in comment 1 is repeatable in relation to the original problem.
Comment 6 Xu Han 2013-11-13 22:02:51 EST
Sorry, my mistake. As what I said in Comment 3, this call trace actually caused by some error of openvswitch, and was not related to this issue. So please ignore it.
Comment 7 Alex Williamson 2013-11-13 22:13:02 EST
Sorry, I skimmed right over comment 3.  Thanks.
Comment 8 Alex Williamson 2013-11-20 16:00:20 EST
Can QE provide access to a system that produces this?  I'm unable to reproduce in my environment.
Comment 12 Alex Williamson 2013-11-22 14:35:19 EST
Patch posted upstream:

http://lists.nongnu.org/archive/html/qemu-devel/2013-11/msg02767.html
Comment 17 mazhang 2013-12-25 01:03:15 EST
Reproduced this bug.

Host:
qemu-kvm-rhev-1.5.3-18.el7.x86_64
kernel-3.10.0-61.el7.x86_64

Guest:
win2008r2-64

Cli:
gdb --args /usr/libexec/qemu-kvm -nodefaults -M pc -m 4G -cpu Opteron_G1 -smp 4,cores=2,threads=2,sockets=1 -boot menu=on -monitor stdio -rtc base=localtime,clock=host,driftfix=slew -qmp tcp:0:5555,server,nowait -vga qxl -drive file=/home/win2008r2-64.qcow2,format=qcow2,id=guest-img,if=none,werror=stop,rerror=stop -device virtio-blk-pci,scsi=off,drive=guest-img,id=os-disk -spice disable-ticketing,port=5900 -device virtio-balloon,id=balloon -monitor unix:/tmp/guest-sock,server,nowait

Steps:
Start qemu-kvm, do hotplug/hotunplug pf.
[root@dell-per415-03 ~]# cat hotplug.sh 
#!/bin/bash
for i in $(seq 500)
do
    echo "device_add vfio-pci,host=01:00.0,id=pf0" | nc -U /tmp/guest-sock
    sleep 1
    echo "device_del pf0" | nc -U /tmp/guest-sock
    sleep 5
done


Result:
Qemu-kvm will aborted.

qemu-kvm: /builddir/build/BUILD/qemu-1.5.3/kvm-all.c:967: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffeab07700 (LWP 17852)]
0x00007ffff30db979 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff30db979 in raise () from /lib64/libc.so.6
#1  0x00007ffff30dd088 in abort () from /lib64/libc.so.6
#2  0x00007ffff30d48e6 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff30d4992 in __assert_fail () from /lib64/libc.so.6
#4  0x00005555557c0c20 in kvm_irqchip_commit_routes ()
#5  0x00005555557c0ee1 in kvm_irqchip_add_msi_route ()
#6  0x00005555557ac5e9 in vfio_msix_vector_do_use ()
#7  0x00005555557ac78e in vfio_msix_vector_use ()
#8  0x00005555556ba27a in msix_set_vector_notifiers ()
#9  0x00005555557aff91 in vfio_pci_write_config ()
#10 0x00005555557c4142 in access_with_adjusted_size ()
#11 0x00005555557c5617 in memory_region_iorange_write ()
#12 0x00005555557c2ef5 in kvm_cpu_exec ()
#13 0x000055555576e6f5 in qemu_kvm_cpu_thread_fn ()
#14 0x00007ffff6055de3 in start_thread () from /lib64/libpthread.so.0
#15 0x00007ffff319c16d in clone () from /lib64/libc.so.6
Comment 18 mazhang 2013-12-25 02:11:52 EST
Verified this bug with qemu-kvm-1.5.3-28.el7.x86_64

Host:
qemu-kvm-1.5.3-28.el7.x86_64
kernel-3.10.0-61.el7.x86_64

Guest:
win2008r2-64

Steps:
Hotplug/hotunplug PF 500 times, guest works well, no crash.

So this bug has been fixed.
Comment 20 Ludek Smid 2014-06-13 06:11:56 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

Note You need to log in before you can comment on or make changes to this bug.