Bug 1003535

Summary: qemu-kvm core dumps when booting a VM with more than 32 virtio disks/NICs
Product: Red Hat Enterprise Linux 7
Reporter: Xu Han <xuhan>
Component: qemu-kvm
Assignee: Marcel Apfelbaum <marcel>
Status: CLOSED CURRENTRELEASE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 7.0
CC: acathrow, hhuang, juli, juzhang, marcel, michen, mrezanin, pbonzini, rhod, sluo, virt-maint, xfu, xuhan, xwei
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-1.5.3-39.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-13 09:41:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description      Flags
cli - boot vm    none

Description Xu Han 2013-09-02 10:01:17 UTC
Description of problem:
qemu-kvm core dumps when booting a VM with 88 virtio disks. Both Linux and Windows guests hit the same issue.

Version-Release number of selected component (if applicable):
kernel: 3.10.0-11.el7.x86_64
qemu: qemu-kvm-1.5.2-3.el7.x86_64

How reproducible:
80%

Steps to Reproduce:
1. Boot the VM (see the command line in the attachment).

Actual results:
QEMU 1.5.2 monitor - type 'help' for more information
(qemu) [New Thread 0x7fffdb7bb700 (LWP 3638)]
[New Thread 0x7fffdafba700 (LWP 3639)]
[New Thread 0x7fffd91ff700 (LWP 3640)]

(qemu) [Thread 0x7fffeb316700 (LWP 3637) exited]

(qemu) qemu-kvm: /builddir/build/BUILD/qemu-1.5.2/exec.c:748: register_subpage: Assertion `existing->mr->subpage || existing->mr == &io_mem_unassigned' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffdb7bb700 (LWP 3638)]
0x00007ffff32e4999 in raise () from /lib64/libc.so.6
(gdb) bt 
#0  0x00007ffff32e4999 in raise () from /lib64/libc.so.6
#1  0x00007ffff32e60a8 in abort () from /lib64/libc.so.6
#2  0x00007ffff32dd906 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff32dd9b2 in __assert_fail () from /lib64/libc.so.6
#4  0x000055555573b25c in register_subpage ()
#5  0x000055555573b482 in mem_add ()
#6  0x000055555578c032 in address_space_update_topology_pass.isra.5 ()
#7  0x000055555578ce8d in memory_region_transaction_commit ()
#8  0x0000555555681afc in pci_default_write_config ()
#9  0x00005555556bbfaa in virtio_write_config ()
#10 0x000055555578a8b2 in access_with_adjusted_size ()
#11 0x000055555578bd87 in memory_region_iorange_write ()
#12 0x000055555578962d in kvm_cpu_exec ()
#13 0x0000555555734545 in qemu_kvm_cpu_thread_fn ()
#14 0x00007ffff625dde3 in start_thread () from /lib64/libpthread.so.0
#15 0x00007ffff33a50ad in clone () from /lib64/libc.so.6


Expected results:
The VM boots without errors.

Additional info:
A RHEL 6 host hits this bug as well -> Bug 753692

Comment 1 Xu Han 2013-09-02 10:07:06 UTC
Created attachment 792786 [details]
cli - boot vm

Comment 3 Amos Kong 2013-09-26 09:36:55 UTC
qemu-kvm-rhel6 & rhel7 guest : can't reproduce
qemu-upstream & rhel7 guest : can reproduce
qemu-upstream & rhel6 guest : can't reproduce

The problem occurs when adding the 33rd disk. The qemu crash should be fixed.

Comment 4 Amos Kong 2013-09-26 15:33:49 UTC
This crash happens because the physical section number unexpectedly grows larger than TARGET_PAGE_SIZE (4096).

The assert check was added in the following commit:

commit 68f3f65b09a1ce8c82fac17911ffc3bb6031ebe4
Author: Paolo Bonzini <pbonzini>
Date:   Tue May 7 11:30:23 2013 +0200

    memory: assert that PhysPageEntry's ptr does not overflow
    
    While sized to 15 bits in PhysPageEntry, the ptr field is ORed into the
    iotlb entries together with a page-aligned pointer.  The ptr field must
    not overflow into this page-aligned value, assert that it is smaller than
    the page size.
    
    Reviewed-by: Peter Maydell <peter.maydell>
    Signed-off-by: Paolo Bonzini <pbonzini>

diff --git a/exec.c b/exec.c
index 1355661..8562fca 100644
--- a/exec.c
+++ b/exec.c
@@ -713,6 +713,12 @@ static void destroy_all_mappings(AddressSpaceDispatch *d)
 
 static uint16_t phys_section_add(MemoryRegionSection *section)
 {
+    /* The physical section number is ORed with a page-aligned
+     * pointer to produce the iotlb entries.  Thus it should
+     * never overflow into the page-aligned value.
+     */
+    assert(phys_sections_nb < TARGET_PAGE_SIZE);
+
     if (phys_sections_nb == phys_sections_nb_alloc) {
         phys_sections_nb_alloc = MAX(phys_sections_nb_alloc * 2, 16);
         phys_sections = g_renew(MemoryRegionSection, phys_sections,

Comment 5 Amos Kong 2013-09-27 02:01:14 UTC
I can reproduce this bug by launching a guest with 33 virtio-net NICs.

/home/devel/qemu/x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 2000 /images/RHEL-Server-6.4-64-virtio.qcow2  \
-monitor stdio \
-netdev tap,id=net-virtio0-0-1 -device virtio-net-pci,netdev=net-virtio0-0-1,id=virti0-0-1,multifunction=on,addr=0x04.0 \
.....
-netdev tap,id=net-virtio0-0-33 -device virtio-net-pci,netdev=net-virtio0-0-33,id=virti0-0-33,multifunction=on,addr=0x08.0 \

Comment 6 Amos Kong 2013-09-27 03:29:51 UTC
I tested with upstream kernels (v2.6.12, v2.6.22, v2.6.32, v2.6.38, v3.0, v3.1 .. v3.10); this issue can be hit 100% of the time.

It's strange that we can't hit this problem with a RHEL 6 guest (2.6.32-419.el6).

Comment 7 Amos Kong 2013-09-27 03:52:22 UTC
Paolo, any thoughts?

Comment 8 Paolo Bonzini 2013-09-27 10:06:34 UTC
Looks like there are too many BARs.

You could do something like

    if (tcg_enabled()) {
        /* The physical section number is ORed with a page-aligned
          * pointer to produce the iotlb entries.  Thus it should
          * never overflow into the page-aligned value.
          */
        assert(phys_sections_nb < TARGET_PAGE_SIZE);
    } else {
        /* For KVM or Xen we can use the full range of the ptr field
         * in PhysPageEntry.
         */
        assert(phys_sections_nb <= SHRT_MAX);
    }

Comment 9 Paolo Bonzini 2013-09-27 10:07:40 UTC
This should bring the limit up by a factor of 8 (32767 / 4096), i.e. 32*8 = 256.  Some care is still necessary when you have bridges, but it should be much better.
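
The estimate above can be checked with a small sketch. It assumes TARGET_PAGE_SIZE is 4096 and that the device limit scales linearly with the section limit, which is the rough estimate given in the comment and not a measured result; the sketch_* names are hypothetical. Because the assert runs before a section is added, the first check allows up to 4096 sections and the second up to SHRT_MAX + 1.

```c
#include <limits.h>

/* Assumption: 4 KiB target pages. */
#define SKETCH_TARGET_PAGE_SIZE 4096L

/* Sections allowed when the check is assert(nb < TARGET_PAGE_SIZE). */
static long sketch_max_sections_page_limit(void)
{
    return SKETCH_TARGET_PAGE_SIZE;
}

/* Sections allowed when the check is assert(nb <= SHRT_MAX). */
static long sketch_max_sections_short_limit(void)
{
    return (long)SHRT_MAX + 1;
}

/* Scale an observed device limit by the ratio of the two section limits.
 * Linear scaling is an assumption taken from the estimate in the comment. */
static long sketch_scaled_device_limit(long observed_devices)
{
    return observed_devices * sketch_max_sections_short_limit()
           / sketch_max_sections_page_limit();
}
```

On platforms where SHRT_MAX is 32767, the ratio is 32768 / 4096 = 8, so an observed limit of 32 devices scales to 256.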

Comment 10 Amos Kong 2013-11-01 09:47:46 UTC
*** Bug 1025680 has been marked as a duplicate of this bug. ***

Comment 11 Amos Kong 2013-11-18 12:50:46 UTC
hi xuhan,

Can you reproduce this bug with latest qemu-kvm-rhel7?
I can't reproduce it with qemu-upstream (1.6.0, 1.5.0, 1.5.1, 1.5.2, 1.5.3).
Thanks

Comment 12 Xu Han 2013-11-19 02:17:39 UTC
hi amos,

Tested 2 times with qemu-kvm-rhev-1.5.3-19.el7.x86_64 .

Hit this issue with 87 virtio disks attached.
qemu-kvm: /builddir/build/BUILD/qemu-1.5.3/exec.c:762: register_subpage: Assertion `existing->mr->subpage || existing->mr == &io_mem_unassigned' failed.
core: line 90:  5852 Aborted                 (core dumped)

But did not hit it with 51 disks attached.
# ls /dev/vd* | wc -l
51

best regards,
xuhan

Comment 13 Amos Kong 2013-11-19 10:52:26 UTC
(In reply to xuhan from comment #12)
> hi amos,
> 
> Tested 2 times with qemu-kvm-rhev-1.5.3-19.el7.x86_64 .

Thanks for confirming.

I tested with a RHEL 6 guest in comment #11. I can reproduce the bug with a RHEL 7 guest (with both the latest qemu-upstream and qemu-kvm-rhel7).

Internal qemu crashes at this point:
 register_subpage: Assertion `existing->mr->subpage || existing->mr == &io_mem_unassigned' failed.

Upstream qemu has a new check from commit 68f3f65b (memory: assert that PhysPageEntry's ptr does not overflow),
so it crashes at a different point:
  phys_section_add: Assertion `next_map.sections_nb < (1 << 12)' failed.

Comment 14 Amos Kong 2013-11-19 11:13:22 UTC
Tested with the latest guest kernel (3.10.0-rc5).

1. With this assert() in phys_section_add:
   assert(next_map.sections_nb < TARGET_PAGE_SIZE);

   The crash occurred.

2. With a different assert() in phys_section_add:
   assert(next_map.sections_nb < SHRT_MAX);

   The crash occurred.

3. Without any assert on next_map.sections_nb in phys_section_add:

   The crash occurred in register_subpage():

   assert(existing->mr->subpage || existing->mr == &io_mem_unassigned);

Comment 15 Paolo Bonzini 2013-11-19 11:43:54 UTC
> 3. without this assert of next_map.sections_nb in phys_section_add
>    crash occurred at register_subpage():
>    assert(existing->mr->subpage || existing->mr == &io_mem_unassigned);

What is the backtrace here?

The cause could be INT64_MAX instead of UINT64_MAX in hw/i386/pc_piix.c:

        memory_region_init(pci_memory, "pci", INT64_MAX);

and similarly in pc_q35.c.

Comment 16 Amos Kong 2013-11-27 15:24:44 UTC
(In reply to Paolo Bonzini from comment #15)
> > 3. without this assert of next_map.sections_nb in phys_section_add
> >    crash occurred at register_subpage():
> >    assert(existing->mr->subpage || existing->mr == &io_mem_unassigned);
> 
> What is the backtrace here?

I applied your fix ([PATCH] extend limit of physical sections number),
and then qemu crashes at exec.c:802 in register_subpage:

qemu-system-x86_64: /home/devel/qemu/exec.c:802: register_subpage: Assertion `existing->mr->subpage || existing->mr == &io_mem_unassigned' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffebfff700 (LWP 25608)]
0x00007ffff4391a19 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff4391a19 in raise () from /lib64/libc.so.6
#1  0x00007ffff4393128 in abort () from /lib64/libc.so.6
#2  0x00007ffff438a986 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff438aa32 in __assert_fail () from /lib64/libc.so.6
#4  0x0000555555846786 in register_subpage (d=0x7fffe4052600, section=0x7fffebffe430) at /home/devel/qemu/exec.c:802
#5  0x0000555555846aba in mem_add (listener=0x555563934c68, section=0x7fffebffe5f0) at /home/devel/qemu/exec.c:842
#6  0x00005555558b9e33 in address_space_update_topology_pass (as=0x555563934c30, old_view=0x555565928d20, new_view=0x7fffe793d000, adding=true) at /home/devel/qemu/memory.c:735
#7  0x00005555558ba418 in address_space_update_topology (as=0x555563934c30) at /home/devel/qemu/memory.c:764
#8  0x00005555558ba587 in memory_region_transaction_commit () at /home/devel/qemu/memory.c:799
#9  0x00005555558bcf84 in memory_region_set_enabled (mr=0x555568c6c158, enabled=true) at /home/devel/qemu/memory.c:1503
#10 0x000055555571b28e in pci_default_write_config (d=0x555568c6be60, addr=4, val=0, l=2) at hw/pci/pci.c:1189
#11 0x0000555555781aea in virtio_write_config (pci_dev=0x555568c6be60, address=4, val=7, len=2) at hw/virtio/virtio-pci.c:459
#12 0x0000555555720094 in pci_host_config_write_common (pci_dev=0x555568c6be60, addr=4, limit=256, val=7, len=2) at hw/pci/pci_host.c:57
#13 0x00005555557201e4 in pci_data_write (s=0x555556418c10, addr=2147513092, val=7, len=2) at hw/pci/pci_host.c:84
#14 0x00005555557203a0 in pci_host_data_write (opaque=0x555556416640, addr=0, val=7, len=2) at hw/pci/pci_host.c:137
#15 0x00005555558b86af in memory_region_write_accessor (mr=0x555556418a30, addr=0, value=0x7fffebffeaa8, size=2, shift=0, mask=65535) at /home/devel/qemu/memory.c:440
#16 0x00005555558b87ec in access_with_adjusted_size (addr=0, value=0x7fffebffeaa8, size=2, access_size_min=1, access_size_max=4, access=
    0x5555558b861f <memory_region_write_accessor>, mr=0x555556418a30) at /home/devel/qemu/memory.c:477
#17 0x00005555558badb9 in memory_region_dispatch_write (mr=0x555556418a30, addr=0, data=7, size=2) at /home/devel/qemu/memory.c:984
#18 0x00005555558be040 in io_mem_write (mr=0x555556418a30, addr=0, val=7, size=2) at /home/devel/qemu/memory.c:1748
#19 0x000055555584949c in address_space_rw (as=0x5555561ef780 <address_space_io>, addr=3324, buf=0x7ffff7ff2000 "\a", len=2, is_write=true) at /home/devel/qemu/exec.c:1904
#20 0x00005555558b5075 in kvm_handle_io (port=3324, data=0x7ffff7ff2000, direction=1, size=2, count=1) at /home/devel/qemu/kvm-all.c:1542
#21 0x00005555558b5632 in kvm_cpu_exec (cpu=0x5555563fc3e0) at /home/devel/qemu/kvm-all.c:1680
#22 0x000055555583c3c0 in qemu_kvm_cpu_thread_fn (arg=0x5555563fc3e0) at /home/devel/qemu/cpus.c:872
#23 0x00007ffff625dc53 in start_thread () from /lib64/libpthread.so.0
#24 0x00007ffff4451e1d in clone () from /lib64/libc.so.6

> 
> The cause could be INT64_MAX instead of UINT64_MAX in hw/i386/pc_piix.c:
> 
>         memory_region_init(pci_memory, "pci", INT64_MAX);


After this change, the guest can add more devices (about 20 more), but qemu still crashes at the same point.

> 
> and similarly in pc_q35.c.

Comment 17 Amos Kong 2013-11-27 23:59:55 UTC
It's a TCG memory-related issue; reassigning to Marcel as discussed on IRC.
Thanks.

Comment 20 Miroslav Rezanina 2014-01-21 09:16:48 UTC
Fix included in qemu-kvm-1.5.3-39.el7

Comment 23 Jun Li 2014-02-10 07:57:04 UTC
Reproduced this bug:
Version-Release number of selected component (if applicable):
qemu-kvm-1.5.3-37.el7.x86_64
3.10.0-79.el7.x86_64
---
Boot guest using the following script:
# cat bug1003535-mutifunction-on.sh 
#!/bin/bash
CLI="gdb --args /usr/libexec/qemu-kvm -M pc-i440fx-rhel7.0.0 -monitor stdio -enable-kvm -m 5G -smp 2,sockets=1,cores=2,threads=1 -name RHEL-Server-7.0-64 -boot c \
-drive file=/home/juli/rhel7.0.qcow2,if=none,id=drive-ide0-0-0,format=qcow2 -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -net none -spice disable-ticketing,port=5931 -vga qxl -serial unix:/tmp/virtio,server,nowait" 

for ((m=3;m<=13;m++)); do
    for ((i=0;i<=7;i++)); do
        k=`printf "%02x" $m`
        echo $k
        num=$(($i+($m-3)*8))
        echo $num
        CLI="$CLI -drive file=/home/disk/disk$num,if=none,id=drive-virtio0-0-$num,format=qcow2"
        CLI="$CLI -device virtio-blk-pci,drive=drive-virtio0-0-$num,id=virti0-0-$num,multifunction=on,addr=0x$k.$i"
    done
done

$CLI
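
The address layout produced by the loops above can be sketched as follows. The bounds (slots 3 to 13, functions 0 to 7) are taken from the script; the sketch_* names are hypothetical and exist only for this illustration.

```c
/* Loop bounds copied from the script: PCI slots 3..13, functions 0..7. */
enum {
    SKETCH_FIRST_SLOT = 3,
    SKETCH_LAST_SLOT = 13,
    SKETCH_FUNCS_PER_SLOT = 8
};

/* Disk number the script assigns to a given slot/function pair,
 * i.e. num = i + (m - 3) * 8. */
static int sketch_disk_index(int slot, int func)
{
    return func + (slot - SKETCH_FIRST_SLOT) * SKETCH_FUNCS_PER_SLOT;
}

/* Total number of virtio-blk devices the script adds. */
static int sketch_disk_count(void)
{
    return (SKETCH_LAST_SLOT - SKETCH_FIRST_SLOT + 1) * SKETCH_FUNCS_PER_SLOT;
}
```

Eleven slots with eight functions each give 88 virtio disks, numbered 0 (addr=0x03.0) through 87 (addr=0x0d.7).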
------
(gdb) bt
#0  0x00007ffff2c9c979 in raise () from /lib64/libc.so.6
#1  0x00007ffff2c9e088 in abort () from /lib64/libc.so.6
#2  0x00007ffff2c958e6 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff2c95992 in __assert_fail () from /lib64/libc.so.6
#4  0x0000555555781aac in register_subpage ()
#5  0x0000555555781cd2 in mem_add ()
#6  0x00005555557d4cf2 in address_space_update_topology_pass.isra.5 ()
#7  0x00005555557d5b4d in memory_region_transaction_commit ()
#8  0x00005555556c1abc in pci_default_write_config ()
#9  0x00005555556f7afa in virtio_write_config ()
#10 0x00005555557d3572 in access_with_adjusted_size ()
#11 0x00005555557d4a47 in memory_region_iorange_write ()
#12 0x00005555557d2355 in kvm_cpu_exec ()
#13 0x000055555577a8c5 in qemu_kvm_cpu_thread_fn ()
#14 0x00007ffff604fde3 in start_thread () from /lib64/libpthread.so.0
#15 0x00007ffff2d5d25d in clone () from /lib64/libc.so.6
-------
Based on the above test, this issue has been reproduced.
==================
Verified this bug:
Version-Release number of selected component (if applicable):
qemu-kvm-1.5.3-45.el7.x86_64
------
Steps as follows:

1. Boot the guest using the following script:
# cat bug1003535-mutifunction-on.sh 
#!/bin/bash
CLI="gdb --args /usr/libexec/qemu-kvm -M pc-i440fx-rhel7.0.0 -monitor stdio -enable-kvm -m 5G -smp 2,sockets=1,cores=2,threads=1 -name RHEL-Server-7.0-64 -boot c \
-drive file=/home/juli/rhel7.0.qcow2,if=none,id=drive-ide0-0-0,format=qcow2 -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -net none -spice disable-ticketing,port=5931 -vga qxl -serial unix:/tmp/virtio,server,nowait" 

for ((m=3;m<=13;m++)); do
    for ((i=0;i<=7;i++)); do
        k=`printf "%02x" $m`
        echo $k
        num=$(($i+($m-3)*8))
        echo $num
        CLI="$CLI -drive file=/home/disk/disk$num,if=none,id=drive-virtio0-0-$num,format=qcow2"
        CLI="$CLI -device virtio-blk-pci,drive=drive-virtio0-0-$num,id=virti0-0-$num,multifunction=on,addr=0x$k.$i"
    done
done

$CLI
---------
2. Check these 88 disks inside the guest.
# ls /dev/vd* |wc -l
88
-----
Based on the above test, this issue has been verified.

Comment 25 Ludek Smid 2014-06-13 09:41:44 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.