Bug 1003535
| Field | Value |
|---|---|
| Summary | qemu-kvm core dump when boot vm with more than 32 virtio disks/nics |
| Product | Red Hat Enterprise Linux 7 |
| Component | qemu-kvm |
| Version | 7.0 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | high |
| Target Milestone | rc |
| Reporter | Xu Han <xuhan> |
| Assignee | Marcel Apfelbaum <marcel> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| CC | acathrow, hhuang, juli, juzhang, marcel, michen, mrezanin, pbonzini, rhod, sluo, virt-maint, xfu, xuhan, xwei |
| Fixed In Version | qemu-kvm-1.5.3-39.el7 |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2014-06-13 09:41:44 UTC |

Description
Xu Han
2013-09-02 10:01:17 UTC
Created attachment 792786
cli - boot vm
qemu-kvm-rhel6 & rhel7 guest : can't reproduce
qemu-upstream & rhel7 guest : can reproduce
qemu-upstream & rhel6 guest : can't reproduce

The problem occurs when adding the 33rd disk; the qemu crash should be fixed.

This crash occurs because the physical section number unexpectedly reaches or exceeds TARGET_PAGE_SIZE (4096). The assert check was added in the following commit:

    commit 68f3f65b09a1ce8c82fac17911ffc3bb6031ebe4
    Author: Paolo Bonzini <pbonzini>
    Date:   Tue May 7 11:30:23 2013 +0200

        memory: assert that PhysPageEntry's ptr does not overflow

        While sized to 15 bits in PhysPageEntry, the ptr field is ORed into the
        iotlb entries together with a page-aligned pointer. The ptr field must
        not overflow into this page-aligned value, assert that it is smaller than
        the page size.

        Reviewed-by: Peter Maydell <peter.maydell>
        Signed-off-by: Paolo Bonzini <pbonzini>

    diff --git a/exec.c b/exec.c
    index 1355661..8562fca 100644
    --- a/exec.c
    +++ b/exec.c
    @@ -713,6 +713,12 @@ static void destroy_all_mappings(AddressSpaceDispatch *d)
     
     static uint16_t phys_section_add(MemoryRegionSection *section)
     {
    +    /* The physical section number is ORed with a page-aligned
    +     * pointer to produce the iotlb entries. Thus it should
    +     * never overflow into the page-aligned value.
    +     */
    +    assert(phys_sections_nb < TARGET_PAGE_SIZE);
    +
         if (phys_sections_nb == phys_sections_nb_alloc) {
             phys_sections_nb_alloc = MAX(phys_sections_nb_alloc * 2, 16);
             phys_sections = g_renew(MemoryRegionSection, phys_sections,

Can reproduce this bug by launching a guest with 33 virtio-net nics.

    /home/devel/qemu/x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 2000 /images/RHEL-Server-6.4-64-virtio.qcow2 \
    -monitor stdio \
    -netdev tap,id=net-virtio0-0-1 -device virtio-net-pci,netdev=net-virtio0-0-8,id=virti0-0-1,multifunction=on,addr=0x04.0 \
    .....
    -netdev tap,id=net-virtio0-0-33 -device virtio-net-pci,netdev=net-virtio0-0-33,id=virti0-0-33,multifunction=on,addr=0x08.0 \

I tested with upstream kernels (v2.6.12, v2.6.22, v2.6.32, v2.6.38, v3.0, v3.1 .. v3.10); this issue is hit 100% of the time.

It's strange that we can't hit this problem with a rhel6 guest (2.6.32-419.el6).

Paolo, any thoughts?

Looks like there are too many BARs. You could do something like

    if (tcg_enabled()) {
        /* The physical section number is ORed with a page-aligned
         * pointer to produce the iotlb entries. Thus it should
         * never overflow into the page-aligned value.
         */
        assert(phys_sections_nb < TARGET_PAGE_SIZE);
    } else {
        /* For KVM or Xen we can use the full range of the ptr field
         * in PhysPageEntry.
         */
        assert(phys_sections_nb <= SHRT_MAX);
    }

This should bring the limit up by a factor of 8 (32767 / 4096), i.e. 32*8 = 256. Some care is still necessary when you have bridges, but it should be much better.

*** Bug 1025680 has been marked as a duplicate of this bug. ***

hi xuhan,

Can you reproduce this bug with the latest qemu-kvm-rhel7? I can't reproduce it with qemu-upstream (1.6.0, 1.5.0, 1.5.1, 1.5.2, 1.5.3).

Thanks

hi amos,

Tested 2 times with qemu-kvm-rhev-1.5.3-19.el7.x86_64. Hit this issue with 87 virtio disks attached.

    qemu-kvm: /builddir/build/BUILD/qemu-1.5.3/exec.c:762: register_subpage: Assertion `existing->mr->subpage || existing->mr == &io_mem_unassigned' failed.
    core: line 90: 5852 Aborted (core dumped)

But did not hit it with 51 disks attached.

    # ls /dev/vd* | wc -l
    51

best regards,
xuhan

(In reply to xuhan from comment #12)
> hi amos,
>
> Tested 2 times with qemu-kvm-rhev-1.5.3-19.el7.x86_64 .

Thanks for confirming. I tested with a RHEL 6 guest in Comment #11.

I can reproduce with a RHEL 7 guest (both latest qemu-upstream & qemu-kvm-rhel7).

The internal qemu crashes at this point:

    register_subpage: Assertion `existing->mr->subpage || existing->mr == &io_mem_unassigned' failed.
Upstream qemu has a new check from commit 68f3f65b (memory: assert that PhysPageEntry's ptr does not overflow), so it crashes at a different point:

    phys_section_add: Assertion `next_map.sections_nb < (1 << 12)' failed.

Tested with the latest guest kernel (3.10.0-rc5):

1. With this assert() in phys_section_add:
   assert(next_map.sections_nb < TARGET_PAGE_SIZE);
   the crash occurred.
2. With a different assert() in phys_section_add:
   assert(next_map.sections_nb < SHRT_MAX);
   the crash occurred.
3. Without any assert on next_map.sections_nb in phys_section_add, the crash occurred at register_subpage():
   assert(existing->mr->subpage || existing->mr == &io_mem_unassigned);

> 3. without this assert of next_map.sections_nb in phys_section_add
> crash occurred at register_subpage():
> assert(existing->mr->subpage || existing->mr == &io_mem_unassigned);

What is the backtrace here?

The cause could be INT_MAX instead of UINT_MAX in hw/i386/pc_piix.c:

    memory_region_init(pci_memory, "pci", INT64_MAX);

and similarly in pc_q35.c.
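
For context on the phys_section_add assertion discussed in this thread: the commit message says the section number is ORed into the low bits of a page-aligned pointer. The following is a minimal C sketch of that packing, not qemu's actual code. PAGE_BITS, PAGE_SIZE, PAGE_MASK, section_fits, pack_entry, entry_page, entry_section and raw_or_keeps_page are hypothetical names introduced for illustration, and a 4 KiB page (TARGET_PAGE_SIZE = 4096, as stated in this report) is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, matching TARGET_PAGE_SIZE (4096) in this report. */
#define PAGE_BITS 12
#define PAGE_SIZE ((uintptr_t)1 << PAGE_BITS)
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Hypothetical helper: nonzero if the section index fits in the in-page bits. */
int section_fits(uint16_t section)
{
    return (uintptr_t)section < PAGE_SIZE;
}

/* Hypothetical helper: pack a section index into the low bits of a
 * page-aligned address, refusing indexes that would spill into the address. */
uintptr_t pack_entry(uintptr_t page_addr, uint16_t section)
{
    assert((page_addr & ~PAGE_MASK) == 0);   /* address must be page-aligned */
    assert(section_fits(section));           /* mirrors the check in the thread */
    return page_addr | (uintptr_t)section;
}

/* Recover the two halves of a packed entry. */
uintptr_t entry_page(uintptr_t entry)
{
    return entry & PAGE_MASK;
}

uint16_t entry_section(uintptr_t entry)
{
    return (uint16_t)(entry & ~PAGE_MASK);
}

/* Unchecked OR, to show what goes wrong without the assertion:
 * returns nonzero only if the page address survives the OR. */
int raw_or_keeps_page(uintptr_t page_addr, uint16_t section)
{
    return entry_page(page_addr | (uintptr_t)section) == page_addr;
}
```

With a page address of 0x7f0000, section 33 packs and unpacks cleanly, whereas an unchecked OR with section 4096 sets bit 12 and silently changes the page address, which is the situation the assertion guards against.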
(In reply to Paolo Bonzini from comment #15)
> > 3. without this assert of next_map.sections_nb in phys_section_add
> > crash occurred at register_subpage():
> > assert(existing->mr->subpage || existing->mr == &io_mem_unassigned);
>
> What is the backtrace here?

I applied your fix ([PATCH] extend limit of physical sections number), and then qemu crashed at exec.c:802: register_subpage

    qemu-system-x86_64: /home/devel/qemu/exec.c:802: register_subpage: Assertion `existing->mr->subpage || existing->mr == &io_mem_unassigned' failed.

    Program received signal SIGABRT, Aborted.
    [Switching to Thread 0x7fffebfff700 (LWP 25608)]
    0x00007ffff4391a19 in raise () from /lib64/libc.so.6
    (gdb) bt
    #0  0x00007ffff4391a19 in raise () from /lib64/libc.so.6
    #1  0x00007ffff4393128 in abort () from /lib64/libc.so.6
    #2  0x00007ffff438a986 in __assert_fail_base () from /lib64/libc.so.6
    #3  0x00007ffff438aa32 in __assert_fail () from /lib64/libc.so.6
    #4  0x0000555555846786 in register_subpage (d=0x7fffe4052600, section=0x7fffebffe430) at /home/devel/qemu/exec.c:802
    #5  0x0000555555846aba in mem_add (listener=0x555563934c68, section=0x7fffebffe5f0) at /home/devel/qemu/exec.c:842
    #6  0x00005555558b9e33 in address_space_update_topology_pass (as=0x555563934c30, old_view=0x555565928d20, new_view=0x7fffe793d000, adding=true) at /home/devel/qemu/memory.c:735
    #7  0x00005555558ba418 in address_space_update_topology (as=0x555563934c30) at /home/devel/qemu/memory.c:764
    #8  0x00005555558ba587 in memory_region_transaction_commit () at /home/devel/qemu/memory.c:799
    #9  0x00005555558bcf84 in memory_region_set_enabled (mr=0x555568c6c158, enabled=true) at /home/devel/qemu/memory.c:1503
    #10 0x000055555571b28e in pci_default_write_config (d=0x555568c6be60, addr=4, val=0, l=2) at hw/pci/pci.c:1189
    #11 0x0000555555781aea in virtio_write_config (pci_dev=0x555568c6be60, address=4, val=7, len=2) at hw/virtio/virtio-pci.c:459
    #12 0x0000555555720094 in pci_host_config_write_common (pci_dev=0x555568c6be60, addr=4, limit=256, val=7, len=2) at hw/pci/pci_host.c:57
    #13 0x00005555557201e4 in pci_data_write (s=0x555556418c10, addr=2147513092, val=7, len=2) at hw/pci/pci_host.c:84
    #14 0x00005555557203a0 in pci_host_data_write (opaque=0x555556416640, addr=0, val=7, len=2) at hw/pci/pci_host.c:137
    #15 0x00005555558b86af in memory_region_write_accessor (mr=0x555556418a30, addr=0, value=0x7fffebffeaa8, size=2, shift=0, mask=65535) at /home/devel/qemu/memory.c:440
    #16 0x00005555558b87ec in access_with_adjusted_size (addr=0, value=0x7fffebffeaa8, size=2, access_size_min=1, access_size_max=4, access=0x5555558b861f <memory_region_write_accessor>, mr=0x555556418a30) at /home/devel/qemu/memory.c:477
    #17 0x00005555558badb9 in memory_region_dispatch_write (mr=0x555556418a30, addr=0, data=7, size=2) at /home/devel/qemu/memory.c:984
    #18 0x00005555558be040 in io_mem_write (mr=0x555556418a30, addr=0, val=7, size=2) at /home/devel/qemu/memory.c:1748
    #19 0x000055555584949c in address_space_rw (as=0x5555561ef780 <address_space_io>, addr=3324, buf=0x7ffff7ff2000 "\a", len=2, is_write=true) at /home/devel/qemu/exec.c:1904
    #20 0x00005555558b5075 in kvm_handle_io (port=3324, data=0x7ffff7ff2000, direction=1, size=2, count=1) at /home/devel/qemu/kvm-all.c:1542
    #21 0x00005555558b5632 in kvm_cpu_exec (cpu=0x5555563fc3e0) at /home/devel/qemu/kvm-all.c:1680
    #22 0x000055555583c3c0 in qemu_kvm_cpu_thread_fn (arg=0x5555563fc3e0) at /home/devel/qemu/cpus.c:872
    #23 0x00007ffff625dc53 in start_thread () from /lib64/libpthread.so.0
    #24 0x00007ffff4451e1d in clone () from /lib64/libc.so.6

> > The cause could be INT_MAX instead of UINT_MAX in hw/i386/pc_piix.c:
> > memory_region_init(pci_memory, "pci", INT64_MAX);

After this change, guest can add more than about 20 devices, but it still crashes at the same point.

> > and similarly in pc_q35.c.

It's a TCG memory-related issue; reassigning to Marcel as discussed on IRC.

Thanks.
Fix included in qemu-kvm-1.5.3-39.el7

Reproduce this bug:

Version-Release number of selected component (if applicable):
qemu-kvm-1.5.3-37.el7.x86_64
3.10.0-79.el7.x86_64
---
Boot guest using the following script:

    # cat bug1003535-mutifunction-on.sh
    #! /bin/sh
    CLI="gdb --args /usr/libexec/qemu-kvm -M pc-i440fx-rhel7.0.0 -monitor stdio -enable-kvm -m 5G -smp 2,sockets=1,cores=2,threads=1 -name RHEL-Server-7.0-64 -boot c \
    -drive file=/home/juli/rhel7.0.qcow2,if=none,id=drive-ide0-0-0,format=qcow2 -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -net none -spice disable-ticketing,port=5931 -vga qxl -serial unix:/tmp/virtio,server,nowait"
    for ((m=3;m<=13;m++)); do
    for ((i=0;i<=7;i++)); do
    k=`printf "%02x" $m`
    echo $k
    num=$(($i+($m-3)*8))
    echo $num
    CLI="$CLI -drive file=/home/disk/disk$num,if=none,id=drive-virtio0-0-$num,format=qcow2"
    CLI="$CLI -device virtio-blk-pci,drive=drive-virtio0-0-$num,id=virti0-0-$num,multifunction=on,addr=0x$k.$i"
    done
    done
    $CLI

------

    (gdb) bt
    #0  0x00007ffff2c9c979 in raise () from /lib64/libc.so.6
    #1  0x00007ffff2c9e088 in abort () from /lib64/libc.so.6
    #2  0x00007ffff2c958e6 in __assert_fail_base () from /lib64/libc.so.6
    #3  0x00007ffff2c95992 in __assert_fail () from /lib64/libc.so.6
    #4  0x0000555555781aac in register_subpage ()
    #5  0x0000555555781cd2 in mem_add ()
    #6  0x00005555557d4cf2 in address_space_update_topology_pass.isra.5 ()
    #7  0x00005555557d5b4d in memory_region_transaction_commit ()
    #8  0x00005555556c1abc in pci_default_write_config ()
    #9  0x00005555556f7afa in virtio_write_config ()
    #10 0x00005555557d3572 in access_with_adjusted_size ()
    #11 0x00005555557d4a47 in memory_region_iorange_write ()
    #12 0x00005555557d2355 in kvm_cpu_exec ()
    #13 0x000055555577a8c5 in qemu_kvm_cpu_thread_fn ()
    #14 0x00007ffff604fde3 in start_thread () from /lib64/libpthread.so.0
    #15 0x00007ffff2d5d25d in clone () from /lib64/libc.so.6

-------
Based on the above test, this issue has been reproduced.
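
The index arithmetic in the script above can be checked with a small C sketch. It assumes, as the loops do, one device per function on PCI slots 3 through 13 with functions 0 through 7. FIRST_SLOT, LAST_SLOT, FUNCS_PER_SLOT, disk_index, disk_count and format_addr are hypothetical names used only for illustration; they are not part of qemu or of the test script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: the same ranges as the script's loops (m=3..13, i=0..7). */
enum { FIRST_SLOT = 3, LAST_SLOT = 13, FUNCS_PER_SLOT = 8 };

/* Disk number for a slot/function pair; same formula as
 * num=$(($i+($m-3)*8)) in the script. */
int disk_index(int slot, int fn)
{
    assert(slot >= FIRST_SLOT && slot <= LAST_SLOT);
    assert(fn >= 0 && fn < FUNCS_PER_SLOT);
    return fn + (slot - FIRST_SLOT) * FUNCS_PER_SLOT;
}

/* Total number of virtio-blk devices the two loops define. */
int disk_count(void)
{
    return (LAST_SLOT - FIRST_SLOT + 1) * FUNCS_PER_SLOT;
}

/* Render the addr= value the script builds with printf "%02x" and ".$i".
 * Returns the snprintf result (characters needed, excluding the NUL). */
int format_addr(char *buf, size_t len, int slot, int fn)
{
    return snprintf(buf, len, "0x%02x.%d", slot, fn);
}
```

The last device, slot 13 function 7, gets index 87 and address 0x0d.7, so the loops define 88 disks in total, matching the 88 disks counted in the guest during verification.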
==================
Verified this bug:

Version-Release number of selected component (if applicable):
qemu-kvm-1.5.3-45.el7.x86_64
------
Steps as follows:

1. Boot the guest using the following script:

    # cat bug1003535-mutifunction-on.sh
    #! /bin/sh
    CLI="gdb --args /usr/libexec/qemu-kvm -M pc-i440fx-rhel7.0.0 -monitor stdio -enable-kvm -m 5G -smp 2,sockets=1,cores=2,threads=1 -name RHEL-Server-7.0-64 -boot c \
    -drive file=/home/juli/rhel7.0.qcow2,if=none,id=drive-ide0-0-0,format=qcow2 -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -net none -spice disable-ticketing,port=5931 -vga qxl -serial unix:/tmp/virtio,server,nowait"
    for ((m=3;m<=13;m++)); do
    for ((i=0;i<=7;i++)); do
    k=`printf "%02x" $m`
    echo $k
    num=$(($i+($m-3)*8))
    echo $num
    CLI="$CLI -drive file=/home/disk/disk$num,if=none,id=drive-virtio0-0-$num,format=qcow2"
    CLI="$CLI -device virtio-blk-pci,drive=drive-virtio0-0-$num,id=virti0-0-$num,multifunction=on,addr=0x$k.$i"
    done
    done
    $CLI

---------
2. Check these 88 disks inside the guest.

    # ls /dev/vd* |wc -l
    88

-----
Based on the above test, this issue has been verified.

This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.