Bug 1481593
Summary: Boot guest failed with "src/central_freelist.cc:333] tcmalloc: allocation failed 196608" when 465 disks are attached to 465 pci-bridges
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.4-Alt
Hardware: ppc64le
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: xianwang <xianwang>
Assignee: David Gibson <dgibson>
QA Contact: xianwang <xianwang>
CC: bugproxy, dgibson, hannsj_uhl, jinzhao, knoel, lmiksik, michen, qzhang, rbalakri, virt-maint, xianwang
Target Milestone: rc
Target Release: 7.5
Fixed In Version: qemu-kvm-rhev-2.10.0-7.el7
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2018-04-11 00:33:01 UTC
Bug Blocks: 1399177, 1476742
Description (xianwang, 2017-08-15 07:05:04 UTC)
1. I have changed the guest memory to 8G and smp to 8; the test result is the same as in the bug description.

2. Check the disk information for the disk attached to each pci-bridge:

[root@c155f1-u15 xianwang]# qemu-img info ./pci-bridge/d1
image: ./pci-bridge/d1
file format: qcow2
virtual size: 1.0K (1024 bytes)
disk size: 196K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

I have retested this scenario on POWER8 with the RHEL 7.4 versions, but did not hit this issue.

Version:
3.10.0-693.1.1.el7.ppc64le
qemu-kvm-rhev-2.9.0-16.el7.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch

Steps are the same as in the bug description.

Result: the guest boots successfully.

QEMU 2.9.0 monitor - type 'help' for more information
(qemu) VNC server running on ::1:5900
(qemu) info status
VM status: running

Both "info pci" and "info block" work well, and there is no tcmalloc error prompt, but trying to log in to the guest fails:

# nc -U /tmp/console1       ------- cannot log in; some messages appear in the HMP:
(qemu) virtio-blk failed to set guest notifier (-24), ensure -enable-kvm is set
qemu-kvm: virtio_bus_start_ioeventfd: failed. Fallback to userspace (slower).

The result is not the same as in the bug report; at least, there is no core file generated.

From the gdb errors, I suspect this indicates we have a heap corruption bug. Nasty.

I think the next step is to bisect between the 7.4 and pegas versions to see where the problem was introduced.

XianXian, can you attach your test script for this, so I can reproduce the problem more easily?

(In reply to David Gibson from comment #5)
> From the gdb errors, I suspect this indicates we have a heap corruption bug.
> Nasty.
>
> I think the next step is to bisect between the 7.4 and pegas versions to see
> where the problem was introduced.
>
> XianXian, can you attach your test script for this, so I can reproduce the
> problem more easily?
Hi Dave, I have already provided my test script, named "max_pci-br.sh", in comment 0, as step 2:

2. Boot a guest with 465 pci-bridge devices and one disk of about 1K attached to each pci-bridge, with the following script:

[root@c155f1-u15 xianwang]# cat max_pci-br.sh
#!/bin/sh
MACHINE=pseries
SMP=4
MEM=4G
GUEST_IMG=/home/xianwang/RHEL-ALT-7.4-20170726.0-ppc64le-virtio-scsi.qcow2
IMG_FORMAT=qcow2

CLI="/usr/libexec/qemu-kvm -enable-kvm -M $MACHINE -nodefaults -smp $SMP -m $MEM -name vm1
-monitor stdio -qmp tcp:0:6666,server,nowait -boot menu=on,reboot-timeout=8,strict=on
-device virtio-scsi-pci,id=controller_scsi,bus=pci.0,addr=03
-drive file=$GUEST_IMG,if=none,id=drive-virtio-disk0,format=$IMG_FORMAT,cache=none,werror=stop,rerror=stop
-device scsi-hd,id=scsi1,drive=drive-virtio-disk0,bus=controller_scsi.0,bootindex=1
-chardev socket,id=console1,path=/tmp/console1,server,nowait
-device spapr-vty,chardev=console1"

while [ ${i:=1} -lt ${1:-0} ]
do
    CLI="$CLI -device pci-bridge,chassis_nr=$i,id=bridge$i,bus=pci.0"
    for ((j=1;j<=31;j++));
    do
        z=$((31*$i-31+$j))
        echo $i,$j,$z
        qemu-img create -f qcow2 /home/xianwang/pci-bridge/d$z 1k
        CLI="$CLI -drive file=/home/xianwang/pci-bridge/d$z,if=none,id=disk$z"
        CLI="$CLI -device virtio-blk-pci,bus=bridge$i,drive=disk$z,id=blk$z,addr=0x$(printf "%02x" $j)"
    done
    ((i++))
done

$CLI

[root@c155f1-u15 xianwang]# sh max_pci-br.sh 16
1,1,1
Formatting '/home/xianwang/pci-bridge/d1', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
.........
15,31,465
Formatting '/home/xianwang/pci-bridge/d465', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
QEMU 2.9.0 monitor - type 'help' for more information

(In reply to xianwang from comment #0)
Sorry, I just realized I had made mistakes in the bug description; I have updated it.
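As an aside (my sketch, not part of the original script): the loop above maps bridge i and slot j to a disk index z, giving indices 1..465 for 15 bridges of 31 slots each, and reuses the slot number as the device's PCI address on that bridge.

```shell
# Sketch (not from the bug) of the index arithmetic in max_pci-br.sh:
# (bridge i, slot j) -> disk index z = 31*i - 31 + j, and the slot
# number j doubles as the PCI device address passed to -device.
i=15; j=31
z=$((31*i - 31 + j))
echo "last disk index: $z"
printf 'addr=0x%02x\n' "$j"
```

Running it prints `last disk index: 465` and `addr=0x1f`, matching the final "15,31,465" line of the script's output.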
> Description of problem:
> Boot a guest with 465 pci-bridge devices in qemu cli, and there is one disk
> about 1K attached to each pci-bridge, boot guest failed, and this qemu
> process takes about 25G memory of host, i.e, 80% memory of host.

Correction: boot a guest with 15 pci-bridges and 31 disks attached to each pci-bridge, so the total disk count is 465; each disk is 1K. Booting the guest fails, and the qemu process takes about 25G of host memory, i.e. 80% of the host's memory.

> Version-Release number of selected component (if applicable):
> 4.11.0-23.el7a.ppc64le
> qemu-kvm-2.9.0-20.el7a.ppc64le
> SLOF-20170303-4.git66d250e.el7.noarch
>
> How reproducible:
> 3/3
>
> Steps to Reproduce:
> 1. Before boot guest, check the memory of host:
> [root@c155f1-u15 ~]# free -h
>               total        used        free      shared  buff/cache   available
> Mem:            30G        296M         30G         11M        527M         29G
> Swap:           15G        204M         15G
>
> 2. Boot a guest with 465 pci-bridge devices and there is one disk about 1K
> attached to each pci-bridge, as the following script:

Correction of step 2:
Boot a guest with 15 pci-bridges, each pci-bridge with 31 disks attached, each disk 1K, as in the following script:

> [root@c155f1-u15 xianwang]# cat max_pci-br.sh
> #!/bin/sh
> MACHINE=pseries
> SMP=4
> MEM=4G
> GUEST_IMG=/home/xianwang/RHEL-ALT-7.4-20170726.0-ppc64le-virtio-scsi.qcow2
> IMG_FORMAT=qcow2
>
> CLI="/usr/libexec/qemu-kvm -enable-kvm -M $MACHINE -nodefaults -smp $SMP -m $MEM -name vm1
> -monitor stdio -qmp tcp:0:6666,server,nowait -boot menu=on,reboot-timeout=8,strict=on
> -device virtio-scsi-pci,id=controller_scsi,bus=pci.0,addr=03
> -drive file=$GUEST_IMG,if=none,id=drive-virtio-disk0,format=$IMG_FORMAT,cache=none,werror=stop,rerror=stop
> -device scsi-hd,id=scsi1,drive=drive-virtio-disk0,bus=controller_scsi.0,bootindex=1
> -chardev socket,id=console1,path=/tmp/console1,server,nowait
> -device spapr-vty,chardev=console1"
>
> while [ ${i:=1} -lt ${1:-0} ]
> do
>     CLI="$CLI -device pci-bridge,chassis_nr=$i,id=bridge$i,bus=pci.0"
>     for ((j=1;j<=31;j++));
>     do
>         z=$((31*$i-31+$j))
>         echo $i,$j,$z
>         qemu-img create -f qcow2 /home/xianwang/pci-bridge/d$z 1k
>         CLI="$CLI -drive file=/home/xianwang/pci-bridge/d$z,if=none,id=disk$z"
>         CLI="$CLI -device virtio-blk-pci,bus=bridge$i,drive=disk$z,id=blk$z,addr=0x$(printf "%02x" $j)"
>     done
>     ((i++))
> done
>
> $CLI
> [root@c155f1-u15 xianwang]# sh max_pci-br.sh 16
> 1,1,1
> Formatting '/home/xianwang/pci-bridge/d1', fmt=qcow2 size=1024
> encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
> .........
> 15,31,465
> Formatting '/home/xianwang/pci-bridge/d465', fmt=qcow2 size=1024
> encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
> QEMU 2.9.0 monitor - type 'help' for more information
>
> 2. During booting, check the memory of host:
> [root@c155f1-u15 ~]# free -h
>               total        used        free      shared  buff/cache   available
> Mem:            30G         25G        590M        7.9M        5.2G        4.4G
> Swap:           15G        305M         15G
> [root@c155f1-u15 ~]# ps -aux | grep qemu
> USER       PID %CPU %MEM      VSZ      RSS TTY   STAT START TIME COMMAND
> root     10560 10.5 80.2 31992704 25971584 pts/9 Sl+  06:17 0:30 /usr/libexec/qemu-kvm
> [root@c155f1-u15 ~]# top
> KiB Mem : 32370176 total,  1036480 free, 26331264 used, 5002432 buff/cache
> KiB Swap: 16711616 total, 16398336 free,   313280 used. 4645952 avail Mem
>
>   PID USER PR NI    VIRT    RES   SHR S %CPU %MEM   TIME+ COMMAND
> 10609 root 20  0   28288  15360  8384 D 20.5  0.0 0:15.43 abrt-hook-ccpp
> 10560 root 20  0 30.511g 0.024t 16896 S  4.0 80.3 0:21.78 qemu-kvm
>
> 3.
>
> Actual results:
> Boot guest failed, HMP hang, and there is core file generated.
> QEMU 2.9.0 monitor - type 'help' for more information
> (qemu) src/central_freelist.cc:333] tcmalloc: allocation failed 196608
>
> (process:10560): GLib-ERROR **: gmem.c:165: failed to allocate 49152 bytes
> max_pci-br.sh: line 30: 10560 Trace/breakpoint trap (core dumped) $CLI
>
> after qemu quit, check the memory again,
> [root@c155f1-u15 ~]# free -h
>               total        used        free      shared  buff/cache   available
> Mem:            30G        303M         27G        9.1M        2.7G         29G
> Swap:           15G        218M         15G
>
> Expected results:
> Boot guest successfully with 465 or 930 pci-bridge and there is one disk
> attached to each pci-bridge, there is no core files generated and no error.
Correction: boot the guest successfully with 15 or 30 pci-bridges and 31 disks attached to each pci-bridge, with no core files generated and no errors.

> Additional info:
> # gdb -c core.10560
> (gdb) t a a bt fu
>
> Thread 6 (LWP 10569):
> Cannot access memory at address 0x3fff8629dfd0
> #0  0x00003fff8d55e6ec in ?? ()
> No symbol table info available.
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>
> Thread 5 (LWP 10570):
> Cannot access memory at address 0x3fff85a7dfa0
> #0  0x00003fff8d562bd4 in ?? ()
> No symbol table info available.
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>
> Thread 4 (LWP 10561):
> Cannot access memory at address 0x3fff8bfce080
> #0  0x00003fff8d562c74 in ?? ()
> No symbol table info available.
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>
> Thread 3 (LWP 10568):
> Cannot access memory at address 0x3fff86abdfd0
> #0  0x00003fff8d55e6ec in ?? ()
> No symbol table info available.
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>
> Thread 2 (LWP 10571):
> Cannot access memory at address 0x3fff8525dfa0
> #0  0x00003fff8d562c74 in ?? ()
> No symbol table info available.
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>
> Thread 1 (LWP 10560):
> Cannot access memory at address 0x3fffdb53c510
> #0  0x00003fff8d565408 in ?? ()
> No symbol table info available.
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)

We seem to have a problem with this using grossly excessive memory. It's upstream as well. However, hundreds of tiny virtio-blk devices is hardly an expected use case, so deferring.

------- Comment From alexey.com 2017-08-24 03:30 EDT-------
I tried QEMU with 2GB of RAM, -initrd + -kernel, pseries, 64 PCI bridges, -S, no KVM, and some virtio-block devices; I ran it under "valgrind --tool=exp-dhat".
The summary of each run is:

50 virtio-block devices:
guest_insns:  2,728,740,444
max_live:     1,214,121,770 in 226,958 blocks
tot_alloc:    1,384,726,690 in 310,930 blocks

150 virtio-block devices:
guest_insns:  17,576,279,582
max_live:     7,454,182,031 in 1,286,128 blocks
tot_alloc:    7,958,747,994 in 1,469,719 blocks

250 virtio-block devices:
guest_insns:  46,100,928,249
max_live:     19,423,868,479 in 3,264,833 blocks
tot_alloc:    20,262,409,839 in 3,548,220 blocks

350 virtio-block devices:
guest_insns:  88,046,403,555
max_live:     36,994,652,991 in 6,140,203 blocks
tot_alloc:    38,167,153,779 in 6,523,206 blocks

With the hack (see below) and 350 virtio-block devices, the summary is:

guest_insns:  7,873,805,573
max_live:     2,577,738,019 in 2,567,682 blocks
tot_alloc:    3,750,238,807 in 2,950,685 blocks
insns per allocated byte: 2

The hack:

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 5d14bd6..7f1041e 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1790,7 +1790,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
                       0, memory_region_size(&proxy->modern_bar));
 
-    address_space_init(&proxy->modern_as, &proxy->modern_cfg, "virtio-pci-cfg-as");
+    //address_space_init(&proxy->modern_as, &proxy->modern_cfg, "virtio-pci-cfg-as");

AFAICT something is wrong with either address spaces in QEMU, or libc does not handle lots of realloc calls well. I keep debugging this.

------- Comment From alexey.com 2017-08-28 22:58 EDT-------
The problem is identified: RCU calls the address space dispatch disposal helper too late, hence the memory usage. Also, address space dispatch trees get rebuilt far too often. I am working on a solution now.

Move to qemu-kvm-rhev. This fix will apply to both RHEL KVM and qemu-kvm-rhev for RHV and RHOSP. Both packages are using the same 2.10 code base.

Alexey's fixes for this are now merged upstream. Some preliminaries are in 2.10.
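As a rough cross-check of the exp-dhat numbers above (my arithmetic, not from the report): the live heap per virtio-block device itself climbs with the device count, i.e. total memory grows superlinearly, which fits the finding that per-device address space dispatch data was to blame.

```shell
# Illustration (my arithmetic, not from the report): max_live bytes per
# device for the four exp-dhat runs above, in MiB. The per-device figure
# itself grows with the device count, so total memory grows superlinearly.
for run in 50:1214121770 150:7454182031 250:19423868479 350:36994652991; do
  n=${run%%:*}; bytes=${run##*:}
  echo "$n devices: $((bytes / n / 1024 / 1024)) MiB live per device"
done
```

This prints roughly 23, 47, 74, and 100 MiB per device for 50, 150, 250, and 350 devices respectively.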
The rest is in as:

a93c8d8 virtio-pci: Replace modern_as with direct access to modern_bar

and:

092aa2fc65b7a35121616aad8f39d47b8f921618 memory: Share special empty FlatView
e673ba9af9bf8fd8e0f44025ac738b8285b3ed27 memory: seek FlatView sharing candidates among children subregions
02d9651d6a46479e9d70b72dca34e43605d06cda memory: trace FlatView creation and destruction
202fc01b05572ecb258fdf4c5bd56cf6de8140c7 memory: Create FlatView directly
b516572f31c0ea0937cd9d11d9bd72dd83809886 memory: Get rid of address_space_init_shareable
5e8fd947e2670c3c18f139de6a83fafcb56abbcc memory: Rework "info mtree" to print flat views and dispatch trees
67ace39b253ed5ae465275bc870f7e495547658b memory: Do not allocate FlatView in address_space_init
967dc9b1194a9281124b2e1ce67b6c3359a2138f memory: Share FlatView's and dispatch trees between address spaces
02218487649558ed66c3689d4cc55250a42601d8 memory: Move address_space_update_ioeventfds
9bf561e36cf8fed9565011a19ba9ea0100e1811e memory: Alloc dispatch tree where topology is generared
89c177bbdd6cf8e50b3fd4831697d50e195d6432 memory: Store physical root MR in FlatView
8629d3fcb77e9775e44d9051bad0fb5187925eae memory: Rename mem_begin/mem_commit/mem_add helpers
9950322a593ff900a860fb52938159461798a831 memory: Cleanup after switching to FlatView
166206845f7fd75e720e6feea0bb01957c8da07f memory: Switch memory from using AddressSpace to FlatView
c7752523787dc148f5ee976162e80ab594c386a1 memory: Remove AddressSpace pointer from AddressSpaceDispatch
66a6df1dc6d5b28cc3e65db0d71683fbdddc6b62 memory: Move AddressSpaceDispatch from AddressSpace to FlatView
cc94cd6d36602d976a5e7bc29134d1eaefb4102e memory: Move FlatView allocation to a helper
9a62e24f45bc97f8eaf198caf58906b47c50a8d5 memory: Open code FlatView rendering
e76bb18f7e430e0c50fb38d051feacf268bd78f4 exec: Explicitly export target AS from address_space_translate_internal
447b0d0b9ee8a0ac216c3186e0f3c427a1001f0c memory: avoid "resurrection" of dead FlatViews

I've made a preliminary backport,
brewing at: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14556958

That build hit a compile problem on arm. I've addressed it by pulling in another prereq patch. Trying again at:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14565465

Tried the test package out with a case similar to comment 0. Looking much better - exact limits depend on host RAM size, but with the test package QEMU was no longer consuming vast amounts of memory during startup. There are still some issues with that many devices causing systemd to time out during guest boot, but that's a different problem.

Fix included in qemu-kvm-rhev-2.10.0-7.el7.

This bug is verified as passing on qemu-kvm-rhev-2.10.0-9.el7.ppc64le.

Host:
4.14.0-9.el7a.ppc64le
qemu-kvm-rhev-2.10.0-9.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

Guest:
4.14.0-9.el7a.ppc64le

Steps are the same as in the bug report.

Result: the guest boots successfully and works well. There are messages:

qemu-kvm: virtio_bus_start_ioeventfd: failed. Fallback to userspace (slower).
virtio-blk failed to set guest notifier (-24), ensure -enable-kvm is set

As comment 4 said, this is another bug: https://bugzilla.redhat.com/show_bug.cgi?id=1436534

So, this bug is fixed.

*** Bug 1359614 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
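A closing annotation (mine, not from the bug report): the "(-24)" in the guest-notifier message above is a negated errno, and on Linux errno 24 is EMFILE ("Too many open files") - each ioeventfd registered for a virtio queue consumes one host file descriptor, so hundreds of devices can exhaust the process fd limit. That residual symptom is tracked separately as bug 1436534, referenced above.

```shell
# My annotation, not from the bug report: decode the "-24" from
# "virtio-blk failed to set guest notifier (-24)". On Linux, errno 24
# is EMFILE ("Too many open files"); each ioeventfd uses one host fd.
python3 -c 'import errno, os; print(errno.errorcode[24], "-", os.strerror(24))'
```

Running it on Linux prints `EMFILE - Too many open files`.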