Bug 1420456 - [ppc64le] reset vm when doing migration, HMP on src host prompts "tcmalloc: large alloc 1073872896 bytes..."
Summary: [ppc64le] reset vm when doing migration, HMP on src host prompts "tcmalloc: large a...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: ppc64le
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Laurent Vivier
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 1404673
Blocks:
Reported: 2017-02-08 17:20 UTC by Jaroslav Reznik
Modified: 2017-03-01 08:02 UTC

Fixed In Version: qemu-kvm-rhev-2.6.0-28.el7_3.5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1404673
Environment:
Last Closed: 2017-03-01 08:02:23 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2017:0350 | 0 | normal | SHIPPED_LIVE | Important: qemu-kvm-rhev security and bug fix update | 2017-03-01 12:59:14 UTC

Description Jaroslav Reznik 2017-02-08 17:20:33 UTC
This bug has been copied from bug #1404673 and has been proposed
to be backported to 7.3 z-stream (EUS).

Comment 3 xianwang 2017-02-09 07:03:11 UTC
Hi, Laurent,
This bug can't be reproduced with the following versions (rhel-7.3.z+):

Host (both src host and dst host):
distro:RHEL-7.3 Server ppc64le
3.10.0-558.el7.ppc64le
qemu-kvm-rhev-2.6.0-28.el7_3.4.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch

Guest:
RHEL7.3 LE
3.10.0-558.el7.ppc64le

Test steps:
(1) Boot a guest on the src host with the following qemu command line:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -nodefaults  \
    -machine pseries-rhel7.3.0 \
    -vga std  \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=03 \
    -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 \
    -chardev socket,id=devorg.qemu.guest_agent.0,path=/tmp/virtio_port-org.qemu.guest_agent.0-20160516-164929-dHQ00mMM,server,nowait \
    -device virtserialport,chardev=devorg.qemu.guest_agent.0,name=org.qemu.guest_agent.0,id=org.qemu.guest_agent.0,bus=virtio_serial_pci0.0  \
    -device nec-usb-xhci,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -drive file=/root/RHEL.7.3.qcow2,if=none,id=blk1 \
    -device virtio-blk-pci,scsi=off,drive=blk1,id=blk-disk1,bootindex=1 \
    -drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/root/RHEL-7.3-20161019.0-Server-ppc64le-dvd1.iso \
    -device scsi-cd,id=cd1,drive=drive_cd1,bootindex=2 \
    -device virtio-net-pci,mac=9a:7b:7c:7d:7e:71,id=idtlLxAk,vectors=4,netdev=idlkwV8e,bus=pci.0,addr=05 \
    -netdev tap,id=idlkwV8e,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -m 8G \
    -smp 8 \
    -cpu host \
    -device usb-kbd \
    -device usb-tablet \
    -qmp tcp:0:8881,server,nowait \
    -vnc :1  \
    -msg timestamp=on \
    -rtc base=localtime,clock=vm,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -monitor stdio \
    -enable-kvm
(2) Boot a guest with the same qemu command line as on the src host, appending
"-incoming tcp:0:5801"
(3) Start the migration and then reset the VM with the following commands:
(qemu) migrate -d tcp:$dst:$port
(qemu) system_reset
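
The same two steps could probably also be driven through the QMP socket that the command line above opens with "-qmp tcp:0:8881". The following is only a sketch under that assumption: the helper name qmp_migrate_then_reset is hypothetical, the commands are sent without waiting for replies, and piping into nc is just one possible way to deliver them.

```shell
#!/bin/sh
# Hypothetical helper: print the QMP commands that mirror the HMP steps
# "migrate -d tcp:$dst:$port" followed by "system_reset".
# QMP requires "qmp_capabilities" before any other command is accepted.
qmp_migrate_then_reset() {
    dst_host=$1
    dst_port=$2
    printf '{"execute": "qmp_capabilities"}\n'
    printf '{"execute": "migrate", "arguments": {"uri": "tcp:%s:%s"}}\n' \
        "$dst_host" "$dst_port"
    printf '{"execute": "system_reset"}\n'
}

# Possible usage (assumption: nc is installed and QMP listens on port 8881):
#   qmp_migrate_then_reset "$dst" 5801 | nc localhost 8881
```

The function only generates text, so it can be inspected before anything is sent to a running guest.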

Actual result:
Migration completed and the VM works well; no "tcmalloc..." message is printed.

I have tested this scenario 5 times, but couldn't reproduce the bug.
So, is this bug fixed?

Comment 4 Laurent Vivier 2017-02-09 07:58:46 UTC
There is no real way to verify this bug is fixed for rhel-7.3.z: as the size of the unnecessary memory allocation is only 32MB it doesn't trigger the tcmalloc() warning.
I've tested this having added some traces in the function, and I've seen the memory size allocated for the log has been reduced from 32MB to 64kB with this patch.
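
For scale, the sizes mentioned in this report can be converted to MiB with a few lines of shell. This is only an illustrative sketch: bytes_to_mib is a hypothetical helper, not part of QEMU or of the patch, and it assumes that "32MB" and "64kB" here mean 32 MiB and 64 KiB.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count to MiB using integer
# arithmetic only; the fraction is truncated (not rounded) to two digits.
bytes_to_mib() {
    bytes=$1
    whole=$((bytes / 1048576))
    frac=$(( (bytes % 1048576) * 100 / 1048576 ))
    printf '%d.%02d\n' "$whole" "$frac"
}

bytes_to_mib 1073872896   # size from the bug title     -> 1024.12
bytes_to_mib 33554432     # 32 MiB (assumed for "32MB") -> 32.00
bytes_to_mib 65536        # 64 KiB (assumed for "64kB") -> 0.06
```

The allocation in the bug title is a little over 1 GiB, while the one discussed here is about 32 MiB.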

Comment 5 xianwang 2017-02-09 08:27:30 UTC
(In reply to Laurent Vivier from comment #4)
> There is no real way to verify this bug is fixed for rhel-7.3.z: as the size
> of the unnecessary memory allocation is only 32MB it doesn't trigger the
> tcmalloc() warning.
> I've tested this having added some traces in the function, and I've seen the
> memory size allocated for the log has been reduced from 32MB to 64kB with
> this patch.

Since this bug can't be reproduced on rhel-7.3.z, should we test the same scenario as in comment 3 when we verify it in the future?

Comment 6 Laurent Vivier 2017-02-09 09:48:22 UTC
You can use systemtap to log memory allocated by qemu.

Since we know the oversized allocation is larger than 32MB, we can use this script to check for it:

$ cat qemu-watch.stp
probe glib.mem_alloc {
	if (n_bytes > 32000000)
		printf ("g_malloc: pid=%d n_bytes=%d\n", pid(), n_bytes);
}

Then start systemtap:

# stap -v ./qemu-watch.stp

In other shells, start your two QEMUs (migration source and destination).

Then act as in comment #3.

After the system_reset, you will see in the systemtap window:
...
Pass 5: starting run.
g_malloc: pid=14403 n_bytes=33669128

With the fix applied, you should not see the "g_malloc:..." line.
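
If the systemtap output is saved to a file, the check can be scripted. A minimal sketch, assuming the output was captured to a log file (for example with tee); the function name find_large_allocs and the file name stap.log are hypothetical, and the default limit simply repeats the 32000000 threshold from the script above.

```shell
#!/bin/sh
# Hypothetical helper: print every "g_malloc:" line whose n_bytes value
# exceeds a limit. Returns 1 if at least one such line is found, else 0.
find_large_allocs() {
    log=$1
    limit=${2:-32000000}
    awk -v limit="$limit" '
        /^g_malloc:/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^n_bytes=/) {
                    split($i, kv, "=")
                    if (kv[2] + 0 > limit) { print; found = 1 }
                }
            }
        }
        END { exit (found ? 1 : 0) }
    ' "$log"
}

# Possible usage (assumption: output captured with "stap ... | tee stap.log"):
#   find_large_allocs stap.log && echo "no oversized allocation seen"
```

A zero exit status then corresponds to the expected result with the fix applied, and a non-zero status to the unfixed behaviour.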

Comment 7 Miroslav Rezanina 2017-02-10 09:36:28 UTC
Fix included in qemu-kvm-rhev-2.6.0-28.el7_3.5

Comment 9 xianwang 2017-02-13 05:26:57 UTC
This bug is verified as fixed on qemu-kvm-rhev-2.6.0-28.el7_3.5.ppc64le.

I reproduced this bug on qemu-kvm-rhev-2.6.0-28.el7_3.4.ppc64le with the following versions:
Host:
kernel:3.10.0-558.el7.ppc64le
qemu-kvm-rhev-2.6.0-28.el7_3.4.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch

Guest:
3.10.0-558.el7.ppc64le

1) Install the package "kernel-devel-3.10.0-558.el7.ppc64le.rpm".
2) Create a script to check for the oversized allocation and start systemtap:
[root@ibm-p8-rhevm-13 ~]# vim qemu-watch.stp
probe glib.mem_alloc {
	if (n_bytes > 32000000)
		printf ("g_malloc: pid=%d n_bytes=%d\n", pid(), n_bytes);
}
[root@ibm-p8-rhevm-13 ~]# stap -v ./qemu-watch.stp 
Pass 1: ...
Pass 2: ...
Pass 3: ...
Pass 4: ...
Pass 5: starting run.
3) Open a new shell on the src host and boot a guest with the following qemu command line:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -nodefaults  \
    -machine pseries-rhel7.3.0 \
    -vga std  \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=03 \
    -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 \
    -chardev socket,id=devorg.qemu.guest_agent.0,path=/tmp/virtio_port-org.qemu.guest_agent.0-20160516-164929-dHQ00mMM,server,nowait \
    -device virtserialport,chardev=devorg.qemu.guest_agent.0,name=org.qemu.guest_agent.0,id=org.qemu.guest_agent.0,bus=virtio_serial_pci0.0  \
    -device nec-usb-xhci,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -drive file=/root/RHEL.7.3.qcow2,if=none,id=blk1 \
    -device virtio-blk-pci,scsi=off,drive=blk1,id=blk-disk1,bootindex=1 \
    -drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/root/RHEL-7.3-20161019.0-Server-ppc64le-dvd1.iso \
    -device scsi-cd,id=cd1,drive=drive_cd1,bootindex=2 \
    -device virtio-net-pci,mac=9a:7b:7c:7d:7e:71,id=idtlLxAk,vectors=4,netdev=idlkwV8e,bus=pci.0,addr=05 \
    -netdev tap,id=idlkwV8e,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -m 8G \
    -smp 2 \
    -cpu host \
    -device usb-kbd \
    -device usb-tablet \
    -qmp tcp:0:8881,server,nowait \
    -vnc :1  \
    -msg timestamp=on \
    -rtc base=localtime,clock=vm,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -monitor stdio \
    -enable-kvm
4) Boot a guest with the same qemu command line as on the src host, appending
"-incoming tcp:0:5801"
5) Start the migration and then reset the VM with the following commands:
(qemu) migrate -d tcp:10.19.112.39:5801
(qemu) system_reset

Actual result:
Migration completed and the VM works well. There are no "tcmalloc..." lines on the src host, but there is a "g_malloc: pid=39196 n_bytes=33669136" line on the src host, as follows:
...
Pass 5: starting run.
g_malloc: pid=39196 n_bytes=33669136

The fix was verified with the following packages:
Host:
kernel:3.10.0-558.el7.ppc64le
qemu-kvm-rhev-2.6.0-28.el7_3.5.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch

Guest:
3.10.0-558.el7.ppc64le

The test steps are the same as for the bug reproduction.

Result:
Migration completed and the VM works well. There are no "tcmalloc..." lines and no "g_malloc:..." line on the src host.

So, this bug is fixed.

Comment 12 errata-xmlrpc 2017-03-01 08:02:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0350.html

