Bug 1420456

Summary: [ppc64le] Resetting the VM during migration makes HMP on the source host print "tcmalloc: large alloc 1073872896 bytes..."
Product: Red Hat Enterprise Linux 7
Reporter: Jaroslav Reznik <jreznik>
Component: qemu-kvm-rhev
Assignee: Laurent Vivier <lvivier>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Priority: high
Version: 7.3
CC: knoel, lvivier, mrezanin, mst, mtessun, qzhang, snagar, virt-maint, xianwang, zhengtli
Target Milestone: rc
Keywords: ZStream
Hardware: ppc64le
OS: Unspecified
Fixed In Version: qemu-kvm-rhev-2.6.0-28.el7_3.5
Clone Of: 1404673
Last Closed: 2017-03-01 08:02:23 UTC
Bug Depends On: 1404673

Description Jaroslav Reznik 2017-02-08 17:20:33 UTC
This bug has been copied from bug #1404673 and has been proposed
to be backported to 7.3 z-stream (EUS).

Comment 3 xianwang 2017-02-09 07:03:11 UTC
Hi, Laurent,
I can't reproduce this bug with the following versions (rhel-7.3.z+):

Host(both src host and dst host):
distro:RHEL-7.3 Server ppc64le
3.10.0-558.el7.ppc64le
qemu-kvm-rhev-2.6.0-28.el7_3.4.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch

Guest:
RHEL7.3 LE
3.10.0-558.el7.ppc64le

Test steps:
(1) Boot a guest on the source host with this QEMU command line:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -nodefaults  \
    -machine pseries-rhel7.3.0 \
    -vga std  \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=03 \
    -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 \
    -chardev socket,id=devorg.qemu.guest_agent.0,path=/tmp/virtio_port-org.qemu.guest_agent.0-20160516-164929-dHQ00mMM,server,nowait \
    -device virtserialport,chardev=devorg.qemu.guest_agent.0,name=org.qemu.guest_agent.0,id=org.qemu.guest_agent.0,bus=virtio_serial_pci0.0  \
    -device nec-usb-xhci,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -drive file=/root/RHEL.7.3.qcow2,if=none,id=blk1 \
    -device virtio-blk-pci,scsi=off,drive=blk1,id=blk-disk1,bootindex=1 \
    -drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/root/RHEL-7.3-20161019.0-Server-ppc64le-dvd1.iso \
    -device scsi-cd,id=cd1,drive=drive_cd1,bootindex=2 \
    -device virtio-net-pci,mac=9a:7b:7c:7d:7e:71,id=idtlLxAk,vectors=4,netdev=idlkwV8e,bus=pci.0,addr=05 \
    -netdev tap,id=idlkwV8e,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -m 8G \
    -smp 8 \
    -cpu host \
    -device usb-kbd \
    -device usb-tablet \
    -qmp tcp:0:8881,server,nowait \
    -vnc :1  \
    -msg timestamp=on \
    -rtc base=localtime,clock=vm,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -monitor stdio \
    -enable-kvm
(2) Boot a guest on the destination host with the same QEMU command line, appending
"-incoming tcp:0:5801".
(3) Start the migration, then reset the VM, with the following commands:
(qemu) migrate -d tcp:$dst:$port
(qemu) system_reset
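The HMP steps in (3) can also be scripted over the QMP socket that the command line already opens with "-qmp tcp:0:8881". A minimal Python sketch, assuming the standard QMP commands qmp_capabilities, migrate and system_reset; the helper names and the destination URI are illustrative, not part of the original test:

```python
import json
import socket


def qmp_command(name, arguments=None):
    """Encode one QMP command as a newline-terminated JSON message."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\n").encode()


def read_reply(lines):
    """Consume JSON lines until a command reply arrives.

    Asynchronous notifications (objects with an "event" key, such as the
    RESET event emitted after system_reset) are skipped.
    """
    for line in lines:
        obj = json.loads(line)
        if "return" in obj:
            return obj["return"]
        if "error" in obj:
            raise RuntimeError(obj["error"].get("desc", "QMP error"))
    raise ConnectionError("QMP stream ended before a reply")


def migrate_then_reset(host, port, dest_uri):
    """Hypothetical helper: step (3) over QMP instead of the HMP monitor."""
    with socket.create_connection((host, port)) as sock:
        stream = sock.makefile("rb")
        json.loads(stream.readline())  # greeting: {"QMP": {...}}
        # QMP "migrate" returns as soon as migration starts, like "migrate -d".
        for name, args in (("qmp_capabilities", None),
                           ("migrate", {"uri": dest_uri}),
                           ("system_reset", None)):
            sock.sendall(qmp_command(name, args))
            read_reply(stream)
```

For example, migrate_then_reset("localhost", 8881, "tcp:$dst:5801") on the source host, with $dst replaced by the destination address. As with "migrate -d", the reset is issued while the migration may still be in progress.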

Actual result:
Migration completed and the VM works well; no "tcmalloc..." message is printed.

I tried this scenario 5 times but couldn't reproduce the problem.
So, is this bug fixed?

Comment 4 Laurent Vivier 2017-02-09 07:58:46 UTC
There is no direct way to verify this fix on rhel-7.3.z: there the unnecessary memory allocation is only 32MB, which is too small to trigger the tcmalloc warning.
I tested it by adding some traces to the function, and with this patch the memory allocated for the log drops from 32MB to 64kB.
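To make the threshold argument concrete, here is a small Python sketch of the arithmetic. It assumes tcmalloc's large-allocation report fires at the gperftools default threshold of 1 GiB (TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD) and reads "32MB" and "64kB" as 32 MiB and 64 KiB; the function names are illustrative:

```python
# Assumption: gperftools' default TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD,
# 1 GiB. tcmalloc actually compares in pages; bytes are close enough here.
TCMALLOC_REPORT_THRESHOLD = 1 << 30  # 1073741824 bytes


def would_report(n_bytes, threshold=TCMALLOC_REPORT_THRESHOLD):
    """True if a single allocation of n_bytes reaches the report threshold."""
    return n_bytes >= threshold


def reduction_factor(before, after):
    """How many times smaller the log allocation became."""
    return before // after


print(would_report(1073872896))  # size in the bug summary -> True
print(would_report(33669128))    # size logged by systemtap in comment 6 -> False
print(reduction_factor(32 << 20, 64 << 10))  # 32 MiB -> 64 KiB -> 512
```

Under these assumptions, the ~32MB allocation seen on 7.3.z sits far below the warning threshold, which is why comment 3 saw no "tcmalloc..." line even on the unfixed build.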

Comment 5 xianwang 2017-02-09 08:27:30 UTC
(In reply to Laurent Vivier from comment #4)
> There is no real way to verify this bug is fixed for rhel-7.3.z: as the size
> of the unnecessary memory allocation is only 32MB it doesn't trigger the
> tcmalloc() warning.
> I've tested this having added some traces in the function, and I've seen the
> memory size allocated for the log has been reduced from 32MB to 64kB with
> this patch.

Since this bug can't be reproduced on rhel-7.3.z, should we test the same scenario as in comment 3 when we verify it in the future?

Comment 6 Laurent Vivier 2017-02-09 09:48:22 UTC
You can use systemtap to log memory allocated by qemu.

As we know the oversized allocation is > 32MB, we can use this script to check for it:

$ cat qemu-watch.stp
probe glib.mem_alloc {
	if (n_bytes > 32000000)
		printf ("g_malloc: pid=%d n_bytes=%d\n", pid(), n_bytes);
}

Then start systemtap:

# stap -v ./qemu-watch.stp

In other shells, start your two QEMU instances (migration source and destination).

Then proceed as in comment #3.

After the system_reset, you will see in the systemtap window:
...
Pass 5: starting run.
g_malloc: pid=14403 n_bytes=33669128

With the fix applied, you should not see the "g_malloc:..." line.
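When the check is repeated across several builds, reading the systemtap window can be automated. A hypothetical Python helper, assuming the stap output has been saved to a file (for example by redirecting stap's standard output); it applies the same 32000000-byte threshold as qemu-watch.stp:

```python
import re

# Mirrors the printf format in qemu-watch.stp: "g_malloc: pid=%d n_bytes=%d"
G_MALLOC_LINE = re.compile(r"g_malloc: pid=(\d+) n_bytes=(\d+)")


def oversized_allocs(lines, threshold=32000000):
    """Return (pid, n_bytes) for each logged allocation above threshold."""
    hits = []
    for line in lines:
        m = G_MALLOC_LINE.search(line)
        if m and int(m.group(2)) > threshold:
            hits.append((int(m.group(1)), int(m.group(2))))
    return hits


def verdict(lines):
    """'reproduced' if any oversized allocation was logged, else 'fixed'."""
    return "reproduced" if oversized_allocs(lines) else "fixed"


# Usage, assuming the stap output was redirected to stap.log (hypothetical):
#   with open("stap.log") as f:
#       print(verdict(f))
```

The threshold is a parameter so that it can be kept in sync with the value used in the .stp script.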

Comment 7 Miroslav Rezanina 2017-02-10 09:36:28 UTC
Fix included in qemu-kvm-rhev-2.6.0-28.el7_3.5

Comment 9 xianwang 2017-02-13 05:26:57 UTC
This bug is verified pass on qemu-kvm-rhev-2.6.0-28.el7_3.5.ppc64le.

Reproduced this bug on qemu-kvm-rhev-2.6.0-28.el7_3.4.ppc64le with the following versions:
Host:
kernel:3.10.0-558.el7.ppc64le
qemu-kvm-rhev-2.6.0-28.el7_3.4.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch

Guest:
3.10.0-558.el7.ppc64le

1) Install the package "kernel-devel-3.10.0-558.el7.ppc64le.rpm".
2) Create a script to check for the oversized allocation, and start systemtap:
[root@ibm-p8-rhevm-13 ~]# vim qemu-watch.stp
probe glib.mem_alloc {
	if (n_bytes > 32000000)
		printf ("g_malloc: pid=%d n_bytes=%d\n", pid(), n_bytes);
}
[root@ibm-p8-rhevm-13 ~]# stap -v ./qemu-watch.stp 
Pass 1: ...
Pass 2: ...
Pass 3: ...
Pass 4: ...
Pass 5: starting run.
3) Open a new shell on the source host and boot a guest with the following QEMU command line:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -nodefaults  \
    -machine pseries-rhel7.3.0 \
    -vga std  \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=03 \
    -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 \
    -chardev socket,id=devorg.qemu.guest_agent.0,path=/tmp/virtio_port-org.qemu.guest_agent.0-20160516-164929-dHQ00mMM,server,nowait \
    -device virtserialport,chardev=devorg.qemu.guest_agent.0,name=org.qemu.guest_agent.0,id=org.qemu.guest_agent.0,bus=virtio_serial_pci0.0  \
    -device nec-usb-xhci,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -drive file=/root/RHEL.7.3.qcow2,if=none,id=blk1 \
    -device virtio-blk-pci,scsi=off,drive=blk1,id=blk-disk1,bootindex=1 \
    -drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/root/RHEL-7.3-20161019.0-Server-ppc64le-dvd1.iso \
    -device scsi-cd,id=cd1,drive=drive_cd1,bootindex=2 \
    -device virtio-net-pci,mac=9a:7b:7c:7d:7e:71,id=idtlLxAk,vectors=4,netdev=idlkwV8e,bus=pci.0,addr=05 \
    -netdev tap,id=idlkwV8e,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -m 8G \
    -smp 2 \
    -cpu host \
    -device usb-kbd \
    -device usb-tablet \
    -qmp tcp:0:8881,server,nowait \
    -vnc :1  \
    -msg timestamp=on \
    -rtc base=localtime,clock=vm,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -monitor stdio \
    -enable-kvm
4) Boot a guest on the destination host with the same QEMU command line, appending
"-incoming tcp:0:5801".
5) Start the migration, then reset the VM, with the following commands:
(qemu) migrate -d tcp:10.19.112.39:5801
(qemu) system_reset

Actual result:
Migration completed and the VM works well. No "tcmalloc..." lines appear on the source host, but the systemtap window on the source host shows a "g_malloc: pid=39196 n_bytes=33669136" line:
...
Pass 5: starting run.
g_malloc: pid=39196 n_bytes=33669136

Bug verified as passing with the following packages:
Host:
kernel:3.10.0-558.el7.ppc64le
qemu-kvm-rhev-2.6.0-28.el7_3.5.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch

Guest:
3.10.0-558.el7.ppc64le

The test steps are the same as for the reproduction.

Result:
Migration completed and the VM works well; there are no "tcmalloc..." lines and no "g_malloc:" line on the source host.

So, this bug is fixed.

Comment 12 errata-xmlrpc 2017-03-01 08:02:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0350.html