Bug 1110191 - Reducing the migrate cache size during migration causes a qemu segmentation fault
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Virtualization Maintenance
QA Contact: Virtualization Bugs
Keywords: ZStream
Depends On: 1066338
Blocks: 1110706
Reported: 2014-06-17 04:11 EDT by Libor Miksik
Modified: 2014-07-23 12:18 EDT
CC: 13 users
Fixed In Version: qemu-kvm-1.5.3-60.el7_0.3
Doc Type: Bug Fix
Doc Text:
Prior to this update, the QEMU command interface did not properly handle resizing of cache memory during a guest migration, causing QEMU to terminate unexpectedly with a segmentation fault. This update fixes the related code, and QEMU no longer crashes in the described situation.
Clone Of: 1066338
Last Closed: 2014-07-23 12:18:26 EDT


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2014:0927 normal SHIPPED_LIVE Moderate: qemu-kvm security and bug fix update 2014-07-23 16:15:13 EDT

Description Libor Miksik 2014-06-17 04:11:38 EDT
This bug has been copied from bug #1066338 and has been proposed
to be backported to 7.0 z-stream (EUS).
Comment 3 Miroslav Rezanina 2014-06-17 11:37:31 EDT
Fix included in qemu-kvm-1.5.3-60.el7_0.3
Comment 5 huiqingding 2014-06-23 02:10:39 EDT
Reproduced this bug using the following versions:
kernel-3.10.0-128.el7.x86_64
qemu-kvm-1.5.3-48.el7.x86_64

Steps to Reproduce:
1. Boot up a guest 
# /usr/libexec/qemu-kvm -cpu SandyBridge -enable-kvm -m 30G -smp 8,sockets=1,cores=8,threads=1 -enable-kvm -name t2-rhel6.4-64 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=disk0,id=disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,addr=0x5 -monitor stdio -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -vnc :10 -k en-us -boot c -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Boot the guest on dst host with "-incoming tcp:0:5800"

3. Run Google stressapptest inside the guest (refer to Bug 1063417)

Docs:
https://code.google.com/p/stressapptest/wiki/Introduction

(1) Get the code from: http://code.google.com/p/stressapptest/downloads/list (I used 1.0.6)

(2) Untar the tarball, then build:
./configure
make
This produces the binary src/stressapptest

** Don't run the test on your laptop - without options it will run the machine out of memory! **

(3) Copy the binary onto the victim VM:
scp src/stressapptest   thevmname:

(4) Then, on a text console in the VM, run:
 ./stressapptest -s 3600 -m 20 -i 20 -C 20

5. On the source host's qemu monitor:
(qemu) migrate_set_capability auto-converge on
(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_cache_size 1G
(qemu) migrate_set_speed 1G

6. Start the migration:
(qemu) migrate -d tcp:$dst_host_ip:5800

7. Wait a while; before the migration finishes, shrink the cache:
(qemu) migrate_set_cache_size 128M
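Since the guest was started with "-qmp tcp:0:6666,server,nowait", the same resize can also be driven over QMP, where the migrate-set-cache-size command takes a byte count rather than a suffixed size. A minimal sketch (the helper names to_bytes and qmp_set_cache_size are my own):

```shell
# to_bytes: expand a suffixed size such as 1G or 128M into bytes,
# since the QMP migrate-set-cache-size command expects a byte count
to_bytes() {
    local n=${1%[KMG]}
    case $1 in
        *K) echo $((n * 1024)) ;;
        *M) echo $((n * 1024 * 1024)) ;;
        *G) echo $((n * 1024 * 1024 * 1024)) ;;
        *)  echo "$n" ;;
    esac
}

# qmp_set_cache_size: print the QMP command equivalent to the
# HMP "migrate_set_cache_size" step above
qmp_set_cache_size() {
    printf '{"execute": "migrate-set-cache-size", "arguments": {"value": %s}}\n' \
        "$(to_bytes "$1")"
}

qmp_set_cache_size 128M
```

The printed JSON line could then be sent to the QMP socket (for example with nc to port 6666) after the usual qmp_capabilities handshake.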


Actual results:
After step 7, qemu hits a segmentation fault:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff2cfdfaa in _int_free () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff2cfdfaa in _int_free () from /lib64/libc.so.6
#1  0x00007ffff74ef9af in g_free () from /lib64/libglib-2.0.so.0
#2  0x00005555556ec291 in cache_resize ()
#3  0x0000555555744ab5 in xbzrle_cache_resize ()
#4  0x00005555556e11a5 in qmp_migrate_set_cache_size ()
#5  0x0000555555653a0a in hmp_migrate_set_cache_size ()
#6  0x000055555579efc9 in handle_user_command ()
#7  0x000055555579f297 in monitor_command_cb ()
#8  0x00005555557171f4 in readline_handle_byte ()
#9  0x000055555579f224 in monitor_read ()
#10 0x0000555555707f3b in fd_chr_read ()
#11 0x00007ffff74e9e06 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#12 0x00005555556dae9a in main_loop_wait ()
#13 0x00005555556017c0 in main ()
Comment 6 huiqingding 2014-06-23 02:58:11 EDT
Verified this bug using the following versions:
kernel-3.10.0-128.el7.x86_64
qemu-kvm-1.5.3-64.el7.x86_64

Steps to Verify:
1. Boot up a guest 
# /usr/libexec/qemu-kvm -cpu SandyBridge -enable-kvm -m 30G -smp 8,sockets=1,cores=8,threads=1 -enable-kvm -name t2-rhel6.4-64 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=disk0,id=disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,addr=0x5 -monitor stdio -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -vnc :10 -k en-us -boot c -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Boot the guest on dst host with "-incoming tcp:0:5800"

3. Run Google stressapptest inside the guest (refer to Bug 1063417)

Docs:
https://code.google.com/p/stressapptest/wiki/Introduction

(1) Get the code from: http://code.google.com/p/stressapptest/downloads/list (I used 1.0.6)

(2) Untar the tarball, then build:
./configure
make
This produces the binary src/stressapptest

** Don't run the test on your laptop - without options it will run the machine out of memory! **

(3) Copy the binary onto the victim VM:
scp src/stressapptest   thevmname:

(4) Then, on a text console in the VM, run:
 ./stressapptest -s 3600 -m 20 -i 20 -C 20

5. On the source host's qemu monitor:
(qemu) migrate_set_capability auto-converge on
(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_cache_size 1G
(qemu) migrate_set_speed 1G

6. Start the migration:
(qemu) migrate -d tcp:$dst_host_ip:5800

7. Wait a while; before the migration finishes, shrink the cache:
(qemu) migrate_set_cache_size 128M


Actual results:
After step 7, qemu-kvm does not hit a segmentation fault; the migration finishes successfully after the downtime is enlarged. I did ping-pong migration three times, and each migration finished.
Comment 7 huiqingding 2014-06-23 04:43:25 EDT
(In reply to huiqingding from comment #6)
> Verify this bug using the following version:
> kernel-3.10.0-128.el7.x86_64
> qemu-kvm-1.5.3-64.el7.x86_64
> 
Correction:
Verify this bug using the following version:
kernel-3.10.0-128.el7.x86_64
qemu-kvm-1.5.3-60.el7_0.4.x86_64

Using the following steps, I reduced the migrate cache size and did ping-pong migration three times; the migrations finished normally.
 
> Steps to Verify:
> 1. Boot up a guest 
> # /usr/libexec/qemu-kvm -cpu SandyBridge -enable-kvm -m 30G -smp
> 8,sockets=1,cores=8,threads=1 -enable-kvm -name t2-rhel6.4-64 -uuid
> 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc
> base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device
> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0
> -drive
> file=/mnt/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=disk0,format=qcow2,
> werror=stop,rerror=stop,aio=native -device
> virtio-blk-pci,bus=pci.0,addr=0x3,drive=disk0,id=disk0 -drive
> if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device
> ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev
> tap,id=hostnet0,vhost=on -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,
> addr=0x5 -monitor stdio -qmp tcp:0:6666,server,nowait -chardev
> socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device
> isa-serial,chardev=isa1,id=isa-serial1 -device
> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev
> socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,
> name=com.redhat.rhevm.vdsm -chardev
> socket,path=/tmp/foo,server,nowait,id=foo -device
> virtconsole,chardev=foo,id=console0 -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -vnc :10 -k en-us -boot c
> -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device
> virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0
> -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0
> 
> 2. Boot the guest on dst host with "-incoming tcp:0:5800"
> 
> 3. Running google stressapptest inside guest (Refer to Bug 1063417)
> 
> Docs:
> https://code.google.com/p/stressapptest/wiki/Introduction
> 
> (1) Get the code from: http://code.google.com/p/stressapptest/downloads/list
> (I used 1.0.6)
> 
> (2)untar
> ./configure
> make
> This produces the binary src/stressapptest
> 
> ** Don't run the test on your laptop - it'll run it out of memory without
> options ! **
> 
> (3)copy the binary onto the victim VM:
> scp src/stressapptest   thevmname:
> 
> (4) Then on a text-console on the VM do:
>  ./stressapptest -s 3600 -m 20 -i 20 -C 20
> 	
> 
> 5. On source host qemu:
> (qemu) migrate_set_capability auto-converge on
> (qemu) migrate_set_capability xbzrle on
> (qemu) migrate_set_cache_size 1G
> (qemu) migrate_set_speed 1G
> 
> 6. Implement migration
> (qemu) migrate -d tcp:$dst_host_ip:5800
> 
> 7. Wait for a while, before migration finish.
> (qemu) migrate_set_cache_size 128M
> 
> 
> Actual results:
> after step7, qemu-kvm is not Segmentation fault, migration could be finished
> successfully when enlarge downtime. I do ping-pong migration for three
> times, migration could be finished.
Comment 8 huiqingding 2014-06-23 05:07:15 EDT
I also tested comment 4 of bz1066338 using the following versions:
kernel-3.10.0-128.el7.x86_64
qemu-kvm-1.5.3-60.el7_0.4.x86_64

Steps:

1. Boot up a guest with cdrom attached:
# /usr/libexec/qemu-kvm -cpu SandyBridge -enable-kvm -m 4096 -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name t2-rhel6.4-64 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=disk0,id=disk0 -drive file=/mnt/boot.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,addr=0x5 -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -vnc :10 -k en-us -boot c -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio

2. Boot the guest on the destination host in listening mode

3. On both src and dst host:
(qemu) migrate_set_capability xbzrle on 

4. On src host:
(qemu) migrate_set_cache_size 2G
(qemu) migrate_set_speed 100M 

5. Read the cdrom inside the guest:
# while true; do cp -r /media/RHEL_6.4\ X86_64\ boot/ /home/test; sleep 1; rm -rf /home/test; done

6. Migrate the guest:
(qemu) migrate -d tcp:t2:5800

7. When the "xbzrle transferred" value keeps growing, change the cache size:
(qemu) migrate_set_cache_size 128M
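As a side note, the XBZRLE cache is sized in pages, and to my understanding the cache size is expected to be a power of two of at least one 4 KiB target page, which the 2G and 128M values above both satisfy. A quick sanity check for a candidate byte count (the helper name is my own):

```shell
# is_valid_cache_size: check that a byte count is a power of two and
# at least one 4 KiB target page, which (to my understanding) is what
# the XBZRLE page cache expects
is_valid_cache_size() {
    local v=$1
    # a power of two has exactly one bit set, so v & (v - 1) is zero
    [ "$v" -ge 4096 ] && [ $(( v & (v - 1) )) -eq 0 ]
}

is_valid_cache_size $((128 * 1024 * 1024)) && echo "128M is a valid cache size"
```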


Result:
After step 7, qemu-kvm does not abort; the migration finishes successfully after the downtime is enlarged. I also tested ping-pong migration three times, and each migration finished normally.
Comment 11 errata-xmlrpc 2014-07-23 12:18:26 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0927.html
