Red Hat Bugzilla – Bug 1110191
Reducing the migrate cache size during migration causes a qemu segmentation fault
Last modified: 2014-07-23 12:18:26 EDT
This bug has been copied from bug #1066338 and has been proposed to be backported to 7.0 z-stream (EUS).
Fix included in qemu-kvm-1.5.3-60.el7_0.3
Reproduced this bug with the following versions:
kernel-3.10.0-128.el7.x86_64
qemu-kvm-1.5.3-48.el7.x86_64

Steps to Reproduce:
1. Boot up a guest:
# /usr/libexec/qemu-kvm -cpu SandyBridge -enable-kvm -m 30G -smp 8,sockets=1,cores=8,threads=1 -enable-kvm -name t2-rhel6.4-64 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=disk0,id=disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,addr=0x5 -monitor stdio -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -vnc :10 -k en-us -boot c -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Boot the guest on the dst host with "-incoming tcp:0:5800".

3. Run google stressapptest inside the guest (refer to Bug 1063417).
Docs: https://code.google.com/p/stressapptest/wiki/Introduction
(1) Get the code from http://code.google.com/p/stressapptest/downloads/list (I used 1.0.6).
(2) Untar it, then:
./configure
make
This produces the binary src/stressapptest.
** Don't run the test on your laptop - it will run it out of memory without options! **
(3) Copy the binary onto the victim VM:
scp src/stressapptest thevmname:
(4) Then, on a text console in the VM, run:
./stressapptest -s 3600 -m 20 -i 20 -C 20

5. In the source host qemu monitor:
(qemu) migrate_set_capability auto-converge on
(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_cache_size 1G
(qemu) migrate_set_speed 1G

6. Start the migration:
(qemu) migrate -d tcp:$dst_host_ip:5800

7. Wait a while; then, before the migration finishes:
(qemu) migrate_set_cache_size 128M

Actual results:
After step 7, qemu hits a segmentation fault:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff2cfdfaa in _int_free () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff2cfdfaa in _int_free () from /lib64/libc.so.6
#1  0x00007ffff74ef9af in g_free () from /lib64/libglib-2.0.so.0
#2  0x00005555556ec291 in cache_resize ()
#3  0x0000555555744ab5 in xbzrle_cache_resize ()
#4  0x00005555556e11a5 in qmp_migrate_set_cache_size ()
#5  0x0000555555653a0a in hmp_migrate_set_cache_size ()
#6  0x000055555579efc9 in handle_user_command ()
#7  0x000055555579f297 in monitor_command_cb ()
#8  0x00005555557171f4 in readline_handle_byte ()
#9  0x000055555579f224 in monitor_read ()
#10 0x0000555555707f3b in fd_chr_read ()
#11 0x00007ffff74e9e06 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#12 0x00005555556dae9a in main_loop_wait ()
#13 0x00005555556017c0 in main ()
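For context, the backtrace above is consistent with a thread race: the monitor thread frees cached pages in cache_resize() while the migration thread may still be reading them for XBZRLE encoding, so glibc faults inside _int_free(). Below is a minimal standalone sketch of that unsynchronized pattern; all names and the layout are hypothetical simplifications, not QEMU source.

/* gcc -pthread -o resize_race resize_race.c
 * Deliberately racy sketch: one thread shrinks a page cache and frees
 * the evicted pages while another thread is still dereferencing them.
 * This is the use-after-free pattern suggested by the backtrace above. */
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
#define OLD_SLOTS 1024
#define NEW_SLOTS 128

static unsigned char **pages;                /* cache slots, one page each */
static volatile size_t num_pages = OLD_SLOTS;

/* "Monitor" thread: shrink the cache, freeing evicted pages with no
 * synchronization against concurrent readers. */
static void *resizer(void *arg)
{
    (void)arg;
    for (size_t i = NEW_SLOTS; i < OLD_SLOTS; i++) {
        free(pages[i]);                      /* reader may still hold this */
    }
    num_pages = NEW_SLOTS;                   /* updated only after the frees */
    return NULL;
}

/* "Migration" thread: repeatedly read cached pages, as an XBZRLE-style
 * encoder would. A read that lands between the free() calls and the
 * num_pages update dereferences freed memory. */
static void *reader(void *arg)
{
    unsigned char sum = 0;
    (void)arg;
    for (size_t i = 0; i < 1000000; i++) {
        sum ^= pages[i % num_pages][0];
    }
    return (void *)(uintptr_t)sum;
}

int main(void)
{
    pages = calloc(OLD_SLOTS, sizeof(*pages));
    for (size_t i = 0; i < OLD_SLOTS; i++) {
        pages[i] = malloc(PAGE_SIZE);
        memset(pages[i], 0xaa, PAGE_SIZE);
    }
    pthread_t rd, rs;
    pthread_create(&rd, NULL, reader, NULL);
    pthread_create(&rs, NULL, resizer, NULL);
    pthread_join(rd, NULL);
    pthread_join(rs, NULL);
    return 0;
}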
Verified this bug with the following versions:
kernel-3.10.0-128.el7.x86_64
qemu-kvm-1.5.3-64.el7.x86_64

Steps to Verify:
1. Boot up a guest:
# /usr/libexec/qemu-kvm -cpu SandyBridge -enable-kvm -m 30G -smp 8,sockets=1,cores=8,threads=1 -enable-kvm -name t2-rhel6.4-64 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=disk0,id=disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,addr=0x5 -monitor stdio -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -vnc :10 -k en-us -boot c -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Boot the guest on the dst host with "-incoming tcp:0:5800".

3. Run google stressapptest inside the guest (refer to Bug 1063417).
Docs: https://code.google.com/p/stressapptest/wiki/Introduction
(1) Get the code from http://code.google.com/p/stressapptest/downloads/list (I used 1.0.6).
(2) Untar it, then:
./configure
make
This produces the binary src/stressapptest.
** Don't run the test on your laptop - it will run it out of memory without options! **
(3) Copy the binary onto the victim VM:
scp src/stressapptest thevmname:
(4) Then, on a text console in the VM, run:
./stressapptest -s 3600 -m 20 -i 20 -C 20

5. In the source host qemu monitor:
(qemu) migrate_set_capability auto-converge on
(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_cache_size 1G
(qemu) migrate_set_speed 1G

6. Start the migration:
(qemu) migrate -d tcp:$dst_host_ip:5800

7. Wait a while; then, before the migration finishes:
(qemu) migrate_set_cache_size 128M

Actual results:
After step 7, qemu-kvm does not hit a segmentation fault, and the migration finishes successfully once the downtime is enlarged. I did ping-pong migration three times, and each migration finished.
(In reply to huiqingding from comment #6)
> Verify this bug using the following version:
> kernel-3.10.0-128.el7.x86_64
> qemu-kvm-1.5.3-64.el7.x86_64

Correction: verified this bug with the following versions:
kernel-3.10.0-128.el7.x86_64
qemu-kvm-1.5.3-60.el7_0.4.x86_64

Using the steps above, I reduced the migrate cache size and did ping-pong migration three times; each migration finished normally.
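For reference, the fix in the qemu-kvm-1.5.3-60.el7_0.3 builds is understood to serialize access to the XBZRLE cache so that a resize cannot free pages out from under the migration thread. A simplified lock-based sketch of that pattern follows; this is an assumption about the shape of the fix, not the actual qemu-kvm patch.

/* gcc -pthread -o resize_fix resize_fix.c
 * Lock-based sketch: resize and lookup both hold a mutex, so a reader
 * can never observe a half-freed cache (assumed, simplified pattern). */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned char **pages;
static size_t num_pages;

/* The whole resize runs under the lock. */
void cache_resize_locked(size_t new_num)
{
    pthread_mutex_lock(&cache_lock);
    size_t old = num_pages;
    for (size_t i = new_num; i < old; i++) {
        free(pages[i]);                  /* shrink: drop evicted pages */
    }
    pages = realloc(pages, new_num * sizeof(*pages));
    for (size_t i = old; i < new_num; i++) {
        pages[i] = NULL;                 /* grow: new slots start empty */
    }
    num_pages = new_num;
    pthread_mutex_unlock(&cache_lock);
}

/* Readers copy the page out while holding the lock instead of keeping
 * a raw pointer into the cache after unlocking. Returns 1 on a hit. */
int cache_read_locked(size_t slot, unsigned char *out, size_t len)
{
    int found = 0;
    pthread_mutex_lock(&cache_lock);
    if (slot < num_pages && pages[slot]) {
        memcpy(out, pages[slot], len);
        found = 1;
    }
    pthread_mutex_unlock(&cache_lock);
    return found;
}

int main(void)
{
    num_pages = 4;
    pages = calloc(num_pages, sizeof(*pages));
    for (size_t i = 0; i < 4; i++) {
        pages[i] = calloc(1, 16);
    }
    unsigned char buf[16];
    cache_resize_locked(2);              /* shrink with no reader racing */
    return cache_read_locked(1, buf, sizeof(buf)) ? 0 : 1;
}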
I also tested comment 4 of bz1066338 with the following versions:
kernel-3.10.0-128.el7.x86_64
qemu-kvm-1.5.3-60.el7_0.4.x86_64

Steps:
1. Boot up a guest with a cdrom attached:
# /usr/libexec/qemu-kvm -cpu SandyBridge -enable-kvm -m 4096 -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name t2-rhel6.4-64 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=disk0,id=disk0 -drive file=/mnt/boot.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,addr=0x5 -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -vnc :10 -k en-us -boot c -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio

2. Boot the guest on the destination host in listening mode.

3. On both the src and dst hosts:
(qemu) migrate_set_capability xbzrle on

4. On the src host:
(qemu) migrate_set_cache_size 2G
(qemu) migrate_set_speed 100M

5. Read the cdrom inside the guest:
# while true; do cp -r /media/RHEL_6.4\ X86_64\ boot/ /home/test; sleep 1; rm -rf /home/test; done

6. Migrate the guest:
(qemu) migrate -d tcp:t2:5800

7. When the "xbzrle transferred" value keeps growing, change the cache size:
(qemu) migrate_set_cache_size 128M

Result:
After step 7, qemu-kvm does not abort, and the migration finishes successfully once the downtime is enlarged. I also did ping-pong migration three times, and each migration finished normally.
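As background on why step 7 exercises the cache: XBZRLE sends, for each dirty page, only the bytes that differ from the copy held in the cache, which is why "xbzrle transferred" keeps growing while the cdrom workload dirties pages, and why freeing cache pages mid-migration is dangerous. A simplified illustration of that delta idea (not QEMU's actual wire encoding, which run-length compresses the XOR of the two pages):

/* gcc -o delta_sketch delta_sketch.c
 * Simplified XBZRLE-style delta: emit only the byte runs of the new
 * page that differ from the cached copy. Illustrative only. */
#include <stddef.h>
#include <stdio.h>

/* Prints each changed run as (offset, length) and returns the total
 * number of changed bytes. */
static size_t delta_runs(const unsigned char *cached,
                         const unsigned char *current, size_t len)
{
    size_t changed = 0, i = 0;
    while (i < len) {
        if (cached[i] == current[i]) { i++; continue; }
        size_t start = i;
        while (i < len && cached[i] != current[i]) i++;
        printf("run at %zu, length %zu\n", start, i - start);
        changed += i - start;
    }
    return changed;
}

int main(void)
{
    unsigned char old_page[16] = {0};
    unsigned char new_page[16] = {0};
    new_page[3] = 1;
    new_page[4] = 2;
    new_page[10] = 7;
    /* Only 3 bytes changed, so only 3 bytes (plus run headers) would
     * need to be sent instead of the whole page. */
    printf("changed bytes: %zu\n", delta_runs(old_page, new_page, 16));
    return 0;
}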
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0927.html