Bug 1110191
| Summary: | Reducing the migrate cache size during migration causes a qemu segmentation fault | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Libor Miksik <lmiksik> |
| Component: | qemu-kvm | Assignee: | Virtualization Maintenance <virt-maint> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.0 | CC: | acathrow, dgilbert, hhuang, huding, jherrman, juzhang, lmiksik, michen, mrezanin, pm-eus, qzhang, tdosek, virt-maint |
| Target Milestone: | rc | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-kvm-1.5.3-60.el7_0.3 | Doc Type: | Bug Fix |
| Doc Text: | Prior to this update, the QEMU command interface did not properly handle resizing of the migration cache memory during a guest migration, causing QEMU to terminate unexpectedly with a segmentation fault. This update fixes the related code, and QEMU no longer crashes in the described situation. | Story Points: | --- |
| Clone Of: | 1066338 | Environment: | |
| Last Closed: | 2014-07-23 16:18:26 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1066338 | | |
| Bug Blocks: | 1110706 | | |
Description
Libor Miksik
2014-06-17 08:11:38 UTC
Fix included in qemu-kvm-1.5.3-60.el7_0.3

Reproduced this bug using the following versions:
kernel-3.10.0-128.el7.x86_64
qemu-kvm-1.5.3-48.el7.x86_64

Steps to Reproduce:

1. Boot up a guest:
# /usr/libexec/qemu-kvm -cpu SandyBridge -enable-kvm -m 30G -smp 8,sockets=1,cores=8,threads=1 -enable-kvm -name t2-rhel6.4-64 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=disk0,id=disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,addr=0x5 -monitor stdio -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -vnc :10 -k en-us -boot c -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Boot the guest on the destination host with "-incoming tcp:0:5800"

3. Run google stressapptest inside the guest (refer to Bug 1063417).
Docs: https://code.google.com/p/stressapptest/wiki/Introduction
(1) Get the code from http://code.google.com/p/stressapptest/downloads/list (I used 1.0.6)
(2) Untar it, then:
./configure
make
This produces the binary src/stressapptest.
** Don't run the test on your laptop - it will run it out of memory without options! **
(3) Copy the binary onto the victim VM:
scp src/stressapptest thevmname:
(4) Then on a text console on the VM run:
./stressapptest -s 3600 -m 20 -i 20 -C 20

5. On the source host qemu:
(qemu) migrate_set_capability auto-converge on
(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_cache_size 1G
(qemu) migrate_set_speed 1G

6. Start the migration:
(qemu) migrate -d tcp:$dst_host_ip:5800

7. Wait for a while and, before the migration finishes, reduce the cache size (a QMP equivalent is sketched after these steps):
(qemu) migrate_set_cache_size 128M
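The monitor commands in steps 5-7 can also be driven over the QMP socket that the command line above already exposes (-qmp tcp:0:6666,server,nowait); frame #4 of the backtrace below, qmp_migrate_set_cache_size(), is the same code path. The helper below is a hypothetical sketch added for illustration and is not part of the original report: it assumes the QMP server is reachable on localhost port 6666 and uses the QMP command migrate-set-cache-size, whose "value" argument is the new cache size in bytes.

```c
/* Hypothetical helper (not part of the original report): issue the same
 * cache-size change as step 7, but over the QMP socket that the guest's
 * command line exposes with "-qmp tcp:0:6666,server,nowait".
 * Assumes the QMP server is reachable on localhost:6666. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one JSON command and dump whatever the server answers. */
static void send_cmd(int fd, const char *json)
{
    char buf[4096];
    ssize_t n;

    if (write(fd, json, strlen(json)) < 0) {
        perror("write");
        return;
    }
    n = read(fd, buf, sizeof(buf) - 1);   /* a real client would parse this */
    if (n > 0) {
        buf[n] = '\0';
        printf("%s", buf);
    }
}

int main(void)
{
    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(6666),                 /* matches -qmp tcp:0:6666 */
        .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
    };
    char greeting[4096];
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }
    /* QMP sends a greeting first and requires qmp_capabilities before
     * accepting any other command. */
    if (read(fd, greeting, sizeof(greeting) - 1) < 0) {
        perror("read");
    }
    send_cmd(fd, "{\"execute\": \"qmp_capabilities\"}\n");
    /* 128M in bytes, i.e. the value used in step 7. */
    send_cmd(fd, "{\"execute\": \"migrate-set-cache-size\","
                 " \"arguments\": {\"value\": 134217728}}\n");
    close(fd);
    return 0;
}
```

Compiled with, for example, gcc -o setcache setcache.c (hypothetical file name) and run on the source host while the migration from step 6 is in flight, this hits the same window as step 7.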
Actual results:
After step 7, QEMU hits a segmentation fault:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff2cfdfaa in _int_free () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff2cfdfaa in _int_free () from /lib64/libc.so.6
#1  0x00007ffff74ef9af in g_free () from /lib64/libglib-2.0.so.0
#2  0x00005555556ec291 in cache_resize ()
#3  0x0000555555744ab5 in xbzrle_cache_resize ()
#4  0x00005555556e11a5 in qmp_migrate_set_cache_size ()
#5  0x0000555555653a0a in hmp_migrate_set_cache_size ()
#6  0x000055555579efc9 in handle_user_command ()
#7  0x000055555579f297 in monitor_command_cb ()
#8  0x00005555557171f4 in readline_handle_byte ()
#9  0x000055555579f224 in monitor_read ()
#10 0x0000555555707f3b in fd_chr_read ()
#11 0x00007ffff74e9e06 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#12 0x00005555556dae9a in main_loop_wait ()
#13 0x00005555556017c0 in main ()
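The frames above show the crash under a g_free() issued from cache_resize(), reached from the monitor while the migration thread can still be using the XBZRLE page cache. Below is a minimal C sketch of that failure pattern and of the kind of serialization a fix needs. It is not the qemu-kvm source or the actual patch; the structure and names (PageCache, cache_resize_sketch(), the mutex) are illustrative assumptions based only on the backtrace.

```c
/* Illustrative sketch only, not qemu-kvm source. Shrinking the cache frees
 * page buffers; if the migration thread dereferences them at the same time,
 * the process can crash inside free()/_int_free() exactly as in the
 * backtrace. Taking one lock on both paths is the general shape of a fix
 * (assumption). */
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 4096

typedef struct {
    uint8_t **pages;        /* cached copies used to compute XBZRLE deltas */
    size_t num_pages;
    pthread_mutex_t lock;   /* serializes resize vs. migration access */
} PageCache;

/* Monitor context: shrink the cache to new_num_pages entries. */
static void cache_resize_sketch(PageCache *c, size_t new_num_pages)
{
    pthread_mutex_lock(&c->lock);   /* without this, the free() below races */
    for (size_t i = new_num_pages; i < c->num_pages; i++) {
        free(c->pages[i]);          /* the free that _int_free() trips over */
        c->pages[i] = NULL;
    }
    c->num_pages = new_num_pages;
    pthread_mutex_unlock(&c->lock);
}

/* Migration thread: has the page in this slot changed since it was cached? */
static int page_changed_sketch(PageCache *c, size_t slot, const uint8_t *cur)
{
    int changed;

    pthread_mutex_lock(&c->lock);   /* must take the same lock as resize */
    changed = slot >= c->num_pages || c->pages[slot] == NULL ||
              memcmp(c->pages[slot], cur, PAGE_BYTES) != 0;
    pthread_mutex_unlock(&c->lock);
    return changed;
}
```

Whatever the exact mechanism in the fixed qemu-kvm builds, the observable effect reported below is that shrinking the cache mid-migration no longer crashes the process.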
Verified this bug using the following versions:
kernel-3.10.0-128.el7.x86_64
qemu-kvm-1.5.3-64.el7.x86_64

Steps to Verify:

1. Boot up a guest:
# /usr/libexec/qemu-kvm -cpu SandyBridge -enable-kvm -m 30G -smp 8,sockets=1,cores=8,threads=1 -enable-kvm -name t2-rhel6.4-64 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=disk0,id=disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,addr=0x5 -monitor stdio -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -vnc :10 -k en-us -boot c -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Boot the guest on the destination host with "-incoming tcp:0:5800"

3. Run google stressapptest inside the guest (refer to Bug 1063417).
Docs: https://code.google.com/p/stressapptest/wiki/Introduction
(1) Get the code from http://code.google.com/p/stressapptest/downloads/list (I used 1.0.6)
(2) Untar it, then:
./configure
make
This produces the binary src/stressapptest.
** Don't run the test on your laptop - it will run it out of memory without options! **
(3) Copy the binary onto the victim VM:
scp src/stressapptest thevmname:
(4) Then on a text console on the VM run:
./stressapptest -s 3600 -m 20 -i 20 -C 20

5. On the source host qemu:
(qemu) migrate_set_capability auto-converge on
(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_cache_size 1G
(qemu) migrate_set_speed 1G

6. Start the migration:
(qemu) migrate -d tcp:$dst_host_ip:5800

7. Wait for a while and, before the migration finishes, reduce the cache size:
(qemu) migrate_set_cache_size 128M

Actual results:
After step 7, qemu-kvm does not hit a segmentation fault; the migration finishes successfully after enlarging the downtime. I did ping-pong migration three times and the migration finished each time.

(In reply to huiqingding from comment #6)
> Verify this bug using the following version:
> kernel-3.10.0-128.el7.x86_64
> qemu-kvm-1.5.3-64.el7.x86_64

Correction: verified this bug using the following versions:
kernel-3.10.0-128.el7.x86_64
qemu-kvm-1.5.3-60.el7_0.4.x86_64

Using the following steps, reducing the migrate cache size and doing ping-pong migration three times, the migration finishes normally.

> Steps to Verify:
> 1. Boot up a guest
> # /usr/libexec/qemu-kvm -cpu SandyBridge -enable-kvm -m 30G -smp 8,sockets=1,cores=8,threads=1 -enable-kvm -name t2-rhel6.4-64 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=disk0,id=disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,addr=0x5 -monitor stdio -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -vnc :10 -k en-us -boot c -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0
>
> 2. Boot the guest on dst host with "-incoming tcp:0:5800"
>
> 3. Running google stressapptest inside guest (Refer to Bug 1063417)
>
> Docs:
> https://code.google.com/p/stressapptest/wiki/Introduction
>
> (1) Get the code from: http://code.google.com/p/stressapptest/downloads/list
> (I used 1.0.6)
>
> (2)untar
> ./configure
> make
> This produces the binary src/stressapptest
>
> ** Don't run the test on your laptop - it'll run it out of memory without
> options ! **
>
> (3)copy the binary onto the victim VM:
> scp src/stressapptest thevmname:
>
> (4) Then on a text-console on the VM do:
> ./stressapptest -s 3600 -m 20 -i 20 -C 20
>
> 5. On source host qemu:
> (qemu) migrate_set_capability auto-converge on
> (qemu) migrate_set_capability xbzrle on
> (qemu) migrate_set_cache_size 1G
> (qemu) migrate_set_speed 1G
>
> 6. Implement migration
> (qemu) migrate -d tcp:$dst_host_ip:5800
>
> 7. Wait for a while, before migration finish.
> (qemu) migrate_set_cache_size 128M
>
> Actual results:
> after step7, qemu-kvm is not Segmentation fault, migration could be finished
> successfully when enlarge downtime. I do ping-pong migration for three
> times, migration could be finished.

I also tested comment 4 of bz1066338 using the following versions:
kernel-3.10.0-128.el7.x86_64
qemu-kvm-1.5.3-60.el7_0.4.x86_64

Steps:
1. Boot up a guest with a cdrom attached:
# /usr/libexec/qemu-kvm -cpu SandyBridge -enable-kvm -m 4096 -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name t2-rhel6.4-64 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=disk0,id=disk0 -drive file=/mnt/boot.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,addr=0x5 -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -vnc :10 -k en-us -boot c -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio

2. Boot the guest on the destination host in listening mode

3. On both the src and dst hosts:
(qemu) migrate_set_capability xbzrle on

4. On the src host:
(qemu) migrate_set_cache_size 2G
(qemu) migrate_set_speed 100M

5. Read the cdrom inside the guest:
# while true; do cp -r /media/RHEL_6.4\ X86_64\ boot/ /home/test; sleep 1; rm -rf /home/test; done

6. Migrate the guest:
(qemu) migrate -d tcp:t2:5800

7. When the "xbzrle transferred" value keeps getting larger, change the cache size:
(qemu) migrate_set_cache_size 128M

Result:
After step 7, qemu-kvm does not abort; the migration finishes successfully after enlarging the downtime. I also did ping-pong migration three times and the migration finished normally.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0927.html