Red Hat Bugzilla – Bug 1110191
Reducing the migrate cache size during migration causes a qemu segmentation fault
Last modified: 2014-07-23 12:18:26 EDT
This bug has been copied from bug #1066338 and has been proposed to be backported to 7.0 z-stream (EUS).
Fix included in qemu-kvm-1.5.3-60.el7_0.3
Reproduced this bug with the following versions:
kernel-3.10.0-128.el7.x86_64
qemu-kvm-1.5.3-48.el7.x86_64

Steps to Reproduce:
1. Boot up a guest:
# /usr/libexec/qemu-kvm -cpu SandyBridge -enable-kvm -m 30G -smp 8,sockets=1,cores=8,threads=1 -enable-kvm -name t2-rhel6.4-64 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=disk0,id=disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,addr=0x5 -monitor stdio -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -vnc :10 -k en-us -boot c -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Boot the guest on the dst host with "-incoming tcp:0:5800".

3. Run google stressapptest inside the guest (refer to Bug 1063417).
Docs: https://code.google.com/p/stressapptest/wiki/Introduction
(1) Get the code from http://code.google.com/p/stressapptest/downloads/list (I used 1.0.6).
(2) Untar it, then:
./configure
make
This produces the binary src/stressapptest.
** Don't run the test on your laptop - it will run it out of memory without options! **
(3) Copy the binary onto the victim VM:
scp src/stressapptest thevmname:
(4) Then, on a text console in the VM, run:
./stressapptest -s 3600 -m 20 -i 20 -C 20

5. In the source host qemu monitor:
(qemu) migrate_set_capability auto-converge on
(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_cache_size 1G
(qemu) migrate_set_speed 1G

6. Start the migration:
(qemu) migrate -d tcp:$dst_host_ip:5800

7. Wait a while; then, before the migration finishes:
(qemu) migrate_set_cache_size 128M

Actual results:
After step 7, qemu hits a segmentation fault:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff2cfdfaa in _int_free () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff2cfdfaa in _int_free () from /lib64/libc.so.6
#1  0x00007ffff74ef9af in g_free () from /lib64/libglib-2.0.so.0
#2  0x00005555556ec291 in cache_resize ()
#3  0x0000555555744ab5 in xbzrle_cache_resize ()
#4  0x00005555556e11a5 in qmp_migrate_set_cache_size ()
#5  0x0000555555653a0a in hmp_migrate_set_cache_size ()
#6  0x000055555579efc9 in handle_user_command ()
#7  0x000055555579f297 in monitor_command_cb ()
#8  0x00005555557171f4 in readline_handle_byte ()
#9  0x000055555579f224 in monitor_read ()
#10 0x0000555555707f3b in fd_chr_read ()
#11 0x00007ffff74e9e06 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#12 0x00005555556dae9a in main_loop_wait ()
#13 0x00005555556017c0 in main ()
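For context, the backtrace above is consistent with a thread race: the monitor thread frees cached pages in cache_resize() while the migration thread may still be reading them for XBZRLE encoding, so glibc faults inside _int_free(). Below is a minimal standalone sketch of that unsynchronized pattern; all names and the layout are hypothetical simplifications, not QEMU source.

/* gcc -pthread -o resize_race resize_race.c
 * Deliberately racy sketch: one thread shrinks a page cache and frees
 * the evicted pages while another thread is still dereferencing them.
 * This is the use-after-free pattern suggested by the backtrace above. */
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
#define OLD_SLOTS 1024
#define NEW_SLOTS 128

static unsigned char **pages;                /* cache slots, one page each */
static volatile size_t num_pages = OLD_SLOTS;

/* "Monitor" thread: shrink the cache, freeing evicted pages with no
 * synchronization against concurrent readers. */
static void *resizer(void *arg)
{
    (void)arg;
    for (size_t i = NEW_SLOTS; i < OLD_SLOTS; i++) {
        free(pages[i]);                      /* reader may still hold this */
    }
    num_pages = NEW_SLOTS;                   /* updated only after the frees */
    return NULL;
}

/* "Migration" thread: repeatedly read cached pages, as an XBZRLE-style
 * encoder would. A read that lands between the free() calls and the
 * num_pages update dereferences freed memory. */
static void *reader(void *arg)
{
    unsigned char sum = 0;
    (void)arg;
    for (size_t i = 0; i < 1000000; i++) {
        sum ^= pages[i % num_pages][0];
    }
    return (void *)(uintptr_t)sum;
}

int main(void)
{
    pages = calloc(OLD_SLOTS, sizeof(*pages));
    for (size_t i = 0; i < OLD_SLOTS; i++) {
        pages[i] = malloc(PAGE_SIZE);
        memset(pages[i], 0xaa, PAGE_SIZE);
    }
    pthread_t rd, rs;
    pthread_create(&rd, NULL, reader, NULL);
    pthread_create(&rs, NULL, resizer, NULL);
    pthread_join(rd, NULL);
    pthread_join(rs, NULL);
    return 0;
}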
Verified this bug with the following versions:
kernel-3.10.0-128.el7.x86_64
qemu-kvm-1.5.3-64.el7.x86_64

Steps to Verify:
1. Boot up a guest:
# /usr/libexec/qemu-kvm -cpu SandyBridge -enable-kvm -m 30G -smp 8,sockets=1,cores=8,threads=1 -enable-kvm -name t2-rhel6.4-64 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=disk0,id=disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,addr=0x5 -monitor stdio -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -vnc :10 -k en-us -boot c -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Boot the guest on the dst host with "-incoming tcp:0:5800".

3. Run google stressapptest inside the guest (refer to Bug 1063417).
Docs: https://code.google.com/p/stressapptest/wiki/Introduction
(1) Get the code from http://code.google.com/p/stressapptest/downloads/list (I used 1.0.6).
(2) Untar it, then:
./configure
make
This produces the binary src/stressapptest.
** Don't run the test on your laptop - it will run it out of memory without options! **
(3) Copy the binary onto the victim VM:
scp src/stressapptest thevmname:
(4) Then, on a text console in the VM, run:
./stressapptest -s 3600 -m 20 -i 20 -C 20

5. In the source host qemu monitor:
(qemu) migrate_set_capability auto-converge on
(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_cache_size 1G
(qemu) migrate_set_speed 1G

6. Start the migration:
(qemu) migrate -d tcp:$dst_host_ip:5800

7. Wait a while; then, before the migration finishes:
(qemu) migrate_set_cache_size 128M

Actual results:
After step 7, qemu-kvm does not hit a segmentation fault, and the migration finishes successfully once the downtime is enlarged. I did ping-pong migration three times, and each migration finished.
(In reply to huiqingding from comment #6)
> Verify this bug using the following version:
> kernel-3.10.0-128.el7.x86_64
> qemu-kvm-1.5.3-64.el7.x86_64

Correction: verified this bug with the following versions:
kernel-3.10.0-128.el7.x86_64
qemu-kvm-1.5.3-60.el7_0.4.x86_64

Using the steps above, I reduced the migrate cache size and did ping-pong migration three times; each migration finished normally.
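For reference, the fix in the qemu-kvm-1.5.3-60.el7_0.3 builds is understood to serialize access to the XBZRLE cache so that a resize cannot free pages out from under the migration thread. A simplified lock-based sketch of that pattern follows; this is an assumption about the shape of the fix, not the actual qemu-kvm patch.

/* gcc -pthread -o resize_fix resize_fix.c
 * Lock-based sketch: resize and lookup both hold a mutex, so a reader
 * can never observe a half-freed cache (assumed, simplified pattern). */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned char **pages;
static size_t num_pages;

/* The whole resize runs under the lock. */
void cache_resize_locked(size_t new_num)
{
    pthread_mutex_lock(&cache_lock);
    size_t old = num_pages;
    for (size_t i = new_num; i < old; i++) {
        free(pages[i]);                  /* shrink: drop evicted pages */
    }
    pages = realloc(pages, new_num * sizeof(*pages));
    for (size_t i = old; i < new_num; i++) {
        pages[i] = NULL;                 /* grow: new slots start empty */
    }
    num_pages = new_num;
    pthread_mutex_unlock(&cache_lock);
}

/* Readers copy the page out while holding the lock instead of keeping
 * a raw pointer into the cache after unlocking. Returns 1 on a hit. */
int cache_read_locked(size_t slot, unsigned char *out, size_t len)
{
    int found = 0;
    pthread_mutex_lock(&cache_lock);
    if (slot < num_pages && pages[slot]) {
        memcpy(out, pages[slot], len);
        found = 1;
    }
    pthread_mutex_unlock(&cache_lock);
    return found;
}

int main(void)
{
    num_pages = 4;
    pages = calloc(num_pages, sizeof(*pages));
    for (size_t i = 0; i < 4; i++) {
        pages[i] = calloc(1, 16);
    }
    unsigned char buf[16];
    cache_resize_locked(2);              /* shrink with no reader racing */
    return cache_read_locked(1, buf, sizeof(buf)) ? 0 : 1;
}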
I also tested comment 4 of bz1066338 with the following versions:
kernel-3.10.0-128.el7.x86_64
qemu-kvm-1.5.3-60.el7_0.4.x86_64

Steps:
1. Boot up a guest with a cdrom attached:
# /usr/libexec/qemu-kvm -cpu SandyBridge -enable-kvm -m 4096 -smp 2,sockets=2,cores=1,threads=1 -enable-kvm -name t2-rhel6.4-64 -uuid 61b6c504-5a8b-4fe1-8347-6c929b750dde -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=input0 -drive file=/mnt/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=disk0,id=disk0 -drive file=/mnt/boot.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,drive=drive-ide0-1-0,bus=ide.1,unit=0,id=cdrom -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:91:85,bus=pci.0,addr=0x5 -qmp tcp:0:6666,server,nowait -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/tmp/serial-socket,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,path=/tmp/foo,server,nowait,id=foo -device virtconsole,chardev=foo,id=console0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -vnc :10 -k en-us -boot c -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio

2. Boot the guest on the destination host in listening mode.

3. On both the src and dst hosts:
(qemu) migrate_set_capability xbzrle on

4. On the src host:
(qemu) migrate_set_cache_size 2G
(qemu) migrate_set_speed 100M

5. Read the cdrom inside the guest:
# while true; do cp -r /media/RHEL_6.4\ X86_64\ boot/ /home/test; sleep 1; rm -rf /home/test; done

6. Migrate the guest:
(qemu) migrate -d tcp:t2:5800

7. When the "xbzrle transferred" value keeps growing, change the cache size:
(qemu) migrate_set_cache_size 128M

Result:
After step 7, qemu-kvm does not abort, and the migration finishes successfully once the downtime is enlarged. I also did ping-pong migration three times, and each migration finished normally.
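As background on why step 7 exercises the cache: XBZRLE sends, for each dirty page, only the bytes that differ from the copy held in the cache, which is why "xbzrle transferred" keeps growing while the cdrom workload dirties pages, and why freeing cache pages mid-migration is dangerous. A simplified illustration of that delta idea (not QEMU's actual wire encoding, which run-length compresses the XOR of the two pages):

/* gcc -o delta_sketch delta_sketch.c
 * Simplified XBZRLE-style delta: emit only the byte runs of the new
 * page that differ from the cached copy. Illustrative only. */
#include <stddef.h>
#include <stdio.h>

/* Prints each changed run as (offset, length) and returns the total
 * number of changed bytes. */
static size_t delta_runs(const unsigned char *cached,
                         const unsigned char *current, size_t len)
{
    size_t changed = 0, i = 0;
    while (i < len) {
        if (cached[i] == current[i]) { i++; continue; }
        size_t start = i;
        while (i < len && cached[i] != current[i]) i++;
        printf("run at %zu, length %zu\n", start, i - start);
        changed += i - start;
    }
    return changed;
}

int main(void)
{
    unsigned char old_page[16] = {0};
    unsigned char new_page[16] = {0};
    new_page[3] = 1;
    new_page[4] = 2;
    new_page[10] = 7;
    /* Only 3 bytes changed, so only 3 bytes (plus run headers) would
     * need to be sent instead of the whole page. */
    printf("changed bytes: %zu\n", delta_runs(old_page, new_page, 16));
    return 0;
}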
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0927.html