Bug 867405

Summary:	core dump when starting qemu with spice and -S
Product:	Red Hat Enterprise Linux 6	Reporter:	Sibiao Luo <sluo>
Component:	spice-server	Assignee:	Alon Levy <alevy>
Status:	CLOSED ERRATA	QA Contact:	Desktop QE <desktop-qa-list>
Severity:	high	Docs Contact:
Priority:	high
Version:	6.4	CC:	acathrow, alevy, areis, bsarathy, cfergeau, chayang, dblechte, dyasny, flang, juzhang, kraxel, marcandre.lureau, mbarta, michen, mkenneth, mkrcmari, pbonzini, qzhang, sandmann, sluo, uril, virt-maint, xufango, xwei
Target Milestone:	rc	Keywords:	Regression
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	spice-server-0.12.0-3	Doc Type:	Bug Fix
Doc Text:	No documentation is needed. This bug is related to a new feature that enables guest-client capabilities negotiation/sharing. It was found and fixed during RHEL-6.4 development phase. Cause: Consequence: Fix: Result:	Story Points:	---
Clone Of:		Environment:
Last Closed:	2013-02-21 10:03:36 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	869958, 881827

Description Sibiao Luo 2012-10-17 13:00:06 UTC

Description of problem:
boot a guest with a data disk on iSCSI storage, and do I/O to the data disk, then firewall the iSCSI port with iptables and generate EIO with wrong I/O error prompts. then stop the iptables and resume the VM via 'cont' in QEMU monitor, it fail to resume the VM, after a while, it will core dump that relate to the qxl.

Version-Release number of selected component (if applicable):
host info:
# uname -r && rpm -q qemu-kvm
2.6.32-331.el6.x86_64
qemu-kvm-0.12.1.2-2.327.el6.x86_64
guest info:
# uname -r
2.6.32-331.el6.x86_64

How reproducible:
some times

Steps to Reproduce:
1.prepare a guest image/data disk and iSCSI server.
# qemu-img create -f qcow2 /mnt/my-data-disk.qcow2 10G
# iscsiadm --mode discovery --type sendtargets --portal 10.66.6.78 –discover
# iscsiadm -m node -T test4 -p 10.66.6.78 -l
Logging in to [iface: default, target: test4, portal: 10.66.6.78,3260] (multiple)

Login to [iface: default, target: test4, portal: 10.66.6.78,3260] successful.
...
2.boot a guest with a data disk on iSCSI storage.
eg: # /usr/libexec/qemu-kvm -M rhel6.4.0 -cpu SandyBridge -enable-kvm -m 4096 -smp 4,sockets=2,cores=2,threads=1 -usb -device usb-tablet,id=input0 -name sluo_migration -uuid 990ea161-6b67-47b2-b803-19fb01d30d30 -rtc base=localtime,clock=host,driftfix=slew -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/mnt/RHEL-Server-6.3-64-sluo.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,fd=6 6<>/dev/tap6 -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=08:2E:5F:0A:0D:B1,bus=pci.0,addr=0x5 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5931,disable-ticketing,seamless-migration=off -vga qxl -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x6 -device hda-duplex -device usb-ehci,id=ehci,addr=0x7 -chardev spicevmc,name=usbredir,id=usbredirchardev1 -device usb-redir,chardev=usbredirchardev1,id=usbredirdev1,bus=ehci.0,debug=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -nodefaults -serial unix:/tmp/ttyS0,server,nowait -qmp tcp:0:4444,server,nowait -drive file=/mnt/my-data-disk.qcow2,if=none,id=data-disk,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,drive=data-disk,id=sluo-disk,bus=pci.0,addr=0x9 -boot menu=on -monitor stdio
3.mkfs.ext4 to the data disk and mount it to /mnt.
# mkfs.ext4 /dev/vdb
# mount /dev/vdb /mnt/
4.do dd in loop.
# while true; do dd if=/dev/zero of=/mnt/sluo bs=1M count=100; done
5.firewall the iSCSI port in host.
# iptables -A INPUT -p tcp -s $ip_addr --dport $port -j DROP
6.stop the iptables and resume the VM via 'cont' in QEMU monitor.
# service iptables stop
(QEMU) cont
  
Actual results:
after the step5, generate EIO as following (bug 867401):
(qemu) block I/O error in device 'data-disk': No space left on device (28)
block I/O error in device 'data-disk': No space left on device (28)
block I/O error in device 'data-disk': No space left on device (28)
block I/O error in device 'data-disk': No space left on device (28)
block I/O error in device 'data-disk': No space left on device (28)
block I/O error in device 'data-disk': No space left on device (28)
block I/O error in device 'data-disk': No space left on device (28)
block I/O error in device 'data-disk': No space left on device (28)
...
after the step6, if fail to 
(qemu) info status 
VM status: paused (io-error)
(qemu) cont
block I/O error in device 'data-disk': No space left on device (28)
block I/O error in device 'data-disk': No space left on device (28)
block I/O error in device 'data-disk': No space left on device (28)
block I/O error in device 'data-disk': No space left on device (28)

if fail to resume the VM, and after a while will core dump, i will paste later. 

Expected results:
after the step5, generate EIO as following:
(qemu) block I/O error in device 'data-disk': Input/output error (5)
block I/O error in device 'data-disk': Input/output error (5)
block I/O error in device 'data-disk': Input/output error (5)
...
after the step6, it should resume the VM successfully without core dump.

Additional info:

Comment 1 Sibiao Luo 2012-10-17 13:01:29 UTC

(In reply to comment #0)
> Description of problem:
> boot a guest with a data disk on iSCSI storage, and do I/O to the data disk,
> then firewall the iSCSI port with iptables and generate EIO with wrong I/O
> error prompts. then stop the iptables and resume the VM via 'cont' in QEMU
> monitor, it fail to resume the VM, after a while, it will core dump that
> relate to the qxl.
> 
> if fail to resume the VM, and after a while will core dump, i will paste
> later. 
> 
(qemu) 
__spice_char_device_write_buffer_get: token violation: dev 0x7fffd4000960 client 0x7ffff8ae2f40
spice_char_device_handle_client_overflow: dev 0x7fffd4000960 client 0x7ffff8a10ed0 
red_client_destroy: destroy client with #channels 6
red_channel_client_disconnect: 0x7ffff9ed34c0 (channel 0x7ffff89c6ab0 type 3 id 0)
red_channel_client_disconnect: 0x7ffff9ed34c0 (channel 0x7ffff89c6ab0 type 3 id 0)
red_channel_client_disconnect: 0x7ffff9ecf2d0 (channel 0x7ffff8a5b640 type 5 id 0)
snd_channel_put: sound channel freed
red_channel_client_disconnect: 0x7ffff9ecf2d0 (channel 0x7ffff8a5b640 type 5 id 0)
red_dispatcher_disconnect_display_peer: 
qemu-kvm: /builddir/build/BUILD/qemu-kvm-0.12.1.2/hw/qxl.c:1658: qxl_send_events: Assertion `qemu_spice_display_is_running(&d->ssd)' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe63cb700 (LWP 3810)]
0x00007ffff574c8a5 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff574c8a5 in raise () from /lib64/libc.so.6
#1  0x00007ffff574e085 in abort () from /lib64/libc.so.6
#2  0x00007ffff5745a1e in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff5745ae0 in __assert_fail () from /lib64/libc.so.6
#4  0x00007ffff7f73d3d in qxl_send_events (d=0x7ffff9e1a320, events=16) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:1658
#5  0x00007ffff5f6b6a7 in handle_dev_display_disconnect (opaque=<value optimized out>, payload=<value optimized out>) at red_worker.c:11236
#6  0x00007ffff5f63cc7 in dispatcher_handle_single_read (dispatcher=0x7ffff9e2d9e8) at dispatcher.c:139
#7  dispatcher_handle_recv_read (dispatcher=0x7ffff9e2d9e8) at dispatcher.c:162
#8  0x00007ffff5f8488e in red_worker_main (arg=<value optimized out>) at red_worker.c:11782
#9  0x00007ffff7740851 in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff580167d in clone () from /lib64/libc.so.6
(gdb)

Comment 2 Sibiao Luo 2012-10-17 13:03:32 UTC

cpuinfo as following:
processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
stepping	: 7
cpu MHz		: 1600.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 3
cpu cores	: 4
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6784.19
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

Comment 3 Sibiao Luo 2012-10-17 13:05:15 UTC

(In reply to comment #0)
> 
> Version-Release number of selected component (if applicable):
> host info:
> # uname -r && rpm -q qemu-kvm
> 2.6.32-331.el6.x86_64
> qemu-kvm-0.12.1.2-2.327.el6.x86_64
# rpm -qa | grep qxl
xorg-x11-drv-qxl-0.0.14-13.el6_2.x86_64

Comment 5 juzhang 2012-10-18 02:17:22 UTC

FYI
https://bugzilla.redhat.com/show_bug.cgi?id=807541

Comment 6 Qunfang Zhang 2012-10-18 09:12:53 UTC

Meet this problem when try to connect a guest desktop with 'spicec' command, when the guest boot up with spice+qxl and -S.
Test with the 6.3 released version, have not this issue. So this is a regression.

Steps:
1. Boot a guest with "-S -spice port=5930,disable-ticketing -global qxl-vga.vram_size=33554432= -vga qxl".
2. spicec -h ** -p 5930

Result:
Aborted. same log with bug description.

Comment 7 Sibiao Luo 2012-10-19 05:29:35 UTC

(In reply to comment #6)
> Meet this problem when try to connect a guest desktop with 'spicec' command,
> when the guest boot up with spice+qxl and -S.
> Test with the 6.3 released version, have not this issue. So this is a
> regression.
> 
> Steps:
> 1. Boot a guest with "-S -spice port=5930,disable-ticketing -global
> qxl-vga.vram_size=33554432= -vga qxl".
> 2. spicec -h ** -p 5930
> 
Good, that's a easy method to reproduce, i use 'remote-viewer spice://$ip_addr:$port' to connect guest that also hit this issue.
bonzini suggest that try to reproduce it without i/o (just stop/cont), but i didnot reproduce it.

eg:...-S -spice port=5931,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.vram_size=67108864
# remote-viewer spice://10.66.9.242:5931

(qemu) main_channel_link: add main channel client
main_channel_handle_parsed: net test: latency 0.102000 ms, bitrate 10666666666 bps (10172.526041 Mbps)
inputs_connect: inputs channel client create
red_dispatcher_set_cursor_peer: 
qemu-kvm: /builddir/build/BUILD/qemu-kvm-0.12.1.2/hw/qxl.c:1658: qxl_send_events: Assertion `qemu_spice_display_is_running(&d->ssd)' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff574c8a5 in raise () from /lib64/libc.so.6
(gdb) 
(gdb) bt
#0  0x00007ffff574c8a5 in raise () from /lib64/libc.so.6
#1  0x00007ffff574e085 in abort () from /lib64/libc.so.6
#2  0x00007ffff5745a1e in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff5745ae0 in __assert_fail () from /lib64/libc.so.6
#4  0x00007ffff7f73d3d in qxl_send_events (d=0x7ffff9e1a320, events=16) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/qxl.c:1658
#5  0x00007ffff5f837a2 in handle_new_display_channel (opaque=0x7ffec40008c0, payload=0x7ffec41d80a0) at red_worker.c:10370
#6  handle_dev_display_connect (opaque=0x7ffec40008c0, payload=0x7ffec41d80a0) at red_worker.c:11216
#7  0x00007ffff5f63cc7 in dispatcher_handle_single_read (dispatcher=0x7ffff8a24ca8) at dispatcher.c:139
#8  dispatcher_handle_recv_read (dispatcher=0x7ffff8a24ca8) at dispatcher.c:162
#9  0x00007ffff5f8488e in red_worker_main (arg=<value optimized out>) at red_worker.c:11782
#10 0x00007ffff7740851 in start_thread () from /lib64/libpthread.so.0
#11 0x00007ffff580167d in clone () from /lib64/libc.so.6
(gdb)

Comment 8 Gerd Hoffmann 2012-10-22 10:56:09 UTC

Bisected down:

commit f11ac1cace1097d0ed8778472028b6f9292d2766
Author: Soren Sandmann <ssp>
Date:   Wed Oct 10 19:15:44 2012 +0200

    qxl: Add set_client_capabilities() interface to QXLInterface

Comment 9 Gerd Hoffmann 2012-10-22 11:58:37 UTC

IMO this is a spice-server bug.  Spice server must not call QXLInterface callbacks if the guest isn't running (i.e. before spice_server_vm_start is called by qemu).

Comment 10 Sibiao Luo 2012-10-23 06:29:42 UTC

Hi all,
  
   I have tried the rhel6.3GA host (kernel-2.6.32-279.el6.x86_64 and qemu-kvm-0.12.1.2-2.295.el6.x86_64) for many times, the QEMU did not core dump when boot guest with spice + qxl and -S.
eg: ...-spice port=5932,disable-ticketing -vga qxl -global qxl-vga.vram_size=67108864

host info:
# rpm -qa | grep qxl
xorg-x11-drv-qxl-0.0.14-13.el6_2.x86_64
# uname -r && rpm -q qemu-kvm
2.6.32-279.el6.x86_64
qemu-kvm-0.12.1.2-2.295.el6.x86_64
guest info:
# uname -r
2.6.32-279.el6.x86_64

   So, this issue is regression bug, we'd better fix it in 6.4.

Best Regards.
sluo

Comment 11 Alon Levy 2012-11-15 10:37:58 UTC

Fixed upstream in:

commit 4ca54e596f81ae8ea914b83d3a6bd9df55cd777f
Author: Alon Levy <alevy>
Date:   Thu Nov 1 12:30:25 2012 +0200

    server/red_worker: don't call set_client_capabilities if vm is stopped
    
    We try to inject an interrupt to the vm in this case, which we cannot do
    if it is stopped. Instead log this and update when vm restarts.
    
    RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=870972
     (that bz is on qemu, it will be cloned or just changed, not
      sure yet)

Waiting for flags to push.

Comment 12 Marc-Andre Lureau 2012-11-15 17:46:03 UTC

*** Bug 869945 has been marked as a duplicate of this bug. ***

Comment 13 Marc-Andre Lureau 2012-11-15 18:03:58 UTC

(moving to POST since patch is already in rhel package)

Comment 14 Alon Levy 2012-11-19 12:17:30 UTC

scratch build here.

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=5089166

I'll go ahead and push this, meanwhile some testing would be good. (tested locally).

Comment 15 Alon Levy 2012-11-19 12:36:30 UTC

Built: spice-server-0.12.0-3.el6

Comment 16 Marc-Andre Lureau 2012-11-19 13:36:29 UTC

(In reply to comment #14)
> scratch build here.
> 
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=5089166
> 
> I'll go ahead and push this, meanwhile some testing would be good. (tested
> locally).

patch tested successfully by me (not the scratchbuild)

Comment 19 Christophe Fergeau 2012-11-23 19:05:49 UTC

The 0008-server-red_worker-don-t-call-set_client_capabilities.patch patch causes qemu -spice port=5900 to segfault at startup. spice-server-0.12.0-1 and a scratch build based on spice-server-0.12.0-4 with just this patch disabled work as expected ( scratch build at https://brewweb.devel.redhat.com/taskinfo?taskID=5123484 ). Should I move this bug back to ASSIGNED or open a new one?

Comment 21 errata-xmlrpc 2013-02-21 10:03:36 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0529.html

Comment 22 Xu Fang 2013-08-30 09:09:34 UTC

__spice_char_device_write_buffer_get: token violation: dev 0x7fffd4000960 client 0x7ffff8ae2f40

it is because  :

if (!migrated_data_tokens && dev_client->do_flow_control) {
     dev_client->num_client_tokens--;
}

and num_client_tokens decrease to 0.

and 

if (!migrated_data_tokens &&
    dev_client->do_flow_control && !dev_client->num_client_tokens) 
{
      spice_printerr("token violation: dev %p client %p", dev, client);
      spice_char_device_handle_client_overflow(dev_client);
      goto error;
}


should fixed like this:

if (dev->running && !migrated_data_tokens && dev_client->do_flow_control) {
   dev_client->num_client_tokens--;
}