Bug 674055

Summary: non-reproducible abort in spice-server: PANIC_ON(!worker->surfaces.surfaces[surface_id].context.canvas)
Product: Red Hat Enterprise Linux 6
Component: spice-server
Version: 6.1
Reporter: Alon Levy <alevy>
Assignee: Alon Levy <alevy>
QA Contact: Desktop QE <desktop-qa-list>
CC: dblechte, djasa, hdegoede, hellolwq, mhasko, mkenneth
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Target Milestone: rc
Hardware: Unspecified
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-03-20 12:17:37 UTC

Description Alon Levy 2011-01-31 13:47:04 UTC
Description of problem:
spice-server abort at red_worker.c:handle_dev_destroy_primary_surface

Reporting this because I think the problem is real, but I have not been able to reproduce it so far.

The actual problem: we are accessing red_dispatcher from two threads, and it was never designed for that. This can happen as seen in the stack traces below: the first user is a VGA timer callback from the main thread, and the second is a qxl io handler from a vcpu thread. Both call red_dispatcher, which writes to the same pipe. So either protect all calls to red_dispatcher with a mutex (independent of qemu_iothread_lock), or serialize them through a per-vcpu pipe to the main thread, which would then be the only user of red_dispatcher.
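As a rough illustration of the mutex option, here is a minimal sketch. The names dispatcher_mutex and red_dispatcher_write_message are assumptions made up for this example, not the actual qemu/spice-server API:

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t dispatcher_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Serialize every write to the worker pipe, so a message sent from the
 * main-loop timer callback and one sent from a vcpu io handler can never
 * interleave on the dispatcher pipe. */
static void red_dispatcher_write_message(int pipe_fd, const void *message, size_t size)
{
    pthread_mutex_lock(&dispatcher_mutex);
    /* a real implementation would loop on short writes and handle EINTR */
    if (write(pipe_fd, message, size) != (ssize_t)size) {
        /* error handling omitted in this sketch */
    }
    pthread_mutex_unlock(&dispatcher_mutex);
}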

Version-Release number of selected component (if applicable):


How reproducible:
0% so far

Steps to Reproduce:
1. Boot up a WinXP guest; the crash occurred during driver initialization.

Actual results:
crash

Expected results:
no crash

Additional info:

I got a panic in handle_dev_destroy_primary_surface, red_worker.c.

At the moment of the panic the state was inconsistent with respect to qxl: one thread saw it in NATIVE mode while another saw it in VGA mode. The main loop had a timer-triggered VGA refresh leading to a call to qemu_spice_destroy_host_primary (because vga_draw_text calls qemu_console_resize), which is only possible if qxl0->mode == QXL_MODE_VGA.

On the other hand, a guest io triggered qxl_create_guest_primary, which is called after qxl0->mode is set to QXL_MODE_NATIVE and after making sure we exit the VGA state with qxl_exit_vga_mode.
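To make the ordering problem concrete, here is a tiny standalone simulation of the unsynchronized mode check. The names mirror the description above, but everything here is an assumption for illustration, not qemu code:

#include <pthread.h>
#include <stdio.h>

enum { QXL_MODE_VGA, QXL_MODE_NATIVE };
static int mode = QXL_MODE_VGA;   /* shared and unprotected, standing in for qxl0->mode */

/* main-loop thread: timer-driven VGA refresh */
static void *vga_refresh(void *arg)
{
    (void)arg;
    if (mode == QXL_MODE_VGA) {
        /* stands in for qemu_spice_destroy_host_primary() */
        printf("main loop: destroy primary surface (mode read as VGA)\n");
    }
    return NULL;
}

/* vcpu thread: guest io switching the device to native mode */
static void *guest_io(void *arg)
{
    (void)arg;
    mode = QXL_MODE_NATIVE;
    /* stands in for qxl_exit_vga_mode() + qxl_create_guest_primary() */
    printf("vcpu: create guest primary surface (mode now NATIVE)\n");
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, vga_refresh, NULL);
    pthread_create(&b, NULL, guest_io, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* Depending on scheduling, both messages can be sent to the worker, and a
     * destroy-primary arriving for a surface the worker no longer considers
     * valid trips the PANIC_ON shown in the red_worker trace below. */
    return 0;
}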

This was running an F14 guest. I couldn't recreate it since, despite running it several times; it did happen one more time, which was enough to run it under a debugger.

more complete stack traces:

kvm_main_loop_cpu:
  ...
  kvm_handle_io
  ...
  qxl_create_guest_primary

kvm_main_loop:
  ...(timer)...
  gui_update
  dpy_refresh
  display_refresh
  qemu_spice_display_refresh
  vga_hw_update
  qxl_hw_update
  vga_update_display
  vga_draw_text
  qemu_console_resize
  dpy_resize
  display_resize
  qemu_spice_display_resize
  qemu_spice_destroy_host_primary

red_worker:
  handle_dev_destroy_primary_surface
   PANIC_ON(!worker->surfaces.surfaces[surface_id].context.canvas)

Comment 2 Uri Lublin 2011-01-31 16:46:53 UTC
Alon, please try to reproduce with an SMP (4 or 8 vcpus) guest.

Comment 3 Alon Levy 2011-03-20 12:09:55 UTC
*** Bug 680114 has been marked as a duplicate of this bug. ***

Comment 4 Alon Levy 2011-03-20 12:17:37 UTC
This situation is prevented by the latest locking fixes for bug 678208. I'm marking this as a duplicate because it is fixed by the same solution, but the bug is actually a different case: here it's an assert caused by dropping the global qemu mutex in the vcpu thread, while in 678208 it's a hang caused by taking the global qemu mutex from the spice server thread.

*** This bug has been marked as a duplicate of bug 678208 ***

Comment 5 leo.liao 2012-11-15 02:02:09 UTC
I hit the same condition with:
CentOS
kernel: 3.6.2
qemu-kvm: 1.2.0
spice
qxl video driver

My Windows XP VM aborted, and I found the following error message in /var/log/libvirt/qemu/xxx.log:

validate_surface: failed on 9
validate_surface: panic !worker->surfaces[surface_id].context.canvas

I have hit this twice in two days.
Are any more details needed?