Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 746950

Summary:	spice-server: channel_event deadlock on reset + timeout
Product:	Red Hat Enterprise Linux 6	Reporter:	Alon Levy <alevy>
Component:	spice-server	Assignee:	Uri Lublin <uril>
Status:	CLOSED ERRATA	QA Contact:	Desktop QE <desktop-qa-list>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	6.1	CC:	acathrow, berrange, cfergeau, cmeadors, dblechte, ddumas, djasa, mkenneth, mkrcmari, pvine, tpelka
Target Milestone:	rc
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	spice-server-0.8.2-5	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2011-12-06 15:22:01 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Alon Levy 2011-10-18 11:34:20 UTC

Description of problem:
a system reset in qemu followed by discovering a disconnected client triggers a deadlock on channel_event since the recent added lock taking in channel_event (see bug 697441, commit 34f2a05)

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. launch qemu: qemu-system-x86_64 -spice disable-ticketing,port=1234 -vga qxl -monitor stdio
2. launch spice client (doesn't matter which): spicec -h localhost -p 1234
3. important: don't close the spice client, just background it: ctrl-z in the launching terminal of spicec
4. from qemu monitor: issue system_reset repeatedly. after about the 10th attempt, you will see the message:
(qemu) red_wait_outgoing_item: blocked
5. after a certain timeout you will see the following message:
flush_display_commands: update timeout

Actual results:
No more monitor input is possible. qemu is blocked totally.

Expected results:
client disconnect detected and event delivered, qemu continues normally.

Additional info:
Stack traces:

main thread:
(gdb) bt
#0  0x00000037ec00e5bd in read () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fbd86bda3da in receive_data (fd=10, in_buf=0x7fffd361411c, n=4) at red_worker.h:143
#2  0x00007fbd86bda45f in read_message (fd=10, message=0x7fffd361411c) at red_worker.h:156
#3  0x00007fbd86bdad6c in red_dispatcher_destroy_surfaces (dispatcher=0x2445380) at red_dispatcher.c:316
#4  0x00007fbd86bdadb5 in qxl_worker_destroy_surfaces (qxl_worker=0x2445380) at red_dispatcher.c:322
#5  0x00000000004a3cce in qxl_spice_destroy_surfaces (qxl=0x24d4840, async=QXL_SYNC) at /home/alon/src/spice_rhel
6/qemu-kvm-rhel6/hw/qxl.c:228
#6  0x00000000004a63ee in qxl_reset_surfaces (d=0x24d4840) at /home/alon/src/spice_rhel6/qemu-kvm-rhel6/hw/qxl.c:
1008
#7  0x00000000004a5c87 in qxl_hard_reset (d=0x24d4840, loadvm=0) at /home/alon/src/spice_rhel6/qemu-kvm-rhel6/hw/
qxl.c:878
#8  0x00000000004a5d41 in qxl_reset_handler (dev=0x24d4840) at /home/alon/src/spice_rhel6/qemu-kvm-rhel6/hw/qxl.c
:896
#9  0x00000000005109bc in qdev_reset (opaque=0x24d4840) at /home/alon/src/spice_rhel6/qemu-kvm-rhel6/hw/qdev.c:27
1
#10 0x000000000040da35 in qemu_system_reset () at /home/alon/src/spice_rhel6/qemu-kvm-rhel6/vl.c:3259
#11 0x0000000000438c16 in qemu_kvm_system_reset () at /home/alon/src/spice_rhel6/qemu-kvm-rhel6/qemu-kvm.c:1934
#12 0x0000000000439404 in kvm_main_loop () at /home/alon/src/spice_rhel6/qemu-kvm-rhel6/qemu-kvm.c:2206
#13 0x000000000040e7aa in main_loop () at /home/alon/src/spice_rhel6/qemu-kvm-rhel6/vl.c:4095
#14 0x00000000004129b7 in main (argc=7, argv=0x7fffd3614728, envp=0x7fffd3614768) at /home/alon/src/spice_rhel6/q
emu-kvm-rhel6/vl.c:6323

(This is less important but just a reminder that the whole timeout handling out of the main event loop is wrong)
red_worker thread before timeout:
(gdb) bt
#0  0x00000037ebcbc17d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
#1  0x00000037ebce9f94 in usleep (useconds=<optimized out>) at ../sysdeps/unix/sysv/linux/usleep.c:33
#2  0x00007fbd86c00e0f in flush_display_commands (worker=0x7fbd84c7fbd0) at red_worker.c:8769
#3  0x00007fbd86c00f88 in flush_all_qxl_commands (worker=0x7fbd84c7fbd0) at red_worker.c:8820
#4  0x00007fbd86c0389f in handle_dev_destroy_surfaces (worker=0x7fbd84c7fbd0) at red_worker.c:9818
#5  0x00007fbd86c041d7 in handle_dev_input (listener=0x7fbd84c7fbd0, events=1) at red_worker.c:10013
#6  0x00007fbd86c04de2 in red_worker_main (arg=0x7fffd3613e70) at red_worker.c:10304

red worker thread after timeout:
#3  0x0000000000439ec3 in kvm_mutex_lock () at /home/alon/src/spice_rhel6/qemu-kvm-rhel6/qemu-kvm.c:2610
#4  0x0000000000439ef8 in qemu_mutex_lock_iothread () at /home/alon/src/spice_rhel6/qemu-kvm-rhel6/qemu-kvm.c:262
3
#5  0x00000000005168cd in channel_event (event=3, info=0x2946dd0) at ui/spice-core.c:253
#6  0x00007f5139ba41d6 in reds_channel_event (stream=0x2946d50, event=3) at reds.c:418
#7  0x00007f5139bb0377 in reds_stream_free (s=0x2946d50) at reds.c:5280
#8  0x00007f5139bc459a in red_disconnect_channel (channel=0x7f5118069970) at red_worker.c:8489
#9  0x00007f5139bc4727 in red_disconnect_display (channel=0x7f5118069970) at red_worker.c:8524
#10 0x00007f5139bc4dff in flush_display_commands (worker=0x7f5136c42bd0) at red_worker.c:8766
#11 0x00007f5139bc4f88 in flush_all_qxl_commands (worker=0x7f5136c42bd0) at red_worker.c:8820
#12 0x00007f5139bc789f in handle_dev_destroy_surfaces (worker=0x7f5136c42bd0) at red_worker.c:9818
#13 0x00007f5139bc81d7 in handle_dev_input (listener=0x7f5136c42bd0, events=1) at red_worker.c:10013
#14 0x00007f5139bc8de2 in red_worker_main (arg=0x7fff7e21b780) at red_worker.c:10304

There are patches to handle this on the mailing list but they need revising: http://lists.freedesktop.org/archives/spice-devel/2011-September/005369.html

Comment 5 Alon Levy 2011-10-23 18:01:31 UTC

fixed with this brew build, verified using the procedure in the bug description.

 https://brewweb.devel.redhat.com/taskinfo?taskID=3734350

Alon

Comment 8 Alon Levy 2011-10-25 12:26:49 UTC

commit pushed to 0.8 branch in upstream.

Comment 9 Alon Levy 2011-10-26 21:27:56 UTC

Hmm, I was just going to complain that my brew build didn't enter the RHEL-6.2-candidate group/tag, but I guess it just took some time. There it is, moving to MODIFIED.

Comment 11 Marian Krcmarik 2011-10-27 15:00:54 UTC

I've tried to verify according to steps, however with spice-server-0.8.2-5 I can get:
(qemu) system_reset
(qemu) red_wait_outgoing_item: blocked
#For about 2 minutes, qemu monitor is not responsive
flush_cursor_commands: flush cursor timeout
(qemu)
#Qemu monitor is responsive again

Is this expected correct behaviour? (Just making sure), Thanks.

Comment 12 Alon Levy 2011-10-28 14:30:25 UTC

(In reply to comment #11)
> I've tried to verify according to steps, however with spice-server-0.8.2-5 I
> can get:
> (qemu) system_reset
> (qemu) red_wait_outgoing_item: blocked
> #For about 2 minutes, qemu monitor is not responsive
> flush_cursor_commands: flush cursor timeout
> (qemu)
> #Qemu monitor is responsive again
> 
> Is this expected correct behaviour? (Just making sure), Thanks.

These are two seperate issues. What you are seeing is a timeout entirely in spice-server that has not been addressed, it's actually part of the solution to a possible temporary delay (albiet not optimal). After the timeout the disconnection happens (you'll see the remaining channels are still there with info spice before and after).

So in short yes, this is the expected correct behavior, and another non blocker bz is probably in order.

Alon

Comment 14 errata-xmlrpc 2011-12-06 15:22:01 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1634.html

Comment 15 David Jaša 2012-03-05 17:28:39 UTC

After a consultation with Marian, we've decided that no ongoing testing is needed.