Bug 994211

Summary:	Spice: fallback from async io mode to sync mode might result in errors and hanging vm
Product:	Red Hat Enterprise Linux 8	Reporter:	Yonit Halperin <yhalperi>
Component:	spice-qxl-xddm	Assignee:	Uri Lublin <uril>
Status:	CLOSED CURRENTRELEASE	QA Contact:	SPICE QE bug list <spice-qe-bugs>
Severity:	low	Docs Contact:
Priority:	unspecified
Version:	---	CC:	bin.liu87, cfergeau, dblechte, marcandre.lureau, pvine, rbalakri, srevivo, tpelka, uril
Target Milestone:	rc
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-06-28 14:17:20 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	Spice	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Yonit Halperin 2013-08-06 18:23:29 UTC

Description of problem:

If an async io query gets no reply within 60 seconds, the driver falls back from async_io mode to sync_io mode.

However, not every io operation has a sync variant, so we can get errors like

qxl/guest-0: 222140443011: qxldd: set custom display FFFFF900C1E00020
qxl/guest-0: 222147232862: qxldd: DrvAssertMode: 0xc1e00020 revision 4 enable 0
qxl/guest-0: 222147251233: qxldd: ERROR: trying calling sync io on NULL port 6

for ASYNCABLE_FLUSH_SURFACES (and the failure to flush the surfaces can lead to further errors).
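To illustrate the fallback path, here is a minimal C model. It is not the actual qxldd code: all names (issue_io, struct driver, has_sync_port, the wait callbacks) are made up for this sketch, and only the 60 second timeout and the missing sync port for the flush operation come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ASYNC_TIMEOUT_SEC 60   /* timeout quoted in this report */

/* Hypothetical operation set; only the flush case is taken from the report. */
enum io_op { OP_UPDATE_AREA, OP_FLUSH_SURFACES, OP_COUNT };

enum io_result { IO_OK_ASYNC, IO_OK_SYNC, IO_ERR_NO_SYNC_PORT };

struct driver {
    bool use_async;               /* current io mode */
    bool has_sync_port[OP_COUNT]; /* false models a NULL sync port */
};

/* Stand-in for waiting on the async reply; returns false on timeout. */
typedef bool (*wait_fn)(int timeout_sec);

bool reply_never(int timeout_sec)  { (void)timeout_sec; return false; }
bool reply_always(int timeout_sec) { (void)timeout_sec; return true; }

void driver_init(struct driver *d)
{
    d->use_async = true;
    d->has_sync_port[OP_UPDATE_AREA] = true;
    d->has_sync_port[OP_FLUSH_SURFACES] = false; /* async-only operation */
}

enum io_result issue_io(struct driver *d, enum io_op op, wait_fn wait_for_reply)
{
    assert(d != NULL && wait_for_reply != NULL);
    assert((int)op < OP_COUNT);

    if (d->use_async) {
        if (wait_for_reply(ASYNC_TIMEOUT_SEC))
            return IO_OK_ASYNC;
        /* Timeout: this and all later operations fall back to sync io. */
        d->use_async = false;
    }
    /* The fallback is only valid if the operation has a sync port at all. */
    if (!d->has_sync_port[op])
        return IO_ERR_NO_SYNC_PORT;
    return IO_OK_SYNC;
}
```

With reply_never as the wait callback, issuing OP_FLUSH_SURFACES flips the model to sync mode and returns IO_ERR_NO_SYNC_PORT, while a later OP_UPDATE_AREA still succeeds through its sync port.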

In addition, probably because the async_io query was eventually handled by the spice server, and the IO_CMD interrupt *was* sent for the expired query, we get a "guest bug" in qemu for 2 simultaneous async ios (after the driver is reloaded and is back in async io mode).

"qxl-0: guest bug: 2 async started before last (16) complete"

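The check behind that message can be modelled roughly as follows. This is a hypothetical sketch of a "one async io in flight" guard, not the qemu code: struct device, async_start, async_complete and NO_ASYNC_IO are assumed names, and the sketch does not try to reproduce how the expired query leads to the overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_ASYNC_IO (-1)   /* hypothetical marker for "nothing in flight" */

struct device {
    int  current_async;    /* io port of the async io in flight, or NO_ASYNC_IO */
    bool guest_bug;        /* set once the guest violates the protocol */
};

void device_init(struct device *dev)
{
    assert(dev != NULL);
    dev->current_async = NO_ASYNC_IO;
    dev->guest_bug = false;
}

/* Returns true if the async io was accepted. */
bool async_start(struct device *dev, int io_port)
{
    assert(dev != NULL && io_port >= 0);
    if (dev->current_async != NO_ASYNC_IO) {
        /* A second async io arrived before the previous one completed. */
        dev->guest_bug = true;
        return false;
    }
    dev->current_async = io_port;
    return true;
}

/* Called when the in-flight async io has been handled. */
void async_complete(struct device *dev)
{
    assert(dev != NULL);
    dev->current_async = NO_ASYNC_IO;
}
```

In this model, a driver that gives up on a pending query and then issues a new async io after reload trips the guard, because from the device's point of view the first one is still in flight.
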
Notice that after qemu identifies a guest_bug, there is another bug (on the qemu side):
interface_get_command in qxl.c returns FALSE, while
interface_req_cmd_notification returns FALSE as well (it reports that the ring is not empty).
This leads to an endless loop in spice-server red_worker.c (see https://bugzilla.redhat.com/show_bug.cgi?id=964136#c29).
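A small C model of why those two return values together make the worker spin. It is a sketch under assumptions, not the red_worker.c or qxl.c code: struct fake_qxl, model_get_command, model_req_cmd_notification and drain are invented names, and the poll limit exists only so the model terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_qxl {
    bool guest_bug;  /* device has flagged a guest bug */
    int  pending;    /* commands still sitting in the ring */
};

/* Models get_command: hands out one command, or returns false. */
bool model_get_command(struct fake_qxl *q)
{
    assert(q != NULL);
    if (q->guest_bug || q->pending == 0)
        return false;            /* after a guest bug nothing is handed out */
    q->pending--;
    return true;
}

/* Models req_cmd_notification: true means "ring empty, you may sleep". */
bool model_req_cmd_notification(struct fake_qxl *q)
{
    assert(q != NULL);
    return q->pending == 0;      /* false: ring not empty, poll again */
}

/* Returns the number of polls needed to go idle, or -1 if the loop
 * is still spinning after max_polls iterations. */
int drain(struct fake_qxl *q, int max_polls)
{
    int polls = 0;
    for (;;) {
        if (polls == max_polls)
            return -1;           /* the real loop has no such limit */
        polls++;
        if (model_get_command(q))
            continue;            /* got a command, look for the next one */
        if (model_req_cmd_notification(q))
            return polls;        /* ring empty: the worker can wait */
        /* Both calls returned false: retry immediately. */
    }
}
```

With guest_bug set and commands still in the ring, neither call ever returns true, so drain hits the poll limit; without the limit the loop would never exit.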

I'm not sure about the severity of this bug because we shouldn't reach a query timeout in the first place.
We reached it in bug 964136, probably because the timeout in spice-server for waiting on a client response was mistakenly too long and, for a reason we don't know yet, the client was not responsive. I'll send a patch to fix the timeout on the server side.

Related Bug: 964136

Comment 1 bin.liu 2013-09-07 07:57:36 UTC

has this patch been submitted? I have encountered this bug on qemu-kvm-0.12 and spice-server-0.12

Comment 2 Yonit Halperin 2013-09-09 15:08:06 UTC

(In reply to bin.liu from comment #1)
> has this patch been submitted? I have encountered this bug on qemu-kvm-0.12
> and spice-server-0.12

A patch for bug 995041 has been submitted. It is part of spice-server-0.12.4-3.el6

Comment 9 Sandro Bonazzola 2015-10-26 12:50:14 UTC

This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted for next week, Nov 4th 2015.
Please review this bug and if not a blocker, please postpone to a later release.
All bugs not postponed on GA release will be automatically re-targeted to

- 3.6.1 if severity >= high
- 4.0 if severity < high

Comment 10 Uri Lublin 2015-10-26 13:15:34 UTC

This bug is not a blocker

Comment 12 Yaniv Lavi 2016-05-09 11:03:40 UTC

oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.