Bug 994211

Summary: Spice: fallback from async io mode to sync mode might result in errors and hanging vm
Product: Red Hat Enterprise Linux 8
Reporter: Yonit Halperin <yhalperi>
Component: spice-qxl-xddm
Assignee: Uri Lublin <uril>
Status: CLOSED CURRENTRELEASE
QA Contact: SPICE QE bug list <spice-qe-bugs>
Severity: low
Docs Contact:
Priority: unspecified
Version: ---
CC: bin.liu87, cfergeau, dblechte, marcandre.lureau, pvine, rbalakri, srevivo, tpelka, uril
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-06-28 14:17:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Spice
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Yonit Halperin 2013-08-06 18:23:29 UTC
Description of problem:

If an async io query doesn't get a reply within 60 seconds, the driver switches from async_io mode to sync_io mode.
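
A minimal sketch of the fallback described above, not the driver's actual code. The type and function names (`driver_state_t`, `wait_async_reply`) are hypothetical; only the 60-second timeout and the async-to-sync switch come from this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 60 seconds, the timeout mentioned in the report. */
#define ASYNC_TIMEOUT_MS 60000

typedef enum { IO_MODE_ASYNC, IO_MODE_SYNC } io_mode_t;

typedef struct {
    io_mode_t mode;
} driver_state_t;

/*
 * Hypothetical helper: reply_after_ms is how long the device takes to
 * answer (a negative value means it never answers). Returns true if the
 * reply arrived in time; otherwise switches the driver to sync mode.
 */
bool wait_async_reply(driver_state_t *drv, int64_t reply_after_ms)
{
    assert(drv->mode == IO_MODE_ASYNC);
    if (reply_after_ms >= 0 && reply_after_ms <= ASYNC_TIMEOUT_MS) {
        return true;
    }
    /* Timeout expired: fall back to sync io for all later operations. */
    drv->mode = IO_MODE_SYNC;
    return false;
}
```

Note that in this sketch the expired query is simply abandoned; nothing tells the device that the driver stopped waiting for it.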

However, not all io operations have sync support, so we can get errors like

qxl/guest-0: 222140443011: qxldd: set custom display FFFFF900C1E00020
qxl/guest-0: 222147232862: qxldd: DrvAssertMode: 0xc1e00020 revision 4 enable 0
qxl/guest-0: 222147251233: qxldd: ERROR: trying calling sync io on NULL port 6

for ASYNCABLE_FLUSH_SURFACES (and failing to flush the surfaces can lead to further errors).
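
The failure mode can be sketched as a dispatch table in which some operations have no sync entry. This is an illustration under assumptions, not the driver's code: the enum, the handlers, and the choice of which operations have a sync counterpart are hypothetical, apart from flush-surfaces lacking one, which is what this report describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical operation list; the real driver's list differs. */
typedef enum {
    OP_UPDATE_AREA,
    OP_CREATE_PRIMARY,
    OP_FLUSH_SURFACES,
    OP_COUNT
} io_op_t;

typedef int (*sync_handler_t)(void);

static int sync_update_area(void)    { return 0; }
static int sync_create_primary(void) { return 0; }

/* Sync handler per operation; NULL means "async only". */
static const sync_handler_t sync_handlers[OP_COUNT] = {
    [OP_UPDATE_AREA]    = sync_update_area,
    [OP_CREATE_PRIMARY] = sync_create_primary,
    [OP_FLUSH_SURFACES] = NULL,
};

/* Returns 0 on success, -1 if the operation has no sync variant. */
int do_sync_io(io_op_t op)
{
    if ((unsigned)op >= OP_COUNT || sync_handlers[op] == NULL) {
        fprintf(stderr, "ERROR: no sync io for op %d\n", (int)op);
        return -1;
    }
    return sync_handlers[op]();
}
```

Once the driver is in sync mode, every async-only operation takes the error path, so the surfaces are never flushed.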

In addition, probably because the async_io query was eventually handled by spice server and the IO_CMD interrupt *was* sent for the expired query, we get a "guest bug" in qemu for 2 simultaneous async ios (after the driver is reloaded and is back in async io mode).

"qxl-0: guest bug: 2 async started before last (16) complete"

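A rough sketch of the "one async io at a time" rule behind this message. The names (`device_t`, `start_async`, `complete_async`) are hypothetical and this is not qemu's implementation; it only shows why a new async io is rejected while an earlier one is still recorded as in flight.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_ASYNC_PENDING (-1)

typedef struct {
    int  current_async;  /* port of the in-flight async io, or NO_ASYNC_PENDING */
    bool guest_bug;      /* set once the guest violates the rule */
} device_t;

void device_init(device_t *dev)
{
    dev->current_async = NO_ASYNC_PENDING;
    dev->guest_bug = false;
}

/* Returns false and flags a guest bug if another async io is in flight. */
bool start_async(device_t *dev, int io_port)
{
    assert(io_port >= 0);
    if (dev->current_async != NO_ASYNC_PENDING) {
        dev->guest_bug = true;
        return false;
    }
    dev->current_async = io_port;
    return true;
}

/* Called when the server finishes the in-flight async io. */
void complete_async(device_t *dev)
{
    dev->current_async = NO_ASYNC_PENDING;
}
```

In the scenario above, the driver gives up on the first query and reloads, but the device may still have that query recorded as in flight when the driver issues its next async io.
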
Note that after qemu identifies a guest_bug, there is another bug (on the qemu side):
interface_get_command in qxl.c returns FALSE, while
interface_req_cmd_notification returns FALSE as well (it reports that the ring is not empty).
This leads to an endless loop in spice-server red_worker.c (see https://bugzilla.redhat.com/show_bug.cgi?id=964136#c29)
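
The loop can be illustrated with a simplified model. The functions below are hypothetical stand-ins for interface_get_command and interface_req_cmd_notification, and the assumption that a guest bug makes the first one return FALSE regardless of the ring contents is taken from the description above. The iteration cap exists only so the sketch terminates.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool guest_bug;   /* set after qemu flags a guest bug */
    int  ring_items;  /* commands waiting in the ring */
} ring_dev_t;

/* Stand-in for interface_get_command: refuses to hand out commands
 * once guest_bug is set (assumption based on the report). */
static bool get_command(ring_dev_t *dev)
{
    if (dev->guest_bug || dev->ring_items == 0) {
        return false;
    }
    dev->ring_items--;
    return true;
}

/* Stand-in for interface_req_cmd_notification: true means the ring is
 * empty and the worker may sleep until it is notified. */
static bool req_cmd_notification(ring_dev_t *dev)
{
    return dev->ring_items == 0;
}

/* Returns the number of commands processed, or -1 if the worker was
 * still spinning after max_iterations. */
int process_commands(ring_dev_t *dev, int max_iterations)
{
    int processed = 0;
    for (int i = 0; i < max_iterations; i++) {
        if (get_command(dev)) {
            processed++;
            continue;
        }
        if (req_cmd_notification(dev)) {
            return processed;  /* ring empty: safe to wait */
        }
        /* No command returned, yet the ring is reported non-empty: retry. */
    }
    return -1;
}
```

With guest_bug set and a non-empty ring, both calls keep returning false, so without the cap the worker would never leave the loop.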

I'm not sure about the severity of this bug, because we shouldn't reach a query timeout in the first place.
We reached it in bug 964136, probably because spice-server mistakenly used too long a timeout when waiting for a response from a client, and, for a reason we don't know yet, the client was not responsive. I'll send a patch to fix the timeout on the server side.

Related Bug: 964136

Comment 1 bin.liu 2013-09-07 07:57:36 UTC
Has this patch been submitted? I have encountered this bug on qemu-kvm-0.12 and spice-server-0.12

Comment 2 Yonit Halperin 2013-09-09 15:08:06 UTC
(In reply to bin.liu from comment #1)
> Has this patch been submitted? I have encountered this bug on qemu-kvm-0.12
> and spice-server-0.12

A patch for bug 995041 has been submitted. It is part of spice-server-0.12.4-3.el6

Comment 9 Sandro Bonazzola 2015-10-26 12:50:14 UTC
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted for next week, Nov 4th 2015.
Please review this bug and if not a blocker, please postpone to a later release.
All bugs not postponed on GA release will be automatically re-targeted to

- 3.6.1 if severity >= high
- 4.0 if severity < high

Comment 10 Uri Lublin 2015-10-26 13:15:34 UTC
This bug is not a blocker

Comment 12 Yaniv Lavi 2016-05-09 11:03:40 UTC
oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.