Bug 994211 - Spice: fallback from async io mode to sync mode might result in errors and hanging vm
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: spice-qxl-driver-win
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ovirt-4.0.0-rc3
Target Release: 4.0.0
Assigned To: Uri Lublin
QA Contact: SPICE QE bug list
Reported: 2013-08-06 14:23 EDT by Yonit Halperin
Modified: 2016-06-28 10:17 EDT

Doc Type: Bug Fix
Last Closed: 2016-06-28 10:17:20 EDT
Type: Bug
oVirt Team: Spice


Description Yonit Halperin 2013-08-06 14:23:29 EDT
Description of problem:

If an async io query gets no reply within 60 seconds, the driver falls back from async_io mode to sync_io mode.
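A minimal C sketch of the fallback described above. All names (io_mode_t, qxl_dev_t, wait_for_async_reply, issue_async_io) are hypothetical and not taken from the driver sources, and the interrupt wait is simulated by passing in how long the server "took" to reply. What it illustrates is that the timed-out query is abandoned rather than cancelled, and that the switch to sync mode is sticky.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the driver's io mode; not the real driver code. */
typedef enum { IO_MODE_ASYNC, IO_MODE_SYNC } io_mode_t;

typedef struct {
    io_mode_t mode;
    unsigned timeout_ms; /* 60000 (60 seconds) per the report */
} qxl_dev_t;

/* Simulated wait for the IO_CMD interrupt: the reply "arrives" after
 * reply_ms, and the wait gives up after timeout_ms. */
static bool wait_for_async_reply(unsigned reply_ms, unsigned timeout_ms)
{
    return reply_ms <= timeout_ms;
}

/* Issues one async io. On timeout the device switches to sync mode for
 * good. The expired query is abandoned, not cancelled, so the server may
 * still complete it and raise the interrupt later. */
static bool issue_async_io(qxl_dev_t *dev, unsigned reply_ms)
{
    assert(dev != NULL);
    if (dev->mode != IO_MODE_ASYNC)
        return false; /* caller must use the sync path instead */
    if (wait_for_async_reply(reply_ms, dev->timeout_ms))
        return true;
    fprintf(stderr, "async io timed out after %u ms, switching to sync io\n",
            dev->timeout_ms);
    dev->mode = IO_MODE_SYNC;
    return false;
}
```

For example, with timeout_ms = 60000, a call simulating a 60001 ms reply returns false and leaves the device in IO_MODE_SYNC, so every later request has to take the sync path.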

However, not all io operations have sync support, so we can get errors like the following for ASYNCABLE_FLUSH_SURFACES:

qxl/guest-0: 222140443011: qxldd: set custom display FFFFF900C1E00020
qxl/guest-0: 222147232862: qxldd: DrvAssertMode: 0xc1e00020 revision 4 enable 0
qxl/guest-0: 222147251233: qxldd: ERROR: trying calling sync io on NULL port 6

Not flushing the surfaces can in turn be followed by other errors.
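Why the sync path can fail can be modeled as a per-operation table of sync ports in which some entries are empty. This is a hypothetical sketch: apart from ASYNCABLE_FLUSH_SURFACES, the enum values, the port numbers and do_sync_io are invented for illustration, and the index printed in the error will not match the "6" in the log above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical operation IDs; only ASYNCABLE_FLUSH_SURFACES comes from the report. */
typedef enum {
    ASYNCABLE_UPDATE_AREA,
    ASYNCABLE_CREATE_PRIMARY,
    ASYNCABLE_FLUSH_SURFACES,
    ASYNCABLE_COUNT
} asyncable_op_t;

/* Sync io port for each operation; 0 means "NULL port", i.e. there is no
 * sync equivalent. Values are placeholders, not real QXL port numbers. */
static const unsigned sync_ports[ASYNCABLE_COUNT] = {
    [ASYNCABLE_UPDATE_AREA]    = 2,
    [ASYNCABLE_CREATE_PRIMARY] = 12,
    [ASYNCABLE_FLUSH_SURFACES] = 0,
};

/* Returns 0 on success, -1 if the operation has no sync counterpart.
 * In the failing case the operation is simply not performed, which is how
 * the surfaces end up unflushed. */
static int do_sync_io(asyncable_op_t op)
{
    assert(op < ASYNCABLE_COUNT);
    if (sync_ports[op] == 0) {
        fprintf(stderr, "ERROR: trying calling sync io on NULL port %d\n", (int)op);
        return -1;
    }
    /* A real driver would write to the io port sync_ports[op] here. */
    return 0;
}
```

Once the device is in sync mode, any operation whose table entry is empty fails like this every time it is requested, not just once.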

In addition, probably because spice server eventually handled the async_io query and the IO_CMD interrupt *was* sent for the expired query, qemu reports a "guest bug" for two simultaneous async ios (after the driver is reloaded and is back in async io mode):

"qxl-0: guest bug: 2 async started before last (16) complete"
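qemu accepts only one outstanding async io per device, and starting a second one before the first completes is treated as a guest bug. The following is a simplified model of that check; qxl_state_t, start_async, complete_async and NO_ASYNC_PENDING are hypothetical names (the report does not show qemu's code), and the port values 16 and 2 in the usage note only mirror the log message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NO_ASYNC_PENDING (-1)

/* Simplified, hypothetical model of the qemu qxl device state. */
typedef struct {
    int current_async; /* port of the outstanding async io, or NO_ASYNC_PENDING */
    bool guest_bug;    /* set once a protocol violation is detected */
} qxl_state_t;

/* Starts an async io. Only one may be outstanding; a second one before the
 * first completes is flagged as a guest bug and rejected. */
static bool start_async(qxl_state_t *d, int io_port)
{
    assert(d != NULL);
    if (d->current_async != NO_ASYNC_PENDING) {
        fprintf(stderr, "guest bug: %d async started before last (%d) complete\n",
                io_port, d->current_async);
        d->guest_bug = true;
        return false;
    }
    d->current_async = io_port;
    return true;
}

/* Completion: the IO_CMD interrupt would be raised here (not modeled),
 * and the slot is freed for the next async io. */
static void complete_async(qxl_state_t *d)
{
    assert(d != NULL);
    d->current_async = NO_ASYNC_PENDING;
}
```

Calling start_async(&d, 16) and then start_async(&d, 2) without an intervening complete_async(&d) prints a message of the same shape as the one above and sets guest_bug.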

Note that after qemu identifies a guest bug, there is another bug on the qemu side: interface_get_command in qxl.c returns FALSE (no command is available), while interface_req_cmd_notification also returns FALSE (i.e., it reports that the ring is not empty). This contradiction leads to an endless loop in spice-server red_worker.c (see https://bugzilla.redhat.com/show_bug.cgi?id=964136#c29).
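The interaction that produces the endless loop reduces to two callbacks that disagree. The sketch below is a hypothetical, simplified model, not the actual qxl.c or red_worker.c code: it assumes that get_command refuses to hand out commands once a guest bug is flagged, while req_cmd_notification looks only at whether the ring is empty. It omits the real worker's re-poll timers and caps the number of iterations so the livelock can be observed instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the qemu side of the interface. */
typedef struct {
    int ring_items; /* commands waiting in the guest's command ring */
    bool guest_bug; /* set after qemu detects a guest bug */
} fake_qxl_t;

/* Assumed behavior: no command is returned once guest_bug is set. */
static bool get_command(fake_qxl_t *q)
{
    if (q->guest_bug || q->ring_items == 0)
        return false;
    q->ring_items--;
    return true;
}

/* TRUE means the ring is empty and the worker may wait for a notification;
 * FALSE means "not empty, poll again". Note that it ignores guest_bug. */
static bool req_cmd_notification(fake_qxl_t *q)
{
    return q->ring_items == 0;
}

/* Worker loop: returns how many iterations it took to go idle, or -1 if it
 * was still spinning after max_iters (the endless loop). */
static int process_commands(fake_qxl_t *q, int max_iters)
{
    assert(q != NULL && max_iters > 0);
    for (int i = 0; i < max_iters; i++) {
        if (get_command(q))
            continue;          /* processed one command */
        if (req_cmd_notification(q))
            return i + 1;      /* ring is empty: go idle */
        /* no command, yet the ring is "not empty": retry immediately */
    }
    return -1;
}
```

With three queued commands and no guest bug, process_commands drains the ring and goes idle after four iterations; with guest_bug set, neither callback lets the loop exit, so it is still spinning when the cap is reached and returns -1.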

I'm not sure about the severity of this bug, because we shouldn't reach a query timeout in the first place.
We reached it in bug 964136, probably because spice-server mistakenly used too long a timeout when waiting for a response from a client, and, for a reason we don't know yet, the client was unresponsive. I'll send a patch to fix the timeout on the server side.

Related Bug: 964136
Comment 1 bin.liu 2013-09-07 03:57:36 EDT
Has this patch been submitted? I have encountered this bug on qemu-kvm-0.12 and spice-server-0.12.
Comment 2 Yonit Halperin 2013-09-09 11:08:06 EDT
(In reply to bin.liu from comment #1)
> Has this patch been submitted? I have encountered this bug on qemu-kvm-0.12
> and spice-server-0.12.

A patch for bug 995041 has been submitted; it is included in spice-server-0.12.4-3.el6.
Comment 9 Sandro Bonazzola 2015-10-26 08:50:14 EDT
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted for next week, Nov 4th 2015.
Please review this bug and if not a blocker, please postpone to a later release.
All bugs not postponed on GA release will be automatically re-targeted to

- 3.6.1 if severity >= high
- 4.0 if severity < high
Comment 10 Uri Lublin 2015-10-26 09:15:34 EDT
This bug is not a blocker.
Comment 12 Yaniv Lavi (Dary) 2016-05-09 07:03:40 EDT
oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.
