994211 – Spice: fallback from async io mode to sync mode might result in errors and hanging vm


Bug 994211 - Spice: fallback from async io mode to sync mode might result in errors and hanging vm


Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	spice-qxl-xddm
Sub Component:
Version:	---
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Uri Lublin
QA Contact:	SPICE QE bug list
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-08-06 18:23 UTC by Yonit Halperin
Modified:	2019-10-10 13:51 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-06-28 14:17:20 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:


Description Yonit Halperin 2013-08-06 18:23:29 UTC

Description of problem:

If an async I/O query gets no reply within 60 seconds, the driver falls back from async_io mode to sync_io mode.

However, not all I/O operations have sync support, so we can get errors like:

qxl/guest-0: 222140443011: qxldd: set custom display FFFFF900C1E00020
qxl/guest-0: 222147232862: qxldd: DrvAssertMode: 0xc1e00020 revision 4 enable 0
qxl/guest-0: 222147251233: qxldd: ERROR: trying calling sync io on NULL port 6

This happens for ASYNCABLE_FLUSH_SURFACES, and failing to flush the surfaces can lead to further errors.
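
The failure mode described above can be illustrated with a small toy model. This is only a sketch, not the qxl driver code: the `Driver` class, the `IoOp` names, and all port numbers are invented for illustration. It assumes the driver keeps one table of async ports and a smaller table of sync ports, so an operation with no sync counterpart has nothing to fall back to after the timeout.

```python
from enum import Enum, auto


class IoOp(Enum):
    # Illustrative operation names; only FLUSH_SURFACES is taken from the report.
    UPDATE_AREA = auto()
    FLUSH_SURFACES = auto()


# Hypothetical port tables (numbers are made up for this sketch).
ASYNC_PORTS = {IoOp.UPDATE_AREA: 0x11, IoOp.FLUSH_SURFACES: 0x16}
SYNC_PORTS = {IoOp.UPDATE_AREA: 0x01}  # no sync port for FLUSH_SURFACES


class Driver:
    def __init__(self):
        self.use_async = True
        self.log = []

    def on_async_timeout(self):
        # Models the fallback: after a timeout, only sync I/O is used.
        self.use_async = False

    def issue(self, op):
        """Return the port the operation would be written to, or None."""
        if self.use_async:
            return ASYNC_PORTS[op]
        port = SYNC_PORTS.get(op)
        if port is None:
            # The operation is silently skipped apart from the error message.
            self.log.append(f"ERROR: no sync port for {op.name}")
        return port
```

In this model, `issue(IoOp.FLUSH_SURFACES)` works before `on_async_timeout()` and returns `None` afterwards, which corresponds to the surfaces never being flushed.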

In addition, probably because the async_io query was eventually handled by the spice server and the IO_CMD interrupt *was* sent for the expired query, we get a "guest bug" in qemu for 2 simultaneous async I/Os (after the driver is reloaded and is back in async_io mode):

"qxl-0: guest bug: 2 async started before last (16) complete"
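
The check behind this message can be sketched as a device that allows only one async command in flight at a time. This is a hypothetical model, not the qemu source: the `Device` class and its method names are invented, and the exact sequence that leads the real device into this state is uncertain in the report itself. The sketch only shows how a second async command trips the check while the first is still considered pending.

```python
class Device:
    def __init__(self):
        self.pending = None      # async command currently in flight, if any
        self.guest_bug = False
        self.messages = []

    def start_async(self, cmd):
        """Accept an async command unless one is already in flight."""
        if self.pending is not None:
            self.guest_bug = True
            self.messages.append(
                f"guest bug: 2 async started before last ({self.pending}) complete"
            )
            return False
        self.pending = cmd
        return True

    def complete_async(self):
        """Mark the in-flight command as done and return it."""
        done, self.pending = self.pending, None
        return done
```

If the guest stops waiting for command 16 and later issues another async command before the device has completed the first one, `start_async` refuses it and records the guest bug.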

Note that after qemu identifies a guest_bug, there is another bug (on the qemu side):
interface_get_command in qxl.c returns FALSE, while
interface_req_cmd_notification returns FALSE as well (it reports that the ring is not empty).
This leads to an endless loop in spice-server red_worker.c (see https://bugzilla.redhat.com/show_bug.cgi?id=964136#c29).
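
Why two FALSE answers produce a spin can be shown with a simplified polling loop. This is a sketch under assumptions, not the red_worker.c code: `worker_poll` and the `max_spins` bound are invented so that the example terminates, whereas the report describes the real loop as endless. It assumes the worker sleeps only when the notification request succeeds (ring empty) and otherwise retries fetching a command.

```python
def worker_poll(get_command, req_cmd_notification, max_spins):
    """Poll for a command.

    Returns ("command", n) if a command was fetched on attempt n,
    ("sleep", n) if the ring was empty and the worker may wait for a
    notification, or ("spinning", max_spins) if neither ever happened.
    """
    for spins in range(1, max_spins + 1):
        if get_command():
            return ("command", spins)
        if req_cmd_notification():
            # Ring is empty: safe to sleep until the guest notifies us.
            return ("sleep", spins)
        # Ring reported as non-empty, so retry fetching a command.
    return ("spinning", max_spins)
```

With a healthy empty ring, `worker_poll(lambda: False, lambda: True, 1000)` settles immediately. When both callbacks return false, as described above, the loop never makes progress and only the artificial bound stops it.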

I'm not sure about the severity of this bug, because we shouldn't reach a query timeout in the first place.
We reached it in bug 964136, probably because we mistakenly had too long a timeout in spice-server when waiting for a response from a client and, for a reason we don't yet know, the client was not responsive. I'll send a patch to fix the timeout on the server side.

Related Bug: 964136

Comment 1 bin.liu 2013-09-07 07:57:36 UTC

Has this patch been submitted? I have encountered this bug on qemu-kvm-0.12 and spice-server-0.12.

Comment 2 Yonit Halperin 2013-09-09 15:08:06 UTC

(In reply to bin.liu from comment #1)
> has this patch been submitted? I have encountered this bug on qemu-kvm-0.12
> and spice-server-0.12

A patch for bug 995041 has been submitted. It is part of spice-server-0.12.4-3.el6

Comment 9 Sandro Bonazzola 2015-10-26 12:50:14 UTC

This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted for next week, Nov 4th 2015.
Please review this bug and, if it is not a blocker, please postpone it to a later release.
All bugs not postponed by the GA release will be automatically re-targeted to:

- 3.6.1 if severity >= high
- 4.0 if severity < high

Comment 10 Uri Lublin 2015-10-26 13:15:34 UTC

This bug is not a blocker

Comment 12 Yaniv Lavi 2016-05-09 11:03:40 UTC

oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.
