612089 – virtio-blk: Possible loss of flushes after werror/rerror stop

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 612089 - virtio-blk: Possible loss of flushes after werror/rerror stop

Summary: virtio-blk: Possible loss of flushes after werror/rerror stop

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	qemu-kvm
Sub Component:
Version:	6.0
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Kevin Wolf
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2010-07-07 10:01 UTC by Kevin Wolf
Modified:	2013-01-09 22:49 UTC (History)
CC List:	5 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2010-07-16 08:57:28 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Description Kevin Wolf 2010-07-07 10:01:18 UTC

Quoting an email thread:

> > 2. If you have several ENOSPACE events coming from several VCPU at once,
> >      will you be able to keep the ordering once the VM is resumed
> >     (assuming one of them is a barrier)
>
> I don't think this is related to VCPUs, but rather to the host block
> layer sending responses. The flush will probably be dropped in this
> scenario.

Dor suggested inserting a flush after restarting any requests just in case. Sounds like it's something that's possible to do and could help a bit, even though it's not entirely correct behaviour.

Comment 2 chellwig@redhat.com 2010-07-16 05:50:20 UTC

Analysis:

If we stop the VM on an error in virtio_blk_handle_rw_error we put the request we stop on in front
of the s->rq queue, which we then serialize in virtio_blk_save.

Once the VM is restarted we de-serialize them again in virtio_blk_load, and finally resume them in virtio_blk_dma_restart_bh.  We do this for all requests, including the flush requests.

So we defintively do handle flushes on VM abort / restart.  Now the questions is if we handle the requests (flush or not) in the right order.

 - in virtio_blk_handle_rw_error we add the newly failing request to the front of the queue,
   effectively reversing the order of the requests.
 - in virtio_blk_save we serialize the requests in the order found in the queue
 - in virtio_blk_load each request deserialized is added to the front of the s->rq queue, effectively reversing the order in which is was serialized
 - virtio_blk_dma_restart_bh then handles the requests in order.

So I think we are fine, but the way \ the s->rq queue is manipulated smells like this happend more by accident than by intention.

Comment 3 Kevin Wolf 2010-07-16 07:47:46 UTC

(In reply to comment #2)
> Analysis:
> 
> If we stop the VM on an error in virtio_blk_handle_rw_error we put the request
> we stop on in front
> of the s->rq queue, which we then serialize in virtio_blk_save.
> 
> Once the VM is restarted we de-serialize them again in virtio_blk_load, and
> finally resume them in virtio_blk_dma_restart_bh.  We do this for all requests,
> including the flush requests.
> 
> So we defintively do handle flushes on VM abort / restart.

Well, yes and no. If a flush fails, we put it into the queue, that's right. But that's not the whole truth, the flush may succeed and a prior write request may fail. Which should not be a problem as long as we signal the failure back to the guest, but when we restart the write request outselves the write succeeds from the guest POV, but it has moved across the barrier. So I think we need to flush once again after the restarted write request.

I think this is actually a pretty common case with RHEV: Writes won't succeed because of ENOSPC, but a flush doesn't need more space and still works.

Comment 4 chellwig@redhat.com 2010-07-16 08:34:36 UTC

(In reply to comment #3)
> Well, yes and no. If a flush fails, we put it into the queue, that's right. But
> that's not the whole truth, the flush may succeed and a prior write request may
> fail. Which should not be a problem as long as we signal the failure back to
> the guest, but when we restart the write request outselves the write succeeds
> from the guest POV, but it has moved across the barrier. So I think we need to
> flush once again after the restarted write request.

I/O ordering is a host problem.  Linux guests will never start a flush request before the queue
hasn't been drained to avoid dealing with the complex problem of out of order completions
for cache flushes.  While I can't verify this for other guests I can't see any other way to deal
with the typical error scenarios.

Comment 5 Kevin Wolf 2010-07-16 08:57:28 UTC

As Christoph explained in IRC, the guest not only does, but it even has to drain the queue before flushes to get the right semantics even during normal operation. Therefore this is not a problem, closing.

Note You need to log in before you can comment on or make changes to this bug.