514901 – kvm virtio_blk errors - "end_request: I/O error, dev vda, sector 0"

Summary: kvm virtio_blk errors - "end_request: I/O error, dev vda, sector 0"

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	high
Severity:	medium
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	F12VirtBlocker 513460

Reported:	2009-07-31 11:39 UTC by Jeff Layton
Modified:	2014-06-18 07:39 UTC
CC List:	13 users
Fixed In Version:	kernel-2.6.31-0.138.rc5.git3.fc12
Clone Of:
Environment:
Last Closed:	2009-08-07 07:57:54 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
dumpxml of the rawhide guest from virsh (1.26 KB, text/plain) 2009-07-31 11:39 UTC, Jeff Layton
block-silently-error-unsupported-empty-barriers-too.patch (1.42 KB, patch) 2009-08-06 11:21 UTC, Mark McLoughlin

Description Jeff Layton 2009-07-31 11:39:00 UTC

Created attachment 355795
dumpxml of the rawhide guest from virsh

Opening as a rawhide kernel bug, but it could also be a problem with the host kernel or qemu instance.

I have a rawhide kvm guest running on a Fedora 10 host. The guest's ring buffer is continually being spammed several times per second with this message:

end_request: I/O error, dev vda, sector 0

...everything seems to be working OK otherwise.

The guest is currently running:

2.6.31-0.94.rc4.fc12.x86_64

...but the problem has existed for quite some time with earlier kernels too. The host is running:

2.6.27.25-170.2.72.fc10.x86_64

...and has:

qemu-0.9.1-12.fc10.x86_64
kvm-74-10.fc10.x86_64

The guest is using the virtio block driver for the disk. Let me know if any other info would be useful.

Comment 1 Mark McLoughlin 2009-08-05 11:45:48 UTC

Since we have a report of this on an F-11 host too, it sounds like a guest kernel bug:

  http://www.redhat.com/archives/fedora-virt/2009-August/msg00000.html

But AFAICS, we should only get this if:

  1) There is a read/write error on the disk image in the host. Of note
     here is that the disk image is an LVM volume /dev/rootvg64/rawhide

  2) The guest issues a SCSI command

Perhaps it's the serial number support:

  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1d589bb1

or the SG_IO passthru support:

  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1cde26f9

Both were added in 2.6.31-rc1.

Comment 2 Jerry James 2009-08-05 15:24:57 UTC

Mark asked me to try kernel-2.6.30-1.fc12.x86_64.  With that kernel, I do not see the error message.

Comment 3 Mark McLoughlin 2009-08-05 22:16:55 UTC

okay, the error is coming from here:

bool blk_do_ordered(struct request_queue *q, struct request **rqp)
{
        struct request *rq = *rqp;
        const int is_barrier = blk_fs_request(rq) && blk_barrier_rq(rq);

        if (!q->ordseq) {
                if (!is_barrier)
                        return true;

                if (q->next_ordered != QUEUE_ORDERED_NONE)
                        return start_ordered(q, rqp);
                else {
                        /*
                         * Queue ordering not supported.  Terminate
                         * with prejudice.
                         */
                        blk_dequeue_request(rq);
                        __blk_end_request_all(rq, -EOPNOTSUPP);
                        /* ... */

partial stack trace:

 [<ffffffff81254768>] blk_update_request+0xca/0x363
 [<ffffffff8127bc00>] ? debug_object_deactivate+0x47/0xf2
 [<ffffffff81254a30>] blk_update_bidi_request+0x2f/0x7f
 [<ffffffff812565b2>] __blk_end_request_all+0x44/0x74
 [<ffffffff8125c096>] blk_do_ordered+0x1e0/0x2ae
 [<ffffffff81256756>] blk_peek_request+0x174/0x1c8
 [<ffffffffa00102d3>] do_virtblk_request+0x192/0x1d3 [virtio_blk]
 [<ffffffff8125710f>] __blk_run_queue+0x54/0x9a
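The branch quoted above can be illustrated with a small user-space model. This is a hypothetical sketch, not kernel code: `struct model_rq`, `enum model_ordered`, `enum model_action` and `model_do_ordered()` are invented names, and only the `!q->ordseq` branch from the excerpt is modeled. It shows that a filesystem barrier request reaching a queue with no ordering support is completed with `-EOPNOTSUPP`, which matches the `blk_do_ordered` -> `__blk_end_request_all` path in the stack trace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's queue ordering modes. */
enum model_ordered { MODEL_ORDERED_NONE, MODEL_ORDERED_DRAIN_FLUSH };

/* Hypothetical, simplified request: only the flags the excerpt tests. */
struct model_rq {
    bool is_fs;       /* models blk_fs_request(rq) */
    bool is_barrier;  /* models blk_barrier_rq(rq) */
};

/* Possible outcomes of the !q->ordseq branch in blk_do_ordered(). */
enum model_action { DISPATCH_NORMALLY, START_ORDERED_SEQUENCE, FAIL_REQUEST };

/*
 * Models only the first branch of blk_do_ordered() (no ordered sequence
 * in progress). On FAIL_REQUEST, *err is set to -EOPNOTSUPP, mirroring
 * __blk_end_request_all(rq, -EOPNOTSUPP); otherwise *err is 0.
 */
enum model_action model_do_ordered(enum model_ordered next_ordered,
                                   const struct model_rq *rq, int *err)
{
    assert(rq != NULL && err != NULL);
    *err = 0;

    bool is_barrier = rq->is_fs && rq->is_barrier;
    if (!is_barrier)
        return DISPATCH_NORMALLY;
    if (next_ordered != MODEL_ORDERED_NONE)
        return START_ORDERED_SEQUENCE;

    /* Queue ordering not supported: terminate with prejudice. */
    *err = -EOPNOTSUPP;
    return FAIL_REQUEST;
}
```

A non-barrier request is dispatched unchanged; only a barrier on a queue whose mode is the model's equivalent of `QUEUE_ORDERED_NONE` takes the failure path.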

Comment 4 Mark McLoughlin 2009-08-06 10:47:50 UTC

Okay, reverting this:

  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=52b1fd5a27

  commit 52b1fd5a27c625c78373e024bf570af3c9d44a79
  Author: Mikulas Patocka <mpatocka>
  Date:   Mon Jun 22 10:12:21 2009 +0100

    dm: send empty barriers to targets in dm_flush
    
    Pass empty barrier flushes to the targets in dm_flush().
    
    Signed-off-by: Mikulas Patocka <mpatocka>
    Signed-off-by: Alasdair G Kergon <agk>

fixes it for me

Also, this little hack fixes it too:

  @@ -1163,7 +1163,7 @@ static int __make_request(struct request_queue *q, struct 
          const int unplug = bio_unplug(bio);
          int rw_flags;
 
  -       if (bio_barrier(bio) && bio_has_data(bio) &&
  +       if (bio_barrier(bio) && /* bio_has_data(bio) && */
              (q->next_ordered == QUEUE_ORDERED_NONE)) {
                  bio_endio(bio, -EOPNOTSUPP);
                  return 0;

virtio_blk doesn't support barriers, and it seems the empty barriers submitted by device-mapper are getting through to the device and causing these errors.
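The effect of the one-line hack on the submit-time check can be sketched as a user-space model. The names here (`struct model_bio`, `model_submit_guard()` and its `also_reject_empty` switch) are hypothetical, and this is not the attached patch itself. It only shows that with the original guard an empty barrier on a queue without ordering support passes through (and later fails at dispatch, producing the logged error), while with the `bio_has_data()` test removed the same bio is completed with `-EOPNOTSUPP` at submission.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's queue ordering modes. */
enum model_ordered { MODEL_ORDERED_NONE, MODEL_ORDERED_DRAIN_FLUSH };

/* Hypothetical, simplified bio: only what the guard inspects. */
struct model_bio {
    bool is_barrier;  /* models bio_barrier(bio) */
    bool has_data;    /* models bio_has_data(bio) */
};

/*
 * Returns -EOPNOTSUPP if the guard would complete the bio immediately
 * (as bio_endio(bio, -EOPNOTSUPP) does), or 0 if the bio would be
 * queued and eventually reach the driver.
 *
 * also_reject_empty == false models the original condition, which
 * required bio_has_data(); true models the hack with that test removed.
 */
int model_submit_guard(enum model_ordered next_ordered,
                       const struct model_bio *bio, bool also_reject_empty)
{
    assert(bio != NULL);
    if (bio->is_barrier && (also_reject_empty || bio->has_data) &&
        next_ordered == MODEL_ORDERED_NONE)
        return -EOPNOTSUPP;
    return 0;
}
```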

Comment 5 Mark McLoughlin 2009-08-06 11:20:56 UTC

okay, sent patch upstream:

  http://lkml.org/lkml/2009/8/6/153

Comment 6 Mark McLoughlin 2009-08-06 11:21:38 UTC

Created attachment 356487
block-silently-error-unsupported-empty-barriers-too.patch

Comment 7 Alasdair Kergon 2009-08-06 12:01:07 UTC

Empty barriers are the mechanism now used to request block device flushes.

After the block layer changed, we had no flushing support through dm devices for a long time as the new method was tricky for us to use.  Eventually we worked around the problems and the solution is mostly in place, as you have discovered.

You should think about whether you're limiting the applicability/usefulness of your software if you don't also implement support for empty barriers in virtio_blk.

Comment 8 Jeff Layton 2009-08-06 13:52:16 UTC

Agreed. I think virtio_blk needs to support barriers if it's going to be at all useful. Otherwise, VMs that use it would be subject to data integrity problems in the face of crashes, right?

Comment 9 Mark McLoughlin 2009-08-06 15:43:46 UTC

can we continue the discussion on lkml about whether barrier support should be required?

response from hch was:

  virtio_blk on kvm does not support any barriers at all, similar to many
  other drivers out there.  If the queue flags say we don't support
  barriers higher layers should not submit any barrier requests.

wrt. immediate F-12 concerns, we should focus on getting rid of these errors
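hch's position puts the check on the submitting side instead. The sketch below is a hypothetical illustration of that idea, not device-mapper's actual code: `struct model_queue` and `model_forward_flush()` are invented names. A stacking layer inspects each underlying queue's advertised ordering mode and sends an empty flush barrier only to queues that claim support. The fix actually applied in this bug took the other route and rejects such barriers in the block layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's queue ordering modes. */
enum model_ordered { MODEL_ORDERED_NONE, MODEL_ORDERED_DRAIN_FLUSH };

/* Hypothetical view of an underlying device's request queue. */
struct model_queue {
    const char *name;                 /* label only, e.g. "vda" */
    enum model_ordered next_ordered;  /* advertised ordering support */
};

/*
 * Hypothetical stacking-driver helper: for each of the n underlying
 * queues, record in sent[i] whether an empty flush barrier would be
 * sent. Queues advertising no ordering support are skipped rather than
 * handed a request they can only fail. Returns how many would be sent.
 */
size_t model_forward_flush(const struct model_queue *queues, size_t n,
                           bool *sent)
{
    assert(n == 0 || (queues != NULL && sent != NULL));
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        sent[i] = queues[i].next_ordered != MODEL_ORDERED_NONE;
        if (sent[i])
            count++;
    }
    return count;
}
```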

Comment 10 Justin M. Forbes 2009-08-06 19:38:35 UTC

The patch has been applied to the F12 kernel to address the immediate concerns.

Comment 11 Mikuláš Patočka 2009-08-07 01:35:11 UTC

I think the patch makes sense.

Comment 12 Mark McLoughlin 2009-08-07 07:57:54 UTC

Okay, patch applied in rawhide:

* Thu Aug 06 2009 Justin M. Forbes <jforbes> 2.6.31-0.138.rc5.git3
- Fix kvm virtio_blk errors (#514901)

Not yet tagged for f12
