Bug 514901 - kvm virtio_blk errors - "end_request: I/O error, dev vda, sector 0"
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All Linux
Priority: high, Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Blocks: 513460 F12VirtBlocker
Reported: 2009-07-31 07:39 EDT by Jeff Layton
Modified: 2014-06-18 03:39 EDT
Fixed In Version: kernel-2.6.31-0.138.rc5.git3.fc12
Doc Type: Bug Fix
Last Closed: 2009-08-07 03:57:54 EDT
Attachments
dumpxml of the rawhide guest from virsh (1.26 KB, text/plain), 2009-07-31 07:39 EDT, Jeff Layton
block-silently-error-unsupported-empty-barriers-too.patch (1.42 KB, patch), 2009-08-06 07:21 EDT, Mark McLoughlin

Description Jeff Layton 2009-07-31 07:39:00 EDT
Created attachment 355795
dumpxml of the rawhide guest from virsh

Opening as a rawhide kernel bug, but it could also be a problem with the host kernel or qemu instance.

I have a rawhide kvm guest running on a Fedora 10 host. The guest's ring buffer is continually being spammed several times per second with this message:

end_request: I/O error, dev vda, sector 0

...everything seems to be working OK otherwise.

The guest is currently running:

2.6.31-0.94.rc4.fc12.x86_64

...but the problem has existed for quite some time with earlier kernels too. The host is running:

2.6.27.25-170.2.72.fc10.x86_64

...and has:

qemu-0.9.1-12.fc10.x86_64
kvm-74-10.fc10.x86_64

The guest is using the virtio block driver for the disk. Let me know if any other info would be useful.
Comment 1 Mark McLoughlin 2009-08-05 07:45:48 EDT
Since we have a report of this on an F-11 host too, it sounds like a guest kernel bug:

  http://www.redhat.com/archives/fedora-virt/2009-August/msg00000.html

But AFAICS, we should only get this if:

  1) There is a read/write error on the disk image in the host. Of note
     here is that the disk image is an LVM volume /dev/rootvg64/rawhide

  2) The guest issues a SCSI command

Perhaps it's the serial number support:

  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1d589bb1

or the SG_IO passthru support:

  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1cde26f9

Both were added in 2.6.31-rc1
Comment 2 Jerry James 2009-08-05 11:24:57 EDT
Mark asked me to try kernel-2.6.30-1.fc12.x86_64.  With that kernel, I do not see the error message.
Comment 3 Mark McLoughlin 2009-08-05 18:16:55 EDT
okay, the error is coming from here:

bool blk_do_ordered(struct request_queue *q, struct request **rqp)
{
        struct request *rq = *rqp;
        const int is_barrier = blk_fs_request(rq) && blk_barrier_rq(rq);

        if (!q->ordseq) {
                if (!is_barrier)
                        return true;

                if (q->next_ordered != QUEUE_ORDERED_NONE)
                        return start_ordered(q, rqp);
                else {
                        /*
                         * Queue ordering not supported.  Terminate
                         * with prejudice.
                         */
                         */
                        blk_dequeue_request(rq);
                        __blk_end_request_all(rq, -EOPNOTSUPP);

partial stack trace:

 [<ffffffff81254768>] blk_update_request+0xca/0x363
 [<ffffffff8127bc00>] ? debug_object_deactivate+0x47/0xf2
 [<ffffffff81254a30>] blk_update_bidi_request+0x2f/0x7f
 [<ffffffff812565b2>] __blk_end_request_all+0x44/0x74
 [<ffffffff8125c096>] blk_do_ordered+0x1e0/0x2ae
 [<ffffffff81256756>] blk_peek_request+0x174/0x1c8
 [<ffffffffa00102d3>] do_virtblk_request+0x192/0x1d3 [virtio_blk]
 [<ffffffff8125710f>] __blk_run_queue+0x54/0x9a
Comment 4 Mark McLoughlin 2009-08-06 06:47:50 EDT
Okay, reverting this:

  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=52b1fd5a27

  commit 52b1fd5a27c625c78373e024bf570af3c9d44a79
  Author: Mikulas Patocka <mpatocka@redhat.com>
  Date:   Mon Jun 22 10:12:21 2009 +0100

    dm: send empty barriers to targets in dm_flush
    
    Pass empty barrier flushes to the targets in dm_flush().
    
    Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
    Signed-off-by: Alasdair G Kergon <agk@redhat.com>

fixes it for me

Also, this little hack fixes it too:

  @@ -1163,7 +1163,7 @@ static int __make_request(struct request_queue *q, struct 
          const int unplug = bio_unplug(bio);
          int rw_flags;
 
  -       if (bio_barrier(bio) && bio_has_data(bio) &&
  +       if (bio_barrier(bio) && /* bio_has_data(bio) && */
              (q->next_ordered == QUEUE_ORDERED_NONE)) {
                  bio_endio(bio, -EOPNOTSUPP);
                  return 0;

virtio_blk doesn't support barriers, and it seems the empty barriers submitted by device-mapper are getting through to the device and causing these errors.
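
As a rough illustration of the hack above, the submit-time check can be modelled in plain user-space C. This is only a sketch: `model_queue`, `model_bio`, `submit_before` and `submit_after` are hypothetical stand-ins, not kernel APIs, and the real `__make_request()` does far more. It shows that an empty barrier (barrier flag set, no data) passes the original check and reaches a queue with no ordering support, while the relaxed check fails it early with -EOPNOTSUPP.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
enum ordered_mode { ORDERED_NONE, ORDERED_DRAIN };

struct model_queue {
    enum ordered_mode next_ordered; /* ORDERED_NONE: no barrier support */
};

struct model_bio {
    bool barrier;    /* request carries the barrier flag */
    size_t data_len; /* 0 for an "empty" barrier, i.e. a pure flush */
};

/* Original check: only barriers that carry data are rejected early. */
static int submit_before(const struct model_queue *q,
                         const struct model_bio *bio)
{
    assert(q != NULL && bio != NULL);
    if (bio->barrier && bio->data_len > 0 &&
        q->next_ordered == ORDERED_NONE)
        return -EOPNOTSUPP; /* failed at submit time */
    return 0;               /* passed on towards the driver */
}

/* Check with the has-data condition dropped: empty barriers fail too. */
static int submit_after(const struct model_queue *q,
                        const struct model_bio *bio)
{
    assert(q != NULL && bio != NULL);
    if (bio->barrier && q->next_ordered == ORDERED_NONE)
        return -EOPNOTSUPP;
    return 0;
}
```

With a queue whose mode is `ORDERED_NONE`, `submit_before()` returns 0 for an empty barrier, so the request continues to the dispatch path that logs the I/O error; `submit_after()` returns -EOPNOTSUPP instead.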
Comment 5 Mark McLoughlin 2009-08-06 07:20:56 EDT
okay, sent patch upstream:

  http://lkml.org/lkml/2009/8/6/153
Comment 6 Mark McLoughlin 2009-08-06 07:21:38 EDT
Created attachment 356487
block-silently-error-unsupported-empty-barriers-too.patch
Comment 7 Alasdair Kergon 2009-08-06 08:01:07 EDT
Empty barriers are the mechanism now used to request block device flushes.

After the block layer changed, we had no flushing support through dm devices for a long time as the new method was tricky for us to use.  Eventually we worked around the problems and the solution is mostly in place, as you have discovered.

You should think about whether you're limiting the applicability/usefulness of your software if you don't also implement support for empty barriers in virtio_blk.
Comment 8 Jeff Layton 2009-08-06 09:52:16 EDT
Agreed. I think virtio_blk needs to support barriers if it's going to be at all useful. Otherwise, VMs that use it would be subject to data integrity problems in the face of crashes, right?
Comment 9 Mark McLoughlin 2009-08-06 11:43:46 EDT
can we continue the discussion on lkml about whether barrier support should be required?

response from hch was:

  virtio_blk on kvm does not support any barriers at all, similar to many
  other drivers out there.  If the queue flags say we don't support
  barriers higher layers should not submit any barrier requests.
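
The rule in that response can be sketched from the submitter's side. This is a simplified user-space model with made-up names (`model_queue`, `supports_barriers`, `flush_if_supported`), not how device-mapper or any real driver is written: the upper layer consults the capability the queue advertises and skips the barrier entirely when it is unsupported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a request queue as seen by an upper layer. */
struct model_queue {
    bool supports_barriers; /* stand-in for the queue's ordering flags */
    unsigned barriers_seen; /* barrier requests that reached the driver */
};

/* Hypothetical upper layer asking for a flush via an empty barrier. */
static bool flush_if_supported(struct model_queue *q)
{
    assert(q != NULL);
    if (!q->supports_barriers)
        return false;       /* skipped: the driver never sees a barrier */
    q->barriers_seen++;     /* model handing the barrier to the driver */
    return true;
}
```

In this model a queue that advertises no barrier support never receives one, so the driver has nothing to reject and nothing to log.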

wrt. immediate F-12 concerns, we should focus on getting rid of these errors
Comment 10 Justin M. Forbes 2009-08-06 15:38:35 EDT
The patch has been applied to the F12 kernel to address the immediate concerns.
Comment 11 Mikulas Patocka 2009-08-06 21:35:11 EDT
I think the patch makes sense.
Comment 12 Mark McLoughlin 2009-08-07 03:57:54 EDT
Okay, patch applied in rawhide:

* Thu Aug 06 2009 Justin M. Forbes <jforbes@redhat.com> 2.6.31-0.138.rc5.git3
- Fix kvm virtio_blk errors (#514901)

Not yet tagged for f12
