Created attachment 355795 [details]
dumpxml of the rawhide guest from virsh

Opening as a rawhide kernel bug, but it could also be a problem with the host
kernel or qemu instance.

I have a rawhide kvm guest running on a Fedora 10 host. The guest's ring
buffer is continually being spammed several times per second with this
message:

  end_request: I/O error, dev vda, sector 0

...everything seems to be working OK otherwise.

The guest is currently running:

  2.6.31-0.94.rc4.fc12.x86_64

...but the problem has existed for quite some time with earlier kernels too.

The host is running:

  2.6.27.25-170.2.72.fc10.x86_64

...and has:

  qemu-0.9.1-12.fc10.x86_64
  kvm-74-10.fc10.x86_64

The guest is using the virtio block driver for the disk. Let me know if any
other info would be useful.
Since we have a report of this on an F-11 host too, it sounds like a guest
kernel bug:

  http://www.redhat.com/archives/fedora-virt/2009-August/msg00000.html

But AFAICS, we should only get this if:

1) There's a read/write error on the disk image in the host. Of note here is
   that the disk image is an LVM volume, /dev/rootvg64/rawhide

2) The guest issues a SCSI command

Perhaps it's the serial number support:

  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1d589bb1

or the SG_IO passthru support:

  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1cde26f9

Both were added in 2.6.31-rc1.
Mark asked me to try kernel-2.6.30-1.fc12.x86_64. With that kernel, I do not see the error message.
Okay, the error is coming from here:

bool blk_do_ordered(struct request_queue *q, struct request **rqp)
{
	struct request *rq = *rqp;
	const int is_barrier = blk_fs_request(rq) && blk_barrier_rq(rq);

	if (!q->ordseq) {
		if (!is_barrier)
			return true;

		if (q->next_ordered != QUEUE_ORDERED_NONE)
			return start_ordered(q, rqp);
		else {
			/*
			 * Queue ordering not supported.  Terminate
			 * with prejudice.
			 */
			blk_dequeue_request(rq);
			__blk_end_request_all(rq, -EOPNOTSUPP);

Partial stack trace:

 [<ffffffff81254768>] blk_update_request+0xca/0x363
 [<ffffffff8127bc00>] ? debug_object_deactivate+0x47/0xf2
 [<ffffffff81254a30>] blk_update_bidi_request+0x2f/0x7f
 [<ffffffff812565b2>] __blk_end_request_all+0x44/0x74
 [<ffffffff8125c096>] blk_do_ordered+0x1e0/0x2ae
 [<ffffffff81256756>] blk_peek_request+0x174/0x1c8
 [<ffffffffa00102d3>] do_virtblk_request+0x192/0x1d3 [virtio_blk]
 [<ffffffff8125710f>] __blk_run_queue+0x54/0x9a
Okay, reverting this:

  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=52b1fd5a27

  commit 52b1fd5a27c625c78373e024bf570af3c9d44a79
  Author: Mikulas Patocka <mpatocka>
  Date:   Mon Jun 22 10:12:21 2009 +0100

      dm: send empty barriers to targets in dm_flush

      Pass empty barrier flushes to the targets in dm_flush().

      Signed-off-by: Mikulas Patocka <mpatocka>
      Signed-off-by: Alasdair G Kergon <agk>

fixes it for me.

Also, this little hack fixes it too:

@@ -1163,7 +1163,7 @@ static int __make_request(struct request_queue *q, struct
 	const int unplug = bio_unplug(bio);
 	int rw_flags;

-	if (bio_barrier(bio) && bio_has_data(bio) &&
+	if (bio_barrier(bio) && /* bio_has_data(bio) && */
 	    (q->next_ordered == QUEUE_ORDERED_NONE)) {
 		bio_endio(bio, -EOPNOTSUPP);
 		return 0;

virtio_blk doesn't support barriers, and it seems these empty barriers
submitted by device-mapper are getting through to the device and causing
these errors.
Okay, sent the patch upstream:

  http://lkml.org/lkml/2009/8/6/153
Created attachment 356487 [details]
block-silently-error-unsupported-empty-barriers-too.patch
Empty barriers are the mechanism now used to request block device flushes. After the block layer changed, we had no flushing support through dm devices for a long time as the new method was tricky for us to use. Eventually we worked around the problems and the solution is mostly in place, as you have discovered. You should think about whether you're limiting the applicability/usefulness of your software if you don't also implement support for empty barriers in virtio_blk.
Agreed. I think virtio_blk needs to support barriers if it's going to be at all useful. Otherwise, VMs that use it would be subject to data integrity problems in the face of crashes, right?
Can we continue the discussion on lkml about whether barrier support should be
required? The response from hch was:

  virtio_blk on kvm does not support any barriers at all, similar to many
  other drivers out there. If the queue flags say we don't support barriers,
  higher layers should not submit any barrier requests.

Wrt. the immediate F-12 concerns, we should focus on getting rid of these
errors.
The patch has been applied to the F12 kernel to address the immediate concerns.
I think the patch makes sense.
Okay, patch applied in rawhide:

* Thu Aug 06 2009 Justin M. Forbes <jforbes> 2.6.31-0.138.rc5.git3
- Fix kvm virtio_blk errors (#514901)

Not yet tagged for f12.