Bug 688146

Summary: qcow2: Some paths fail to handle I/O errors
Product: Red Hat Enterprise Linux 6
Reporter: Kevin Wolf <kwolf>
Component: qemu-kvm
Assignee: Kevin Wolf <kwolf>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.1
CC: ehabkost, juzhang, mjenner, mkenneth, tburke, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.151.el6
Doc Type: Bug Fix
Doc Text: Cause: bugs in the handling of errors in the qcow2 code. Consequence: some error cases were being ignored, and could cause image corruption. Fix: backport of error handling fixes on qcow2 code. Result: safer error handling and qcow2 image corruption avoided on error cases.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-05-19 11:21:17 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Kevin Wolf 2011-03-16 13:18:29 UTC

Upstream has some fixes for error handling that need to be backported:

- Immediate I/O errors when reading from the backing file were ignored
- I/O errors in reading compressed clusters were ignored
- COW of L2 tables with internal snapshots used an unsafe order so that I/O errors or crashes in the middle could cause image corruption

Comment 6 juzhang 2011-03-21 05:43:40 UTC

(In reply to comment #0)
> Upstream has some fixes for error handling that need to be backported:
> 
> - Immediate I/O errors when reading from the backing file were ignored
I tried both the fixed version (qemu-kvm-0.12.1.2-2.151.el6) and the unfixed version (qemu-kvm-0.12.1.2-2.149.el6), and both failed: the immediate error still seems to be ignored. If I made a mistake, please correct me.
1. Write a blkdebug configuration file:
cat > blkdebug.cfg <<EOF
[inject-error]
event = ""
errno = "5"
immediately = "off"
EOF
2. Create a qcow2 image:
qemu-img create -f qcow2 test.qcow2 2G
3. Read from it:
qemu-io blkdebug:blkdebug.cfg:test.qcow2
qemu-io> read 0 1G
read 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0.0000 sec (6.575 GiB/sec and 6.5751 ops/sec)

> - I/O errors in reading compressed clusters were ignored
I can't reproduce this issue. Could you please provide an effective method?
1. Create a qcow2 image:
qemu-img create -f qcow2 zhang.qcow2 6G

2. Convert it to a compressed image:
qemu-img convert -f qcow2 zhang.qcow2 -O qcow2 -c zhangconvert1.qcow2

3. Put the compressed image on an NFS mount.

4. Boot the guest with the compressed image as its second disk and rerror=stop:
-drive file=/root/nfs/zhangconvert1.qcow2,if=none,id=test1,cache=none,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,drive=test1

5. In the guest, keep reading from the compressed image:
while true; do dd if=/dev/vdb of=/dev/null; done
6. Disconnect the NFS server.

Results:
The VM is still running.

> - COW of L2 tables with internal snapshots used an unsafe order so that I/O
> errors or crashes in the middle could cause image corruption
I still cannot find a way to reproduce this. Could you please provide an effective method?

Comment 7 Kevin Wolf 2011-04-15 09:26:05 UTC

(In reply to comment #6)
> (In reply to comment #0)
> > Upstream has some fixes for error handling that need to be backported:
> > 
> > - Immediate I/O errors when reading from the backing file were ignored
> I tried both the fixed version (qemu-kvm-0.12.1.2-2.151.el6) and the unfixed
> version (qemu-kvm-0.12.1.2-2.149.el6), and both failed: the immediate error
> still seems to be ignored. If I made a mistake, please correct me.
> 1. Write a blkdebug configuration file:
> cat > blkdebug.cfg <<EOF
> [inject-error]
> event = ""
> errno = "5"
> immediately = "off"
> EOF

A rule with an empty event name is never triggered. You may use event = "aio_read".
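
A corrected rule might look like the sketch below. The event name is the one suggested above; upstream QEMU's blkdebug event list spells the AIO read event "read_aio" as far as I can tell, so that spelling is worth trying if "aio_read" never fires. Setting immediately = "on" is an assumption made here because the bug concerns immediate errors; the original config used "off".

```shell
# Write a blkdebug rule that names an event, so the rule can actually trigger.
# Assumptions: event name as suggested in this comment; immediately = "on"
# is chosen here to exercise the immediate-error path.
cat > blkdebug.cfg <<EOF
[inject-error]
event = "aio_read"
errno = "5"
immediately = "on"
EOF
```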

Also, please note that this is about failed reads from the backing file. So you need a backing file and an overlay, like this:

qemu-img create -f qcow2 base.qcow2 2G
qemu-img create -f qcow2 -b blkdebug:blkdebug.cfg:base.qcow2 snap1.qcow2

And then try to read from snap1.qcow2 (without having written to it before, so that it tries to access the backing file).
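
Put together, the test could be scripted roughly as follows. This is only a sketch: it assumes a blkdebug.cfg whose rule names a real event already exists in the current directory, the 64k read size and the helper names are made up for illustration, and newer qemu-img releases additionally want the backing format given with -F.

```shell
#!/bin/sh
# Sketch of the backing-file read test. File names follow the comment above;
# the 64k read size and the helper names are illustrative assumptions.

# Build the blkdebug-wrapped backing file name: blkdebug:<config>:<image>
backing_spec() {
    printf 'blkdebug:%s:%s' "$1" "$2"
}

repro_backing_read() {
    cfg=$1
    [ -f "$cfg" ] || { echo "missing $cfg" >&2; return 1; }
    qemu-img create -f qcow2 base.qcow2 2G || return 1
    # The overlay is never written, so every read goes to the backing file.
    qemu-img create -f qcow2 -b "$(backing_spec "$cfg" base.qcow2)" snap1.qcow2 || return 1
    qemu-io -c "read 0 64k" snap1.qcow2
}

if command -v qemu-img >/dev/null 2>&1 && command -v qemu-io >/dev/null 2>&1; then
    repro_backing_read blkdebug.cfg || echo "setup or read failed; see messages above" >&2
else
    echo "qemu-img/qemu-io not installed; nothing run" >&2
fi
```

On a fixed build the read should report an I/O error rather than a byte count. It is safer to judge by the printed output than by the exit status of qemu-io.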

> > - I/O errors in reading compressed clusters were ignored
> I can't reproduce this issue. Could you please provide an effective method?
> [...]
> Results:
> The VM is still running.

The expected result with rerror=stop is that the VM stops, so you have in fact reproduced the bug.

> > - COW of L2 tables with internal snapshots used an unsafe order so that I/O
> > errors or crashes in the middle could cause image corruption
> I still cannot find a way to reproduce this. Could you please provide an
> effective method?

Use blkdebug to inject a failure for the event "l2_alloc.cow_read". Create an internal snapshot and then try to write to the image. After the I/O has failed, run qemu-img check.
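
Those steps could be scripted along these lines. Treat it as a sketch under assumptions: the file names, the snapshot name snap0 and the 64k write size are invented for illustration, and the first write is there only so that an L2 table exists before the snapshot is taken.

```shell
#!/bin/sh
# Sketch: provoke a failure during L2 table COW, then check the image.
# File names, snapshot name and write size are illustrative assumptions.

# Rule that fails the read performed while copying an L2 table.
cat > blkdebug-l2cow.cfg <<EOF
[inject-error]
event = "l2_alloc.cow_read"
errno = "5"
EOF

run_l2_cow_repro() {
    img=$1
    qemu-img create -f qcow2 "$img" 2G || return 1
    # Allocate a cluster so an L2 table exists, then snapshot the image.
    qemu-io -c "write 0 64k" "$img" || return 1
    qemu-img snapshot -c snap0 "$img" || return 1
    # This write has to copy the now shared L2 table; the injected error hits here.
    qemu-io -c "write 0 64k" "blkdebug:blkdebug-l2cow.cfg:$img"
    # Whatever happened to the write, the image should still pass the check.
    qemu-img check "$img"
}

if command -v qemu-img >/dev/null 2>&1 && command -v qemu-io >/dev/null 2>&1; then
    run_l2_cow_repro test.qcow2 || echo "setup failed or qemu-img check reported problems" >&2
else
    echo "qemu-img/qemu-io not installed; nothing run" >&2
fi
```

With the fix applied, qemu-img check should find no corruption after the failed write.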

Comment 8 juzhang 2011-04-19 08:15:05 UTC

After communicating with kwolf, we think functional testing can cover this issue.

We completed two functional test runs and did not find any new bugs or regressions.

https://tcms.engineering.redhat.com/run/19552/
https://tcms.engineering.redhat.com/run/19551/




I also checked that these patches have been applied in qemu-kvm-0.12.1.2-2.158.el6.x86_64:

# rpm -qa --changelog qemu-kvm | grep 688146
- kvm-qcow2-Fix-error-handling-for-immediate-backing-file-.patch [bz#688146]
- kvm-qcow2-Fix-error-handling-for-reading-compressed-clus.patch [bz#688146]
- kvm-qcow2-Fix-order-in-L2-table-COW.patch [bz#688146]
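
The same check can be turned into a count, which is easier to compare across builds. A minimal sketch, where count_bz_patches is a hypothetical helper name and not part of rpm:

```shell
# count_bz_patches BZ: read rpm changelog text on stdin and print how many
# patch entries are tagged with the given bug number.
count_bz_patches() {
    grep -c "\.patch \[bz#$1\]"
}
```

For example, `rpm -qa --changelog qemu-kvm | count_bz_patches 688146` should print 3 for a build whose changelog contains the three entries listed above.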

Comment 9 juzhang 2011-04-19 08:15:46 UTC

According to comment 8, setting this issue as verified.

Comment 10 Eduardo Habkost 2011-05-03 19:14:27 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: bugs in the handling of errors in the qcow2 code.

Consquence: some error cases were being ignored, and could cause image corruption.

Fix: backport of error handling fixes on qcow2 code.

Result: safer error handling and qcow2 image corruption avoided on error cases.

Comment 11 Eduardo Habkost 2011-05-03 20:56:23 UTC

    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,6 +1,6 @@
 Cause: bugs in the handling of errors in the qcow2 code.
 
-Consquence: some error cases were being ignored, and could cause image corruption.
+Consequence: some error cases were being ignored, and could cause image corruption.
 
 Fix: backport of error handling fixes on qcow2 code.

Comment 12 errata-xmlrpc 2011-05-19 11:21:17 UTC

An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html

Comment 13 errata-xmlrpc 2011-05-19 13:02:19 UTC

An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html