Bug 1772321

Summary: qcow2 image corruption due to incorrect locking in preallocation detection
Product: Red Hat Enterprise Linux Advanced Virtualization
Reporter: Tingting Mao <timao>
Component: qemu-kvm
Assignee: Kevin Wolf <kwolf>
qemu-kvm sub component: General
QA Contact: CongLi <coli>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: coli, ddepaula, drjones, hreitz, juzhang, kchamart, knoel, kwolf, leiyang, lhh, virt-maint, yfu
Version: 8.2
Target Milestone: rc
Target Release: 8.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-4.2.0-1.module+el8.2.0+4793+b09dd2fb
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1764721
Environment:
Last Closed: 2020-05-05 09:50:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1764721
Bug Blocks:

Description Tingting Mao 2019-11-14 06:28:47 UTC
+++ This bug was initially created as a clone of Bug #1764721 +++

Tested with:
qemu-kvm-4.1.0-14.module+el8.2.0+4677+51176c2e
kernel-4.18.0-148.el8


Steps:
1. Boot a guest from an image that already has an OS installed.
# /usr/libexec/qemu-kvm -name 'guest' -machine q35 -m 4096 -monitor stdio -vnc :0 -drive file=rhel78-64-virtio-scsi.qcow2,id=tt

2. Run 'savevm' while the guest is writing data
(guest) $ while true ; do dd if=/dev/zero of=ftest bs=1024k count=4000 ; done
(qemu) savevm foo
(qemu) quit

3. Run the loadvm/savevm test loop against the image
# /usr/libexec/qemu-kvm -name 'guest' -machine q35 -m 4096 -monitor stdio -vnc :0 -drive file=rhel78-64-virtio-scsi.qcow2,id=tt
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) 
(qemu) 
(qemu) 
(qemu) savevm foo
(qemu) 
(qemu) quit 
[root@lenovo-sr630-02 test]# 
[root@lenovo-sr630-02 test]# while true ; do (echo loadvm foo ; echo c ; sleep 10 ; echo stop ; echo savevm foo ; echo quit ) | /usr/libexec/qemu-kvm -name 'guest' -machine q35 -m 4096 -monitor stdio -vnc :0 -drive file=rhel78-64-virtio-scsi.qcow2,id=tt; done
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
Error: Error while deleting snapshot on device 'tt': Failed to free the cluster and L1 table: Invalid argument
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
Error: Device 'tt' does not have the requested snapshot 'foo'
(qemu) c
(qemu) stop
(qemu) savevm foo
^Cqcow2: Marking image as corrupt: Preventing invalid write on metadata (overlaps with active L2 table); further corruption events will be suppressed
Error: Error while writing VM state: Input/output error
(qemu) qemu-kvm: terminating on signal 2
qemu-kvm: -drive file=rhel78-64-virtio-scsi.qcow2,id=tt: qcow2: Image is corrupt; cannot be opened read/write
^C

4. Check the image
# qemu-img check rhel78-64-virtio-scsi.qcow2
......
......
ERROR OFLAG_COPIED data cluster: l2_entry=4f3b0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=4f3c0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=4f3d0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=4f3e0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=4f3f0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=4f400000 refcount=1

13656 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

117027 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
149762/327680 = 45.70% allocated, 20.27% fragmented, 0.00% compressed clusters
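When this check is repeated after every test run, it helps to reduce the output to the two counters that matter here. The following is a minimal sketch, not part of the original report: `check_summary` is a hypothetical helper name, and it only assumes the summary-line wording shown in the output above.

```shell
# Summarize `qemu-img check` text output read from stdin.
# Prints "errors=<n> leaked=<n>"; a counter is 0 when its summary line
# is absent or reads "No errors were found on the image."
check_summary() {
    awk '
        /errors were found on the image/          { errors = $1 }
        /leaked clusters were found on the image/ { leaked = $1 }
        END { printf "errors=%d leaked=%d\n", errors, leaked }
    '
}
```

Usage would look like `qemu-img check rhel78-64-virtio-scsi.qcow2 | check_summary`. The exit status of `qemu-img check` can be used as well: its man page documents 2 for a corrupted image and 3 for leaked clusters without corruption.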



Result:
As above, the image is corrupted.

Comment 2 Kevin Wolf 2019-11-14 08:28:38 UTC
This will be fixed with the rebase on upstream 4.2.

Comment 5 Tingting Mao 2019-11-19 03:37:16 UTC
Retested with the latest qemu version, as shown below. The bug is already fixed there.


Tested with:
qemu-kvm-4.2.0-0.module+el8.2.0+4743+23ad88a2
kernel-4.18.0-148.el8.x86_64


Steps:
1. Create the image file
# qemu-img create -f qcow2 -o preallocation=falloc base.qcow2 20G

2. Install guest with it
# /usr/libexec/qemu-kvm -name 'guest' -machine q35 -m 4096 -monitor stdio -vnc :0 -drive file=base.qcow2,id=tt -cdrom RHEL7.7-Server-x86_64.iso

3. Shut down after installation and check the image file
# qemu-img check base.qcow2 
No errors were found on the image.
327680/327680 = 100.00% allocated, 0.00% fragmented, 0.00% compressed clusters

4. Boot guest again from the image file
# /usr/libexec/qemu-kvm -name 'guest' -machine q35 -m 4096 -monitor stdio -vnc :0 -drive file=base.qcow2,id=tt

5. Run 'savevm' while 'dd' is running in the guest
(guest) $ while true ; do dd if=/dev/zero of=ftest bs=1024k count=4000 ; done
(qemu) savevm foo
(qemu) quit

6. Run the test loop
# while true ; do (echo loadvm foo ; echo c ; sleep 10 ; echo stop ; echo savevm foo ; echo quit ) | /usr/libexec/qemu-kvm -name 'guest' -machine q35 -m 4096 -monitor stdio -vnc :0 -drive file=base.qcow2,id=tt; done
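The loop in step 6 runs until interrupted and keeps going after a failed cycle. A bounded variant that stops at the first failure could be sketched as below. This is an illustration, not the command used in the report: the function names are hypothetical, and the failure signatures are only the three messages seen in the description of this bug.

```shell
# Print the monitor commands for one load/run/save cycle.
# $1 = snapshot name, $2 = seconds to let the guest run.
monitor_cycle() {
    echo "loadvm $1"
    echo c
    sleep "$2"
    echo stop
    echo "savevm $1"
    echo quit
}

# Return 0 if the monitor output on stdin contains one of the failure
# messages quoted in this report, non-zero otherwise.
cycle_failed() {
    grep -q -e 'Error: ' -e 'Marking image as corrupt' -e 'Image is corrupt'
}

# Run up to $1 cycles against image $2 and stop at the first failing cycle.
run_cycles() {
    max=$1
    image=$2
    i=1
    while [ "$i" -le "$max" ]; do
        out=$(monitor_cycle foo 10 | /usr/libexec/qemu-kvm -name guest \
              -machine q35 -m 4096 -monitor stdio -vnc :0 \
              -drive "file=$image,id=tt" 2>&1)
        if printf '%s\n' "$out" | cycle_failed; then
            echo "cycle $i failed"
            printf '%s\n' "$out"
            return 1
        fi
        i=$((i + 1))
    done
    echo "$max cycles passed"
}
```

For example, `run_cycles 15 base.qcow2 && qemu-img check base.qcow2` would run 15 cycles and check the image only if none of them reported an error.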


Result:
For step 6, the loop still works after 15 iterations. Checking the image after quitting the loop shows no errors.
QEMU 4.1.91 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.91 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.91 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.91 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.91 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.91 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.91 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.91 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.91 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.91 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.91 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
.......


# qemu-img check base.qcow2 
No errors were found on the image.
334340/327680 = 102.03% allocated, 2.31% fragmented, 0.00% compressed clusters
Image end offset: 21985427456
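The allocation figure above 100% can be cross-checked with a little arithmetic. The sketch below assumes the default qcow2 cluster size of 64 KiB, since the image in step 1 was created without a cluster_size option. Reading the extra clusters as saved VM state from 'savevm' is an assumption, not something this report states.

```shell
# Figures copied from the check output above.
# cluster_size is an assumption (qcow2 default of 64 KiB).
cluster_size=65536
virtual_clusters=327680
allocated_clusters=334340
end_offset=21985427456

# 327680 clusters of 64 KiB cover the 20G virtual disk exactly.
echo "virtual size:   $((virtual_clusters * cluster_size / 1024 / 1024 / 1024)) GiB"
# Clusters counted beyond the virtual disk size (the part above 100%).
echo "extra clusters: $((allocated_clusters - virtual_clusters))"
# Size of the image file expressed in clusters.
echo "file clusters:  $((end_offset / cluster_size))"
```

Under that assumption the virtual disk is 20 GiB, 6660 clusters (about 416 MiB) are counted beyond it, and the file ends at cluster 335471.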

Comment 7 Tingting Mao 2019-11-29 02:51:52 UTC
Retested with the latest qemu version 'qemu-kvm-4.2.0-1.module+el8.2.0+4793+b09dd2fb'; the issue no longer occurs.


Tested with:
qemu-kvm-4.2.0-1.module+el8.2.0+4793+b09dd2fb
kernel-4.18.0-153.el8.x86_64


Steps:
As Comment 5.


Result:
Still works well after about 18 loop iterations.


# while true ; do (echo loadvm foo ; echo c ; sleep 10 ; echo stop ; echo savevm foo ; echo quit ) | /usr/libexec/qemu-kvm -name 'guest' -machine q35 -m 4096 -monitor stdio -vnc :0 -drive file=rhel77-64-virtio-scsi.qcow2,id=tt; done
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo

Comment 8 Ademar Reis 2020-02-05 23:08:18 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 11 errata-xmlrpc 2020-05-05 09:50:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017