Bug 1764721
| Summary: | qcow2 image corruption due to incorrect locking in preallocation detection | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Kevin Wolf <kwolf> | |
| Component: | qemu-kvm | Assignee: | Virtualization Maintenance <virt-maint> | |
| qemu-kvm sub component: | General | QA Contact: | Virtualization Bugs <virt-bugs> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | unspecified | |||
| Priority: | unspecified | CC: | coli, ddepaula, hreitz, juzhang, kchamart, knoel, leiyang, lhh, virt-maint, ymankad | |
| Version: | 8.1 | Flags: | knoel: mirror+ |
| Target Milestone: | rc | |||
| Target Release: | 8.1 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | qemu-kvm-4.1.0-14.module+el8.1.0+4548+ed1300f4 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| Clones: | 1772321 | Environment: | ||
| Last Closed: | 2019-11-06 12:58:42 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1745393, 1772321 | |||
Failed to reproduce this bug with the command lines below:
Tested with:
qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61
Steps:
1. Create a qcow2 base file
# qemu-img create -f qcow2 base.qcow2 20G
2. Install a guest on the image
# /usr/libexec/qemu-kvm \
-name 'guest-rhel8.0' \
-machine q35 \
-nodefaults \
-vga qxl \
-device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-3,addr=0x0 \
-drive id=drive_image1,if=none,snapshot=off,format=qcow2,file=base.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1,serial=SYSTEM_DISK0 \
-drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=unsafe,media=cdrom,file=RHEL7.7-Server-x86_64.iso \
-device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
-vnc :0 \
-monitor stdio \
-m 4096 \
-smp 8 \
-device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
-device virtio-net-pci,mac=9a:33:6b:72:e4:b7,id=id84cDQ3,netdev=idjiXt3m,bus=pcie.0-root-port-4,addr=0x0 \
-netdev tap,id=idjiXt3m \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/timao/monitor-qmpmonitor1-20180220-094308-h9I6hRsI,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-device pcie-root-port,id=pcie.0-root-port-8,slot=8,chassis=8,addr=0x8,bus=pcie.0
3. After installation, shut down the guest and check the image.
# qemu-img check base.qcow2
No errors were found on the image.
25508/327680 = 7.78% allocated, 16.32% fragmented, 0.00% compressed clusters
Image end offset: 1672675328
# qemu-img info base.qcow2
image: base.qcow2
file format: qcow2
virtual size: 20 GiB (21474836480 bytes)
disk size: 1.56 GiB
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
4. Boot again from the base image (the command line is the same as in step 2) and run ‘dd’ in the guest.
(guest)while true ; do dd if=/dev/zero of=test bs=1024k count=4000 ; done
5. Take an internal snapshot while ‘dd’ is running in the guest
(qemu) savevm foo
(qemu) quit
6. Run the loop
# while true ; do (echo loadvm foo ; echo c ; sleep 10 ; echo stop ; echo savevm foo ; echo quit ) | /usr/libexec/qemu-kvm -name 'guest-rhel8.0' -machine q35 -nodefaults -vga qxl -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-3,addr=0x0 -drive id=drive_image1,if=none,snapshot=off,format=qcow2,file=base.qcow2 -device scsi-hd,id=image1,drive=drive_image1,serial=SYSTEM_DISK0 -vnc :0 -monitor stdio -m 4096 -smp 8; done
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
……
Result: Did not hit the error even after 20 loop iterations.
Does preallocating the test image help? The bug seems much easier to reproduce if I create the test image with:
$ qemu-img create -f qcow2 -o preallocation=falloc qtest.qcow2 20G
Max
Reproduced successfully with preallocation=falloc at image creation. Thanks a lot, Max.
Tested with:
qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61
1. Create the image file
# qemu-img create -f qcow2 -o preallocation=falloc base.qcow2 20G
2. Install a guest on it
# /usr/libexec/qemu-kvm -name 'guest' -machine q35 -m 4096 -monitor stdio -vnc :0 -drive file=base.qcow2,id=tt -cdrom RHEL7.7-Server-x86_64.iso
3. Shut down the guest after installation and check the image file
# qemu-img check base.qcow2
No errors were found on the image.
327680/327680 = 100.00% allocated, 0.00% fragmented, 0.00% compressed clusters
Image end offset: 21478375424
# qemu-img info base.qcow2
image: base.qcow2
file format: qcow2
virtual size: 20 GiB (21474836480 bytes)
disk size: 24 GiB
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
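The cluster counts in the check and info output above are consistent with each other. As a back-of-the-envelope sketch, assuming the standard qcow2 on-disk layout (8-byte L2 table entries, 16-bit refcounts as reported by "refcount bits: 16", and one cluster each for the header, the L1 table and the refcount table), the bash snippet below derives the 327680 data clusters and the reported image end offset from the virtual size and cluster size. It is not how qemu-img computes these values, and the variable names are illustrative only.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope check of the fully preallocated 20 GiB image above.
# Assumption: standard qcow2 layout with 8-byte L2 entries, 16-bit refcounts,
# and one cluster each for the header, L1 table and refcount table.
virtual_size=21474836480   # "virtual size" from qemu-img info
cluster_size=65536         # "cluster_size" from qemu-img info
refcount_bits=16           # "refcount bits" from qemu-img info

# With preallocation=falloc every guest cluster is allocated.
data_clusters=$(( virtual_size / cluster_size ))

# One L2 table holds cluster_size / 8 entries, each mapping one data cluster.
l2_entries=$(( cluster_size / 8 ))
l2_tables=$(( (data_clusters + l2_entries - 1) / l2_entries ))

# Header + L1 table + refcount table: one cluster each for an image this size.
fixed_clusters=3

# Refcount blocks must cover every cluster, including the refcount blocks
# themselves, so iterate until the count stops growing.
refcounts_per_block=$(( cluster_size * 8 / refcount_bits ))
refblocks=0
while :; do
    total=$(( data_clusters + l2_tables + fixed_clusters + refblocks ))
    needed=$(( (total + refcounts_per_block - 1) / refcounts_per_block ))
    (( needed <= refblocks )) && break
    refblocks=$needed
done

metadata_clusters=$(( l2_tables + fixed_clusters + refblocks ))
end_offset=$(( (data_clusters + metadata_clusters) * cluster_size ))

echo "data clusters:     $data_clusters"
echo "metadata clusters: $metadata_clusters"
echo "image end offset:  $end_offset"
```

Under these assumptions the script arrives at 40 L2 tables, 11 refcount blocks and 54 metadata clusters in total, which reproduces the "Image end offset: 21478375424" line exactly.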
4. Boot the guest again from the image file
# /usr/libexec/qemu-kvm -name 'guest' -machine q35 -m 4096 -monitor stdio -vnc :0 -drive file=base.qcow2,id=tt
5. Run ‘savevm’ while ‘dd’ is running in the guest
(guest) $ while true ; do dd if=/dev/zero of=ftest bs=1024k count=4000 ; done
(qemu) savevm foo
(qemu) quit
6. Run the test loop
# while true ; do (echo loadvm foo ; echo c ; sleep 10 ; echo stop ; echo savevm foo ; echo quit ) | /usr/libexec/qemu-kvm -name 'guest' -machine q35 -m 4096 -monitor stdio -vnc :0 -drive file=base.qcow2,id=tt; done
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
Error: Error while deleting snapshot on device 'tt': Failed to free the cluster and L1 table: Invalid argument ----------------------------------- Hit error here!
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
Error: Device 'tt' does not have the requested snapshot 'foo'
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) q
q
^Cqemu-kvm: terminating on signal 2
Result:
As shown above, the error was hit. Checking the image after QEMU quit shows that the image is corrupted:
# qemu-img check base.qcow2
…...
…...
Leaked cluster 361093 refcount=3 reference=2
Leaked cluster 361094 refcount=3 reference=2
Leaked cluster 361095 refcount=3 reference=2
Leaked cluster 361096 refcount=3 reference=2
Leaked cluster 367004 refcount=1 reference=0
Leaked cluster 367005 refcount=1 reference=0
Leaked cluster 367006 refcount=1 reference=0
Leaked cluster 367007 refcount=1 reference=0
Leaked cluster 367008 refcount=1 reference=0
Leaked cluster 367009 refcount=1 reference=0
Leaked cluster 367010 refcount=1 reference=0
Leaked cluster 367011 refcount=1 reference=0
Leaked cluster 367018 refcount=1 reference=0
95 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.
229436 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
326148/327680 = 99.53% allocated, 0.27% fragmented, 0.00% compressed clusters
Image end offset: 24072421376
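When the loop is run many times, the check summary can be evaluated mechanically instead of by eye. In a script, the exit status of `qemu-img check` itself is the more robust signal (documented as 0 for a consistent image, 2 for corruption and 3 for leaked clusters only). The minimal bash sketch below instead parses saved output such as the log above, assuming only the summary line formats shown in this report; the function name `summarize_check` is hypothetical and not part of qemu-img.

```bash
#!/usr/bin/env bash
# Summarize saved `qemu-img check` output read from stdin.
# Prints "errors=<n> leaks=<n>" and returns 0 only if both counts are zero.
# Hypothetical helper; it relies only on the summary lines quoted in this report.
summarize_check() {
    local errors=0 leaks=0 line
    while IFS= read -r line; do
        case $line in
            "No errors were found on the image.")
                errors=0 ;;
            *" errors were found on the image.")
                errors=${line%% *} ;;          # leading number, e.g. "95"
            *" leaked clusters were found on the image.")
                leaks=${line%% *} ;;           # leading number, e.g. "229436"
        esac
    done
    echo "errors=$errors leaks=$leaks"
    (( errors == 0 && leaks == 0 ))
}
```

For example, `qemu-img check base.qcow2 | summarize_check` prints "errors=95 leaks=229436" for the corrupted image above and returns a non-zero status.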
Tried to test this bug as below:
Tested with:
qemu-kvm-4.1.0-14.module+el8.1.0+4548+ed1300f4
kernel-4.18.0-141.el8
Steps are the same as the ones in Comment 15.
Result: Did not hit the issue after 20 loop iterations.
# while true ; do (echo loadvm foo ; echo c ; sleep 10 ; echo stop ; echo savevm foo ; echo quit ) | /usr/libexec/qemu-kvm -name 'guest' -machine q35 -m 4096 -monitor stdio -vnc :0 -drive file=base.qcow2,id=tt; done
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) stop
(qemu) savevm foo
(qemu) quit
(In reply to Tingting Mao from comment #17)
And after quitting the loop, there is no corruption in the image.
...
...
EMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
(qemu) c
(qemu) ^Vstop
(qemu) savevm foo
^C^C^Cq
^V^C(qemu) qemu-kvm: terminating on signal 2
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) loadvm foo
^C(qemu) qemu-kvm: terminating on signal 2
# qemu-img check base.qcow2
No errors were found on the image.
334488/327680 = 102.08% allocated, 3.26% fragmented, 0.00% compressed clusters
Image end offset: 21991522304
Based on Comment 17, Comment 18 and Comment 19, set this bug as verified. Thanks all.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:3732
Upstream commit 69f47505e ('block: avoid recursive block_status call if possible'), which was first contained in QEMU 4.1, introduces an image corruption bug in the qcow2 driver that can be triggered in the common I/O path. This was first reported upstream in Launchpad:
https://bugs.launchpad.net/qemu/+bug/1846427
The easiest way to reproduce this consistently is by using internal snapshots. The bug isn't restricted to those cases, though.
The following steps were used in the upstream report to reproduce the bug. Note that the image size seems to be important, so be sure to use the same size.
1. Create a new 20 GB test image:
$ qemu-img create -f qcow2 qtest.qcow2 20G
2. Install the guest OS. The reporter used a minimal Debian 10 installation. I suppose RHEL would work as well, but I didn't test that. A simple command line is used:
$ qemu-kvm -machine pc-q35-3.1,accel=kvm -m 4096 -chardev stdio,id=charmonitor -mon chardev=charmonitor -drive file=qtest.qcow2,id=d -cdrom Downloads/mini.iso
3. Inside the guest, start a dd loop to generate some I/O:
(guest) $ while true ; do dd if=/dev/zero of=t bs=1024k count=4000 ; done
4. Take an internal snapshot named "foo" and exit QEMU:
(qemu) savevm foo
(qemu) quit
5. Run a loop that just loads the snapshot and takes a new snapshot every few seconds:
$ while true ; do (echo loadvm foo ; echo c ; sleep 10 ; echo stop ; echo savevm foo ; echo quit ) | qemu-kvm -machine pc-q35-3.1,accel=kvm -m 4096 -chardev stdio,id=charmonitor -mon chardev=charmonitor -drive file=qtest.qcow2,id=d -display none -S ; done
After a few iterations, you should see signs of image corruption. For example, you may get an error like "Error: Error while deleting snapshot on device 'd': Failed to free the cluster and L1 table: Invalid argument". At this point, stop the loop and run 'qemu-img check', which will report the image corruption.
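The loop in step 5 runs until it is interrupted by hand. A bounded variant that stops at the first monitor error and then runs `qemu-img check` could look like the bash sketch below. It reuses the step 5 command line unchanged; `QEMU`, `IMG` and `MAX_ITER` are illustrative variable names, the default of 50 iterations is an arbitrary choice, and detecting failure by matching "Error:" in the captured monitor output is an assumption based on the error message quoted above.

```bash
#!/usr/bin/env bash
# Bounded version of the loadvm/savevm loop from step 5 (illustrative sketch).
# QEMU, IMG and MAX_ITER are assumed names; override them from the environment.
QEMU=${QEMU:-qemu-kvm}
IMG=${IMG:-qtest.qcow2}
MAX_ITER=${MAX_ITER:-50}

# Monitor commands for one iteration, exactly as in the original loop.
monitor_script() {
    echo loadvm foo; echo c; sleep 10; echo stop; echo savevm foo; echo quit
}

# Succeeds if the captured monitor output contains an "Error:" message.
has_monitor_error() {
    grep -q 'Error:' <<<"$1"
}

# Runs up to MAX_ITER iterations, stops at the first monitor error,
# then checks the image as the description recommends.
run_loop() {
    local i out
    for (( i = 1; i <= MAX_ITER; i++ )); do
        out=$(monitor_script | "$QEMU" -machine pc-q35-3.1,accel=kvm -m 4096 \
              -chardev stdio,id=charmonitor -mon chardev=charmonitor \
              -drive file="$IMG",id=d -display none -S 2>&1)
        printf '%s\n' "$out"
        if has_monitor_error "$out"; then
            echo "monitor error in iteration $i" >&2
            break
        fi
    done
    qemu-img check "$IMG"
}

# Uncomment to run against a real image prepared with steps 1-4:
# run_loop
```

Stopping at the first error keeps the image in the state right after the failed `savevm`, which is the point at which the description suggests running `qemu-img check`.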
Patches are posted upstream (though a v2 series will be needed because the added assertion is too strict): https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01414.html