Upstream commit 69f47505e ('block: avoid recursive block_status call if possible'), which was first contained in QEMU 4.1, introduces an image corruption bug in the qcow2 driver that can be triggered in the common I/O path.

This was first reported upstream in Launchpad:
https://bugs.launchpad.net/qemu/+bug/1846427

The easiest way to reproduce this consistently is by using internal snapshots. The bug isn't restricted to those cases, though.

The following steps were used in the upstream report to reproduce the bug. Note that the image size seems to be important, so be sure to use the same size.

1. Create a new 20 GB test image:

    $ qemu-img create -f qcow2 qtest.qcow2 20G

2. Install the guest OS. The reporter used a minimal Debian 10 installation. I suppose RHEL would work as well, but I didn't test that. A simple command line is used:

    $ qemu-kvm -machine pc-q35-3.1,accel=kvm -m 4096 -chardev stdio,id=charmonitor -mon chardev=charmonitor -drive file=qtest.qcow2,id=d -cdrom Downloads/mini.iso

3. Inside the guest, start a dd loop to generate some I/O:

    (guest) $ while true ; do dd if=/dev/zero of=t bs=1024k count=4000 ; done

4. Take an internal snapshot named "foo" and exit QEMU:

    (qemu) savevm foo
    (qemu) quit

5. Run a loop that just loads the snapshot and takes a new snapshot every few seconds:

    $ while true ; do (echo loadvm foo ; echo c ; sleep 10 ; echo stop ; echo savevm foo ; echo quit ) | qemu-kvm -machine pc-q35-3.1,accel=kvm -m 4096 -chardev stdio,id=charmonitor -mon chardev=charmonitor -drive file=qtest.qcow2,id=d -display none -S ; done

After a few iterations, you should see signs of image corruption. For example, you may get an error like "Error: Error while deleting snapshot on device 'd': Failed to free the cluster and L1 table: Invalid argument". At this point, stop the loop and run 'qemu-img check', which will report image corruption.
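For anyone who wants to leave step 5 running unattended, here is a minimal sketch of the same loop that stops as soon as the image no longer checks clean. The variable names (QEMU, QEMU_IMG, IMG, SNAP, RUN_SECS, MAX_ITER) and the per-iteration 'qemu-img check' call are my own additions and not part of the upstream report; the QEMU command line is the one from step 5.

```shell
#!/bin/sh
# Sketch only: repeats the loadvm/savevm cycle from step 5 and stops once
# 'qemu-img check' exits non-zero. All variable names below are hypothetical
# knobs; the defaults mirror the commands in the report.
QEMU=${QEMU:-qemu-kvm}
QEMU_IMG=${QEMU_IMG:-qemu-img}
IMG=${IMG:-qtest.qcow2}
SNAP=${SNAP:-foo}
RUN_SECS=${RUN_SECS:-10}
MAX_ITER=${MAX_ITER:-50}

# Emit the monitor commands for one iteration on stdout.
monitor_script() {
    echo "loadvm $SNAP"
    echo c
    sleep "$RUN_SECS"        # let the guest's dd loop generate I/O
    echo stop
    echo "savevm $SNAP"
    echo quit
}

# Feed one iteration of monitor commands to QEMU.
run_iteration() {
    monitor_script | "$QEMU" -machine pc-q35-3.1,accel=kvm -m 4096 \
        -chardev stdio,id=charmonitor -mon chardev=charmonitor \
        -drive file="$IMG",id=d -display none -S
}

# Loop until the check fails or MAX_ITER is reached.
reproduce() {
    i=0
    while [ "$i" -lt "$MAX_ITER" ]; do
        i=$((i + 1))
        run_iteration
        # Any non-zero status (including "leaks only") ends the loop.
        if ! "$QEMU_IMG" check "$IMG" >/dev/null 2>&1; then
            echo "check failed after $i iteration(s)"
            return 1
        fi
    done
    echo "no problem detected after $i iteration(s)"
}

# Only start the loop when explicitly asked to: sh repro.sh run
if [ "${1:-}" = "run" ]; then
    reproduce
fi
```

Saved as, say, repro.sh (hypothetical name), it would be started with 'sh repro.sh run' after steps 1 to 4 have been done by hand.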
Patches are posted upstream (though a v2 series will be needed because the added assertion is too strict): https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01414.html
Failed to reproduce this bug with the command line below.

Tested with:
qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61

Steps:

1. Create the qcow2 base file

    # qemu-img create -f qcow2 base.qcow2 20G

2. Install the guest on that file

    # /usr/libexec/qemu-kvm \
        -name 'guest-rhel8.0' \
        -machine q35 \
        -nodefaults \
        -vga qxl \
        -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
        -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-3,addr=0x0 \
        -drive id=drive_image1,if=none,snapshot=off,format=qcow2,file=base.qcow2 \
        -device scsi-hd,id=image1,drive=drive_image1,serial=SYSTEM_DISK0 \
        -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=unsafe,media=cdrom,file=RHEL7.7-Server-x86_64.iso \
        -device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
        -vnc :0 \
        -monitor stdio \
        -m 4096 \
        -smp 8 \
        -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
        -device virtio-net-pci,mac=9a:33:6b:72:e4:b7,id=id84cDQ3,netdev=idjiXt3m,bus=pcie.0-root-port-4,addr=0x0 \
        -netdev tap,id=idjiXt3m \
        -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/timao/monitor-qmpmonitor1-20180220-094308-h9I6hRsI,server,nowait \
        -mon chardev=qmp_id_qmpmonitor1,mode=control \
        -device pcie-root-port,id=pcie.0-root-port-8,slot=8,chassis=8,addr=0x8,bus=pcie.0

3. After installation, shut down the guest and check the image.

    # qemu-img check base.qcow2
    No errors were found on the image.
    25508/327680 = 7.78% allocated, 16.32% fragmented, 0.00% compressed clusters
    Image end offset: 1672675328

    # qemu-img info base.qcow2
    image: base.qcow2
    file format: qcow2
    virtual size: 20 GiB (21474836480 bytes)
    disk size: 1.56 GiB
    cluster_size: 65536
    Format specific information:
        compat: 1.1
        lazy refcounts: false
        refcount bits: 16
        corrupt: false

4. Boot again from the base image (the command line is the same as in step 2), and run 'dd' in the guest.

    (guest) $ while true ; do dd if=/dev/zero of=test bs=1024k count=4000 ; done

5. Take an internal snapshot while 'dd' is running in the guest

    (qemu) savevm foo
    (qemu) quit

6. Run the loop

    # while true ; do (echo loadvm foo ; echo c ; sleep 10 ; echo stop ; echo savevm foo ; echo quit ) | /usr/libexec/qemu-kvm -name 'guest-rhel8.0' -machine q35 -nodefaults -vga qxl -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-3,addr=0x0 -drive id=drive_image1,if=none,snapshot=off,format=qcow2,file=base.qcow2 -device scsi-hd,id=image1,drive=drive_image1,serial=SYSTEM_DISK0 -vnc :0 -monitor stdio -m 4096 -smp 8; done
    QEMU 4.1.0 monitor - type 'help' for more information
    (qemu) loadvm foo
    (qemu) c
    (qemu) stop
    (qemu) savevm foo
    (qemu) quit
    QEMU 4.1.0 monitor - type 'help' for more information
    (qemu) loadvm foo
    (qemu) c
    (qemu) stop
    (qemu) savevm foo
    (qemu) quit
    ……

Result:
Did not hit the error even after 20 loop iterations.
Does preallocating the test image help? It seems to reproduce much more reliably if I create the test image with:

    $ qemu-img create -f qcow2 -o preallocation=falloc qtest.qcow2 20G

Max
Reproduced successfully with preallocation=falloc at image creation. Thanks a lot, Max.

Tested with:
qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61

1. Create the image file

    # qemu-img create -f qcow2 -o preallocation=falloc base.qcow2 20G

2. Install the guest on it

    # /usr/libexec/qemu-kvm -name 'guest' -machine q35 -m 4096 -monitor stdio -vnc :0 -drive file=base.qcow2,id=tt -cdrom RHEL7.7-Server-x86_64.iso

3. Shut down after installation, and check the image file

    # qemu-img check base.qcow2
    No errors were found on the image.
    327680/327680 = 100.00% allocated, 0.00% fragmented, 0.00% compressed clusters
    Image end offset: 21478375424

    # qemu-img info base.qcow2
    image: base.qcow2
    file format: qcow2
    virtual size: 20 GiB (21474836480 bytes)
    disk size: 24 GiB
    cluster_size: 65536
    Format specific information:
        compat: 1.1
        lazy refcounts: false
        refcount bits: 16
        corrupt: false

4. Boot the guest again from the image file

    # /usr/libexec/qemu-kvm -name 'guest' -machine q35 -m 4096 -monitor stdio -vnc :0 -drive file=base.qcow2,id=tt

5. Run 'savevm' while 'dd' is running in the guest

    (guest) $ while true ; do dd if=/dev/zero of=ftest bs=1024k count=4000 ; done

    (qemu) savevm foo
    (qemu) quit

6. Run the test loop

    # while true ; do (echo loadvm foo ; echo c ; sleep 10 ; echo stop ; echo savevm foo ; echo quit ) | /usr/libexec/qemu-kvm -name 'guest' -machine q35 -m 4096 -monitor stdio -vnc :0 -drive file=base.qcow2,id=tt; done
    QEMU 4.1.0 monitor - type 'help' for more information
    (qemu) loadvm foo
    (qemu) c
    (qemu) stop
    (qemu) savevm foo
    (qemu) quit
    QEMU 4.1.0 monitor - type 'help' for more information
    (qemu) loadvm foo
    (qemu) c
    (qemu) stop
    (qemu) savevm foo
    (qemu) quit
    QEMU 4.1.0 monitor - type 'help' for more information
    (qemu) loadvm foo
    (qemu) c
    (qemu) stop
    (qemu) savevm foo
    Error: Error while deleting snapshot on device 'tt': Failed to free the cluster and L1 table: Invalid argument    <----- Hit error here!
    (qemu) quit
    QEMU 4.1.0 monitor - type 'help' for more information
    (qemu) loadvm foo
    Error: Device 'tt' does not have the requested snapshot 'foo'
    (qemu) c
    (qemu) stop
    (qemu) savevm foo
    (qemu) quit
    QEMU 4.1.0 monitor - type 'help' for more information
    (qemu) loadvm foo
    (qemu) c
    (qemu) stop
    (qemu) savevm foo
    (qemu) quit
    ……
    QEMU 4.1.0 monitor - type 'help' for more information
    (qemu) loadvm foo
    (qemu) c
    (qemu) q
    q
    ^Cqemu-kvm: terminating on signal 2

Result:
As above, hit the error. Checking the image after QEMU quit shows that it is corrupted.

    # qemu-img check base.qcow2
    ……
    Leaked cluster 361093 refcount=3 reference=2
    Leaked cluster 361094 refcount=3 reference=2
    Leaked cluster 361095 refcount=3 reference=2
    Leaked cluster 361096 refcount=3 reference=2
    Leaked cluster 367004 refcount=1 reference=0
    Leaked cluster 367005 refcount=1 reference=0
    Leaked cluster 367006 refcount=1 reference=0
    Leaked cluster 367007 refcount=1 reference=0
    Leaked cluster 367008 refcount=1 reference=0
    Leaked cluster 367009 refcount=1 reference=0
    Leaked cluster 367010 refcount=1 reference=0
    Leaked cluster 367011 refcount=1 reference=0
    Leaked cluster 367018 refcount=1 reference=0

    95 errors were found on the image.
    Data may be corrupted, or further writes to the image may corrupt it.

    229436 leaked clusters were found on the image.
    This means waste of disk space, but no harm to data.
    326148/327680 = 99.53% allocated, 0.27% fragmented, 0.00% compressed clusters
    Image end offset: 24072421376
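When scripting the verification, it can help to separate real corruption from mere leaks. The sketch below is only an illustration: check_verdict maps the exit statuses documented in the qemu-img man page to a short verdict, and summarize_check_output pulls the two summary counts out of saved 'qemu-img check' text. Both function names are hypothetical, and the text matching assumes the summary lines are worded as in the output above.

```shell
#!/bin/sh
# Map the documented 'qemu-img check' exit status to a short verdict.
# Usage: qemu-img check base.qcow2; check_verdict $?
check_verdict() {
    case "$1" in
        0)  echo "consistent" ;;
        1)  echo "check not completed (internal error)" ;;
        2)  echo "corrupted" ;;
        3)  echo "leaked clusters only" ;;
        63) echo "check not supported by this format" ;;
        *)  echo "unknown status $1" ;;
    esac
}

# Read 'qemu-img check' output on stdin and print the summary counts.
# Assumes the wording "<N> errors were found on the image." and
# "<N> leaked clusters were found on the image." as seen in this report;
# "No errors were found" is reported as errors=0.
summarize_check_output() {
    awk '
        /errors were found on the image/          { errors = $1 }
        /leaked clusters were found on the image/ { leaks = $1 }
        END { printf "errors=%d leaks=%d\n", errors, leaks }
    '
}
```

For example, 'qemu-img check base.qcow2 | summarize_check_output' would condense a long check log into a single line that is easy to compare between runs.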
Tried to verify this bug as below.

Tested with:
qemu-kvm-4.1.0-14.module+el8.1.0+4548+ed1300f4
kernel-4.18.0-141.el8

Steps are the same as the ones in Comment 15.

Result: Did not hit the issue after 20 loop iterations.

    # while true ; do (echo loadvm foo ; echo c ; sleep 10 ; echo stop ; echo savevm foo ; echo quit ) | /usr/libexec/qemu-kvm -name 'guest' -machine q35 -m 4096 -monitor stdio -vnc :0 -drive file=base.qcow2,id=tt; done
    QEMU 4.1.0 monitor - type 'help' for more information
    (qemu) loadvm foo
    (qemu) c
    (qemu) stop
    (qemu) savevm foo
    (qemu) quit
    QEMU 4.1.0 monitor - type 'help' for more information
    (qemu) loadvm foo
    (qemu) c
    (qemu) stop
    (qemu) savevm foo
    (qemu) quit
    ……
(In reply to Tingting Mao from comment #17)
> Tried to test this bug as below:
>
> Tested with:
> qemu-kvm-4.1.0-14.module+el8.1.0+4548+ed1300f4
> kernel-4.18.0-141.el8
>
> Steps are the same as the ones in Comment 15.
>
> Result- Did not hit the issue after 20 times loop
> ……

And after quitting the loop, there is no corruption in the image.

    ……
    QEMU 4.1.0 monitor - type 'help' for more information
    (qemu) loadvm foo
    (qemu) c
    (qemu) ^Vstop
    (qemu) savevm foo
    ^C^C^Cq
    ^V^C(qemu) qemu-kvm: terminating on signal 2
    QEMU 4.1.0 monitor - type 'help' for more information
    (qemu) loadvm foo
    ^C(qemu) qemu-kvm: terminating on signal 2

    # qemu-img check base.qcow2
    No errors were found on the image.
    334488/327680 = 102.08% allocated, 3.26% fragmented, 0.00% compressed clusters
    Image end offset: 21991522304
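As a sanity check on the figures in the 'qemu-img check' summaries in this bug, here is a small sketch of the arithmetic behind them. The denominator 327680 is the 20 GiB virtual size divided by the 64 KiB cluster size, and the percentage is simply the first number over the second. The helper names cluster_count and alloc_percent are hypothetical.

```shell
#!/bin/sh
# Number of guest clusters: virtual size in bytes / cluster size in bytes.
cluster_count() {
    echo $(( $1 / $2 ))
}

# Allocation percentage as printed in the check summary: allocated / total.
alloc_percent() {
    awk -v a="$1" -v t="$2" 'BEGIN { printf "%.2f%%\n", 100 * a / t }'
}

cluster_count 21474836480 65536   # virtual size and cluster_size from 'qemu-img info'
alloc_percent 334488 327680       # counts from the check output after the fix
alloc_percent 326148 327680       # counts from the corrupted image in the reproducer comment
```

This prints 327680, 102.08% and 99.53%, matching the values reported in the comments.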
Based on Comment 17, Comment 18 and Comment 19, set this bug as verified. Thanks all.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3732