Bug 2072379

Summary: Fails to rebuild the reference count tables of a qcow2 image on host block devices (e.g. LVs)
Product: Red Hat Enterprise Linux 9
Component: qemu-kvm
qemu-kvm sub component: qcow2
Version: 9.0
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: urgent
Keywords: Reopened, Triaged, ZStream
Target Milestone: rc
Fixed In Version: qemu-kvm-7.0.0-6.el9
Reporter: Tingting Mao <timao>
Assignee: Hanna Czenczek <hreitz>
QA Contact: Tingting Mao <timao>
CC: areis, chayang, coli, gveitmic, hreitz, jinzhao, juzhang, kanderso, knoel, kwolf, mtessun, ngu, nsoffer, qzhang, rbalakri, timao, vcojot, virt-maint, xuwei, xzhou, zixchen
Type: Bug
Clone Of: 1519071
Bug Depends On: 1519071
Bug Blocks: 2072242
Last Closed: 2022-11-15 09:54:42 UTC

Description Tingting Mao 2022-04-06 07:54:09 UTC
+++ This bug was initially created as a clone of Bug #1519071 +++

Description of problem:
Create a qcow2 image with lazy_refcounts=on on an LV and install a guest with cache=writethrough. After the installation finishes, write a file inside the guest and kill the qemu process. Checking the image afterwards with "qemu-img check -r all" reports many errors, and the repair fails with "No space left on device".

Version-Release number of selected component (if applicable):
qemu-kvm-6.2.0-12.el9
kernel-5.14.0-75.el9.x86_64


How reproducible: 
100%


Steps to Reproduce:
1. Prepare an LV as follows
# qemu-img create -f raw loop.img 50G
# losetup /dev/loop1 /home/timao/test/loop.img
# pvcreate /dev/loop1
# vgcreate vgroup /dev/loop1
# lvcreate -L 30G -n lv vgroup

2. Convert a qcow2 image with a working guest installation to the LV above
# qemu-img check -r all RHEL-8.6-x86_64-latest.qcow2 
No errors were found on the image.
31054/163840 = 18.95% allocated, 92.09% fragmented, 90.65% compressed clusters
Image end offset: 966262784

# qemu-img convert -f qcow2 -O qcow2 -o lazy_refcounts=on,compat=1.1 RHEL-8.6-x86_64-latest.qcow2 /dev/vgroup/lv -p

# qemu-img info /dev/vgroup/lv 
image: /dev/vgroup/lv
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 0 B
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: true
    refcount bits: 16
    corrupt: false
    extended l2: false

3. Boot a guest from the LV
# /usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35 \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 15360  \
    -smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
    -cpu 'Haswell-noTSX',+kvm_pv_unhalt \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:1c:0c:0d:e3:4c,id=idjmZXQS,netdev=idEFQ4i1,bus=pcie-root-port-3,addr=0x0  \
    -netdev tap,id=idEFQ4i1,vhost=on  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -monitor stdio \
    -device pcie-root-port,id=pcie-root-port-5,port=0x6,addr=0x1.0x5,bus=pcie.0,chassis=5 \
    -device virtio-scsi-pci,id=virtio_scsi_pci2,bus=pcie-root-port-5,addr=0x0 \
    -blockdev node-name=file_image1,driver=host_device,auto-read-only=on,discard=unmap,aio=threads,filename=/dev/vgroup/lv,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=off,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=off \
    -chardev socket,server=on,path=/var/tmp/monitor-qmpmonitor1-20210721-024113-AsZ7KYro,id=qmp_id_qmpmonitor1,wait=off  \
    -mon chardev=qmp_id_qmpmonitor1,mode=control

4. Write a file with dd inside the guest, compute its md5sum, and sync
(guest)# dd if=/dev/urandom of=file1 conv=fsync bs=1M count=512 ; md5sum file1 ; sync

5. Kill the qemu-kvm process on the host immediately after the dd in the previous step finishes.
# kill -9 `pidof qemu-kvm`

6. Check the LV image
# qemu-img check -r all /dev/vgroup/lv 
ERROR cluster 33067 refcount=0 reference=1
ERROR cluster 33068 refcount=0 reference=1
ERROR cluster 33069 refcount=0 reference=1
ERROR cluster 33070 refcount=0 reference=1
ERROR cluster 33071 refcount=0 reference=1
ERROR cluster 33072 refcount=0 reference=1
......
......
ERROR cluster 40299 refcount=0 reference=1
ERROR cluster 40300 refcount=0 reference=1
Rebuilding refcount structure
ERROR writing refblock: No space left on device
qemu-img: Check failed: No space left on device
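
As a rough sanity check on the output above, the failing cluster indices can be converted to byte offsets and compared with the LV size. This is a minimal sketch, assuming the 65536-byte cluster size reported by qemu-img info in step 2 and an LV of exactly 30 GiB from step 1. The helper names cluster_offset and fits_on_device are hypothetical and not part of qemu-img. The sketch only does arithmetic and says nothing about the root cause of the failure.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of qemu-img.

# cluster_offset INDEX CLUSTER_SIZE
# Print the byte offset at which the given cluster starts in the image file.
cluster_offset() {
    echo $(( $1 * $2 ))
}

# fits_on_device OFFSET DEVICE_SIZE
# Succeed if the byte offset lies below the device size.
fits_on_device() {
    [ "$1" -lt "$2" ]
}

cluster_size=65536                        # from "qemu-img info" in step 2
lv_size=$(( 30 * 1024 * 1024 * 1024 ))    # 30G LV from step 1 (assumed exactly 30 GiB)

# Highest cluster index visible in the (truncated) error listing above.
offset=$(cluster_offset 40300 "$cluster_size")
if fits_on_device "$offset" "$lv_size"; then
    echo "cluster 40300 starts at byte ${offset}, inside the ${lv_size}-byte LV"
fi
```

On a real host, the device size could instead be read with `blockdev --getsize64 /dev/vgroup/lv` rather than assumed.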

Comment 5 Yanan Fu 2022-06-13 09:54:54 UTC
QE bot (pre-verify): Set 'Verified:Tested,SanityOnly' because the gating/tier1 tests passed.

Comment 8 Tingting Mao 2022-06-16 09:50:51 UTC
Verified this bug as follows.


Tested with:
qemu-kvm-7.0.0-6.el9
kernel-5.14.0-96.el9.x86_64


Steps:
1. Prepare an LV as follows
# qemu-img create -f raw loop.img 50G
# losetup /dev/loop1 /home/timao/test/loop.img
# pvcreate /dev/loop1
# vgcreate vgroup /dev/loop1
# lvcreate -L 30G -n lv vgroup

2. Convert a qcow2 image with a working guest installation to the LV above
# qemu-img check -r all RHEL-8.6-x86_64-latest.qcow2 
No errors were found on the image.
30848/163840 = 18.83% allocated, 91.29% fragmented, 89.69% compressed clusters
Image end offset: 985595904

# qemu-img convert -f qcow2 -O qcow2 -o lazy_refcounts=on,compat=1.1 RHEL-8.6-x86_64-latest.qcow2 /dev/vgroup/lv -p

# qemu-img info /dev/vgroup/lv 
image: /dev/vgroup/lv
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 0 B
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: true
    refcount bits: 16
    corrupt: false
    extended l2: false

3. Boot a guest from the LV
# /usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35 \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 15360  \
    -smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
    -cpu 'Haswell-noTSX',+kvm_pv_unhalt \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:1c:0c:0d:e3:4c,id=idjmZXQS,netdev=idEFQ4i1,bus=pcie-root-port-3,addr=0x0  \
    -netdev tap,id=idEFQ4i1,vhost=on  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -monitor stdio \
    -device pcie-root-port,id=pcie-root-port-5,port=0x6,addr=0x1.0x5,bus=pcie.0,chassis=5 \
    -device virtio-scsi-pci,id=virtio_scsi_pci2,bus=pcie-root-port-5,addr=0x0 \
    -blockdev node-name=file_image1,driver=host_device,auto-read-only=on,discard=unmap,aio=threads,filename=/dev/vgroup/lv,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=off,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=off \
    -chardev socket,server=on,path=/var/tmp/monitor-qmpmonitor1-20210721-024113-AsZ7KYro,id=qmp_id_qmpmonitor1,wait=off  \
    -mon chardev=qmp_id_qmpmonitor1,mode=control

4. Write a file with dd inside the guest, compute its md5sum, and sync
(guest)# dd if=/dev/urandom of=file1 conv=fsync bs=1M count=512 ; md5sum file1 ; sync

5. Kill the qemu-kvm process on the host immediately after the dd in the previous step finishes.
# kill -9 `pidof qemu-kvm`

6. Check the LV image
# qemu-img check -r all /dev/vgroup/lv 
......
......
ERROR cluster 48276 refcount=0 reference=1
ERROR cluster 48277 refcount=0 reference=1
Rebuilding refcount structure
Repairing cluster 1 refcount=1 reference=0
Repairing cluster 2 refcount=1 reference=0
Repairing cluster 32774 refcount=1 reference=0
The following inconsistencies were found and repaired:

    0 leaked clusters
    7039 corruptions

Double checking the fixed image now...
No errors were found on the image.
48260/163840 = 29.46% allocated, 2.25% fragmented, 0.00% compressed clusters
Image end offset: 3164143616
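
The repair summary in step 6 can also be read by a script, for example in an automated verification run. This is a minimal sketch, assuming the plain-text summary format shown above ("N leaked clusters", "N corruptions"). The function name repaired_count is hypothetical, and the sample text is copied from this comment rather than produced by running qemu-img.

```shell
#!/bin/sh
# repaired_count KIND
# Read "qemu-img check" text output on stdin and print the number reported
# for KIND ("corruptions" or "leaked clusters"). Prints nothing if absent.
# Hypothetical helper, not part of qemu-img.
repaired_count() {
    awk -v kind="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)            # drop leading indentation
            n = line
            sub(/ .*/, "", n)                   # keep only the first word
            rest = substr(line, length(n) + 2)  # text after the first word
            if (n ~ /^[0-9]+$/ && rest == kind) { print n; exit }
        }'
}

# Sample copied from the output in step 6.
sample='The following inconsistencies were found and repaired:

    0 leaked clusters
    7039 corruptions

Double checking the fixed image now...
No errors were found on the image.'

printf '%s\n' "$sample" | repaired_count "corruptions"      # prints 7039
printf '%s\n' "$sample" | repaired_count "leaked clusters"  # prints 0
```

Matching the whole remainder of the line, rather than a substring, keeps unrelated lines such as the allocation statistics from being picked up by accident.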


Results:
As shown above, 'qemu-img check -r all' repaired the image.
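
The figures in the final check output are internally consistent, which can be confirmed with a little arithmetic. This is a minimal sketch, assuming the 10 GiB virtual size and 65536-byte cluster size from the qemu-img info output in step 2. The function names total_clusters and allocated_percent are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of qemu-img.

# total_clusters VIRTUAL_SIZE CLUSTER_SIZE
# Number of clusters needed to cover the virtual disk (rounded up).
total_clusters() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# allocated_percent ALLOCATED TOTAL
# Allocation ratio as a percentage with two decimals.
allocated_percent() {
    awk -v a="$1" -v t="$2" 'BEGIN { printf "%.2f\n", a * 100 / t }'
}

total=$(total_clusters 10737418240 65536)
echo "total clusters: $total"                                # 163840
echo "allocated: $(allocated_percent 48260 "$total")%"       # 29.46%
echo "end offset in clusters: $(( 3164143616 / 65536 ))"     # 48281
```

The image end offset of 3164143616 bytes is exactly 48281 clusters and is well below the 30G size of the LV.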

Comment 10 errata-xmlrpc 2022-11-15 09:54:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7967