Bug 2072242

Summary: Fail to rebuild the reference count tables of qcow2 image on host block devices (e.g. LVs) [rhel-8.6.0.z]
Product: Red Hat Enterprise Linux 8 Reporter: RHEL Program Management Team <pgm-rhel-tools>
Component: qemu-kvm Assignee: Hanna Czenczek <hreitz>
qemu-kvm sub component: qcow2 QA Contact: Tingting Mao <timao>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: urgent CC: areis, chayang, coli, gveitmic, hreitz, jinzhao, juzhang, kanderso, knoel, kwolf, mrezanin, mtessun, ngu, nsoffer, qzhang, rbalakri, timao, vcojot, virt-maint, xuwei, zixchen
Version: 8.5 Keywords: Reopened, Triaged, ZStream
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-6.2.0-11.module+el8.6.0+15404+4d757596.1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1519071 Environment:
Last Closed: 2022-08-02 10:01:20 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1519071, 2072379    
Bug Blocks:    

Comment 5 Yanan Fu 2022-05-26 05:21:32 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 6 Tingting Mao 2022-05-30 06:44:42 UTC
Verified this bug as below:


Tested with:
qemu-kvm-6.2.0-11.module+el8.6.0+15404+4d757596.1
kernel-4.18.0-372.10.1.el8_6.x86_64


Steps:
1. Prepare an LV as below
# qemu-img create -f raw loop.img 50G
# losetup /dev/loop1 /home/timao/test/loop.img
# pvcreate /dev/loop1
# vgcreate vgroup /dev/loop1
# lvcreate -L 30G -n lv vgroup

2. Convert a qcow2 image with a working guest installation to the LV above
# qemu-img check -r all RHEL-8.6-x86_64-latest.qcow2 
No errors were found on the image.
31054/163840 = 18.95% allocated, 92.09% fragmented, 90.65% compressed clusters
Image end offset: 966262784

# qemu-img convert -f qcow2 -O qcow2 -o lazy_refcounts=on,compat=1.1 RHEL-8.6-x86_64-latest.qcow2 /dev/vgroup/lv -p

# qemu-img info /dev/vgroup/lv 
image: /dev/vgroup/lv
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 0 B
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: true
    refcount bits: 16
    corrupt: false
    extended l2: false
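
The cluster totals printed by 'qemu-img check' can be cross-checked against the 'qemu-img info' output above. The following is a minimal sketch, not part of the original verification; the helper names clusters_in and percent are hypothetical, and the numbers are copied from the outputs in this step.

```shell
#!/bin/sh
# Values copied from the qemu-img outputs in step 2.
virtual_size=10737418240   # bytes, "virtual size" from qemu-img info
cluster_size=65536         # bytes, "cluster_size" from qemu-img info
allocated=31054            # allocated clusters from qemu-img check

# clusters_in SIZE CLUSTER_SIZE
# Number of clusters needed to cover SIZE bytes, rounded up.
clusters_in() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# percent PART TOTAL
# PART as a percentage of TOTAL with two decimals (awk does the float math).
percent() {
    awk -v p="$1" -v t="$2" 'BEGIN { printf "%.2f%%\n", p * 100 / t }'
}

total=$(clusters_in "$virtual_size" "$cluster_size")
echo "total clusters: $total"
echo "allocated: $(percent "$allocated" "$total")"
```

With the values above this yields 163840 clusters and 18.95% allocated, matching the "31054/163840 = 18.95% allocated" line reported by the check.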

3. Boot up a guest from the lv
# /usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35 \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 15360  \
    -smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
    -cpu 'Haswell-noTSX',+kvm_pv_unhalt \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:1c:0c:0d:e3:4c,id=idjmZXQS,netdev=idEFQ4i1,bus=pcie-root-port-3,addr=0x0  \
    -netdev tap,id=idEFQ4i1,vhost=on  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -monitor stdio \
    -device pcie-root-port,id=pcie-root-port-5,port=0x6,addr=0x1.0x5,bus=pcie.0,chassis=5 \
    -device virtio-scsi-pci,id=virtio_scsi_pci2,bus=pcie-root-port-5,addr=0x0 \
    -blockdev node-name=file_image1,driver=host_device,auto-read-only=on,discard=unmap,aio=threads,filename=/dev/vgroup/lv,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=off,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=off \
    -chardev socket,server=on,path=/var/tmp/monitor-qmpmonitor1-20210721-024113-AsZ7KYro,id=qmp_id_qmpmonitor1,wait=off  \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \

4. Inside the guest, write a file with dd, get its md5 value, and sync
(guest)# dd if=/dev/urandom of=file1 conv=fsync bs=1M count=512 ; md5sum file1 ; sync

5. Kill the qemu-kvm process on the host immediately after the dd in the previous step finishes.
# kill -9 `pidof qemu-kvm`

6. Check the qcow2 image on the LV
# qemu-img check -r all /dev/vgroup/lv 
......
......
ERROR cluster 40641 refcount=0 reference=1
ERROR cluster 40642 refcount=0 reference=1
ERROR cluster 40643 refcount=0 reference=1
ERROR cluster 40644 refcount=0 reference=1
ERROR cluster 40645 refcount=0 reference=1
Rebuilding refcount structure
Repairing cluster 40218 refcount=1 reference=0
Repairing cluster 40219 refcount=1 reference=0
Repairing cluster 40220 refcount=1 reference=0
The following inconsistencies were found and repaired:

    0 leaked clusters
    410 corruptions

Double checking the fixed image now...
No errors were found on the image.
40626/163840 = 24.80% allocated, 1.11% fragmented, 0.00% compressed clusters
Image end offset: 2663841792
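
The cluster numbers in the ERROR lines and the "Image end offset" can be related through the 64 KiB cluster size reported in step 2. The sketch below is only an illustration of that arithmetic; the helper names cluster_offset and offset_cluster are hypothetical.

```shell
#!/bin/sh
cluster_size=65536   # bytes, from "cluster_size" in the qemu-img info output

# cluster_offset INDEX
# Byte offset in the image file at which host cluster INDEX starts.
cluster_offset() {
    echo $(( $1 * cluster_size ))
}

# offset_cluster OFFSET
# Index of the host cluster that contains byte OFFSET.
offset_cluster() {
    echo $(( $1 / cluster_size ))
}

# First ERROR cluster shown in the check output.
echo "cluster 40641 starts at byte $(cluster_offset 40641)"
# Image end offset reported after the repair.
echo "end offset 2663841792 is cluster $(offset_cluster 2663841792)"
```

The end offset after the repair is an exact multiple of the cluster size (cluster 40647), which is just past the highest cluster numbers visible in the ERROR lines above.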


Results:
As above, 'check -r all' fixed the image.
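
For scripted runs of this check, the exit status of 'qemu-img check' can be used instead of reading the output. The following is a hedged sketch based on the exit codes described in the qemu-img documentation as I understand it; the function name describe_check_rc is hypothetical. With -r, the exit code refers to the image state after the repair attempt, so a successful 'check -r all' returns 0.

```shell
#!/bin/sh
# describe_check_rc RC
# Translate a 'qemu-img check' exit status into a short message.
describe_check_rc() {
    case "$1" in
        0)  echo "check completed, image is consistent" ;;
        1)  echo "check not completed because of internal errors" ;;
        2)  echo "check completed, image is corrupted" ;;
        3)  echo "check completed, image has leaked clusters but is not corrupted" ;;
        63) echo "checks are not supported by the image format" ;;
        *)  echo "unexpected exit code $1" ;;
    esac
}

# Example use on the host (needs root and the LV from step 1):
#   qemu-img check -r all /dev/vgroup/lv
#   describe_check_rc $?
```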

Comment 9 Tingting Mao 2022-05-31 03:23:23 UTC
According to comment 6, setting this bug to verified.

Comment 18 errata-xmlrpc 2022-08-02 10:01:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5821