Description of problem:

Since ovirt 4.4.4 ovirt-imageio supports transferring single snapshots:

- https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/list_disk_snapshots.py
- https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/download_disk_snapshot.py
- https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/upload_disk.py

This feature depends on detecting unallocated areas in single qcow2
images, and was implemented using the standard NBD "base:allocation"
meta context in qemu-nbd. Using "base:allocation" is not entirely
correct, since an NBD server may omit information about holes, but it
worked because ovirt-imageio uses only qemu-nbd.

Unfortunately qemu 6.0.0 changed the way zeroed clusters in qcow2
images are reported (bug 1968693). Previously they were reported as:

    NBD_STATE_ZERO

But with qemu 6.0.0 they are reported as:

    NBD_STATE_ZERO | NBD_STATE_HOLE

This change is considered a bug fix in qemu, and it is not possible to
revert it in upstream qemu.

We can fix this issue using the new "qemu:allocation-depth" meta
context introduced in qemu 5.2.0. This meta context exposes reliable
(not optional) information about unallocated areas in a qcow2 image.

Change the imageio nbd client to use "qemu:allocation-depth", and use
it to report holes. With this change uploading and downloading single
snapshots should work with both qemu 5.2.0 (RHEL 8.4) and qemu 6.0.0
(CentOS Stream).

Version-Release number of selected component (if applicable):
2.1.1
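The idea behind the fix can be sketched roughly as follows. This is
illustrative Python only, not the actual ovirt-imageio nbd client code;
the helper name and extent representation are made up for the example.
The point is that a range is reported as a hole only when its
"qemu:allocation-depth" value is 0 (not allocated in the exported
image), while the zero bit still comes from "base:allocation".

# base:allocation flags defined by the NBD protocol.
NBD_STATE_HOLE = 0x1
NBD_STATE_ZERO = 0x2

def merge_extent(alloc_flags, alloc_depth):
    """
    alloc_flags: flags reported by "base:allocation" for one extent.
    alloc_depth: value reported by "qemu:allocation-depth" for the same
                 range; 0 means the range is not allocated in the
                 exported qcow2 image.
    Returns (zero, hole) for the extent.
    """
    zero = bool(alloc_flags & NBD_STATE_ZERO)
    # With qemu 6.0.0 a zeroed qcow2 cluster is reported as
    # NBD_STATE_ZERO | NBD_STATE_HOLE, so NBD_STATE_HOLE alone cannot
    # distinguish a zeroed cluster from an unallocated one. The
    # allocation depth can.
    hole = alloc_depth == 0
    return zero, hole

# A zeroed cluster in the image (qemu 6.0.0 reporting): not a hole.
assert merge_extent(NBD_STATE_ZERO | NBD_STATE_HOLE, 1) == (True, False)
# An unallocated cluster: a hole.
assert merge_extent(NBD_STATE_ZERO | NBD_STATE_HOLE, 0) == (True, True)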
To reproduce the issue we need to download a snapshot using the NBD
backend. This flow is used by backup applications that use snapshot
based backups.

1. Install a host using RHEL 8.5 AV nightly

You should have this qemu version:

$ rpm -q qemu-kvm
qemu-kvm-6.0.0-17.module+el8.5.0+11173+c9fce0bb.x86_64

2. Create a vm

3. Add a thin virtio-scsi disk to the vm

Inside the guest:

# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0    6G  0 disk
├─sda1   8:1    0    1M  0 part
├─sda2   8:2    0    1G  0 part /boot
├─sda3   8:3    0  615M  0 part [SWAP]
└─sda4   8:4    0  4.4G  0 part /
sdb      8:16   0   10G  0 disk
sr0     11:0    1 1024M  0 rom

4. Write data to the first cluster of disk /dev/sdb

# echo "data from base" > /dev/sdb
# sync

5. Create a snapshot including the second disk

6. Zero the first cluster of disk sdb

In the guest run:

# fallocate --punch-hole --length 64k /dev/sdb

In the guest we cannot see the "data from base" now:

# dd if=/dev/sdb bs=512 count=1 status=none | hexdump
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000200

The first cluster contains zeroes now.

7. Stop the vm

8. List the second disk snapshots

In this example the disk id is 2b720ee1-baad-48d8-bfdb-582946b82448.

$ python3 list_disk_snapshots.py -c engine-dev 2b720ee1-baad-48d8-bfdb-582946b82448
[
  {
    "actual_size": 1073741824,
    "format": "cow",
    "id": "dcb903e7-5567-4eb8-a240-85cbea1dce12",
    "parent": null,
    "status": "ok"
  },
  {
    "actual_size": 1073741824,
    "format": "cow",
    "id": "fb14e6e4-00a7-48de-b7e0-d23996889cbd",
    "parent": "dcb903e7-5567-4eb8-a240-85cbea1dce12",
    "status": "ok"
  }
]

9. Download the base image (id=dcb903e7-5567-4eb8-a240-85cbea1dce12)

In this example the disk is on the storage domain "iscsi-00".

$ python3 download_disk_snapshot.py -c engine-dev iscsi-00 dcb903e7-5567-4eb8-a240-85cbea1dce12 base.qcow2

10. Download the top disk snapshot (id=fb14e6e4-00a7-48de-b7e0-d23996889cbd)

$ python3 download_disk_snapshot.py -c engine-dev iscsi-00 fb14e6e4-00a7-48de-b7e0-d23996889cbd --backing-file base.qcow2 top.qcow2

11. Create a checksum of the disk

$ python3 checksum_disk.py -c engine-dev 2b720ee1-baad-48d8-bfdb-582946b82448
{
  "algorithm": "blake2b",
  "block_size": 4194304,
  "checksum": "b6dcdb509ec27d672ab91ddf6289365469668fa0d6a5de5cbc594c0ea3102825"
}

12. Create a checksum of the downloaded image

$ python3 checksum_image.py top.qcow2
{
  "algorithm": "blake2b",
  "block_size": 4194304,
  "checksum": "fcb7d96381087e6a6d5b07421342d90dd32e6ad5a21b99f37372f29dcd491a7e"
}

The checksums do not match; the downloaded image does not contain the
same data as the original disk.

This flow should be tested like other image transfer flows.

The reason is that the top disk snapshot
(fb14e6e4-00a7-48de-b7e0-d23996889cbd) was reported as empty by
qemu-nbd during the download, so download_disk_snapshot.py created an
empty qcow2 image:

$ qemu-img map --output json top.qcow2
[{ "start": 0, "length": 65536, "depth": 1, "zero": false, "data": true, "offset": 327680},
{ "start": 65536, "length": 10737352704, "depth": 1, "zero": true, "data": false}]

The top image first cluster was zeroed in the guest, so we expect to see:

$ qemu-img map --output json top.qcow2
[{ "start": 0, "length": 65536, "depth": 0, "zero": true, "data": false},
{ "start": 65536, "length": 10737352704, "depth": 1, "zero": true, "data": false}]

Because the zeroed cluster is missing from the downloaded top image,
"data from base" from the backing file is exposed to the guest.
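As a quick way to automate the check above, the qemu-img map output can
be inspected programmatically. This is only an illustrative sketch (not
one of the SDK examples); it assumes qemu-img is installed and
top.qcow2 is in the current directory:

import json
import subprocess

# Map the downloaded top image; qemu-img prints a JSON list of extents.
out = subprocess.run(
    ["qemu-img", "map", "--output", "json", "top.qcow2"],
    check=True,
    stdout=subprocess.PIPE,
).stdout

first = json.loads(out)[0]

# With the bug the first extent has depth=1 and data=true (read from the
# backing file); with the fix we expect depth=0 and zero=true.
if first["depth"] == 0 and first["zero"]:
    print("OK: first cluster is a zero cluster in the top image")
else:
    print("BAD: first cluster exposes the backing file data:", first)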
To see the data the guest will see, we can convert the disk to raw format:

$ qemu-img convert -f qcow2 -O raw top.qcow2 top.img

Looking at the first cluster shows the data from base:

$ dd if=top.img bs=512 count=1 status=none
data from base

If we had a file system on this disk, the file system would be
corrupted in the downloaded image.
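The same check can be scripted; a trivial sketch (again, not one of the
SDK examples) that reads the first sector of the flattened image:

# After the guest punched a hole in the first cluster, the first sector
# of the flattened (raw) image is expected to be all zeros. With the bug
# it contains "data from base" instead.
with open("top.img", "rb") as f:
    first_sector = f.read(512)

if first_sector == b"\0" * 512:
    print("OK: first sector is zeroed")
else:
    print("BAD: first sector contains:", first_sector.rstrip(b"\0"))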
Real flow reproducing the issue.

1. Create test disk

$ virt-builder fedora-32 -o fedora-32.qcow2 \
    --format=qcow2 \
    --hostname=f32 \
    --ssh-inject=root \
    --root-password=password:root \
    --selinux-relabel \
    --install=qemu-guest-agent

2. Upload disk to iscsi/fc storage

$ ./upload_disk.py -c my_engine --sd-name my_sd --disk-sparse fedora-32.qcow2

3. Create vm with disk using:

   interface: virtio-scsi
   enable-discard: yes

It looks like discard cannot be enabled when attaching a new disk to a
vm. Edit the disk after creating the vm to enable discard.

4. Start the vm

5. Create a big file

In the guest run:

# dd if=/dev/urandom bs=1M count=1024 of=big-file conv=fsync

6. Create snapshot 1

7. Delete the file and trim

In the guest run:

# rm big-file
# fstrim -av

This creates a lot of zeroed clusters in the active vm disk snapshot.

8. Shut down the vm

9. List disk snapshots

$ ./list_disk_snapshots.py -c my_engine 04e20159-443a-447e-bc2c-5620515137dc
[
  {
    "actual_size": 4026531840,
    "format": "cow",
    "id": "f18550e7-d2b2-427a-bbd0-d39d5d93cdf1",
    "parent": null,
    "status": "ok"
  },
  {
    "actual_size": 1073741824,
    "format": "cow",
    "id": "11e79858-d972-4e50-a4d3-501f759c09d7",
    "parent": "f18550e7-d2b2-427a-bbd0-d39d5d93cdf1",
    "status": "ok"
  }
]

10. Download base image

$ ./download_disk_snapshot.py -c my_engine my_sd f18550e7-d2b2-427a-bbd0-d39d5d93cdf1 base.qcow2

11. Download top image rebasing on top of base.qcow2

$ ./download_disk_snapshot.py -c my_engine my_sd 11e79858-d972-4e50-a4d3-501f759c09d7 --backing-file base.qcow2 snap1.qcow2

12. Create checksum for the original disk

$ ./checksum_disk.py -c my_engine 04e20159-443a-447e-bc2c-5620515137dc
{
  "algorithm": "blake2b",
  "block_size": 4194304,
  "checksum": "73942588b8c2734598d9636499a1324392305056ddd293a04c207de0e56d39c4"
}

13. Create checksum for the downloaded image

$ ./checksum_image.py snap1.qcow2
{
  "algorithm": "blake2b",
  "block_size": 4194304,
  "checksum": "73942588b8c2734598d9636499a1324392305056ddd293a04c207de0e56d39c4"
}

The checksums must match (see the sketch after this flow for the
general idea behind the checksum).

14. Upload downloaded image to new disk

$ ./upload_disk.py -c my_engine --sd-name my_sd --disk-format qcow2 --disk-sparse snap1.qcow2

15. Create new vm from this disk and start the vm

The VM must boot normally.
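For reference, the checksum shown by the examples is a blake2b based
checksum with a 4 MiB block size (see the output above). The following
is only a rough sketch of the idea, not the exact algorithm used by
checksum_disk.py / checksum_image.py (those rely on the imageio
checksum support); it assumes the downloaded image was first flattened
to raw, e.g. with "qemu-img convert -f qcow2 -O raw snap1.qcow2 snap1.raw":

import hashlib

# 4 MiB, matching the "block_size" reported in the output above.
BLOCK_SIZE = 4 * 1024**2

def checksum(path, block_size=BLOCK_SIZE):
    # digest_size=32 gives a 64 character hex digest like the ones above.
    h = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            h.update(block)
    return {
        "algorithm": "blake2b",
        "block_size": block_size,
        "checksum": h.hexdigest(),
    }

print(checksum("snap1.raw"))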
Note that the fix requires vdsm >= 4.40.70.4.
Notes for testing:

1. Reproduce the issue with qemu 6.0.0

This is possible only with ovirt-imageio < 2.2.0-1. I reproduced this
on Fedora 32 and RHEL 8.5.

2. Testing with RHEL 8.4

RHEL 8.4 provides qemu 5.2.0. The flows described in comment 2 and
comment 3 can be tested with this version. ovirt-imageio 2.2.0-1
changes the way we get zero extents info from qemu, and we want to
make sure the new way does not introduce regressions.
(continued from comment 6)

3. Testing with RHEL 8.5

I tested the flows in comment 2 and comment 3 with qemu 6.0.0 on
RHEL 8.5, so there should be no issue on CentOS Stream running the
same qemu version.
(In reply to Nir Soffer from comment #2)
> To reproduce the issue we need to download a snapshot using the NBD
> backend. This flow is used by backup applications that use snapshot
> based backups.
> [...]
> The checksums do not match; the downloaded image does not contain the
> same data as the original disk.

Verified on RHV-4.4.7-6:
The checksums match.

(In reply to Nir Soffer from comment #3)
> Real flow reproducing the issue.
> [...]
> The checksums must match.
> [...]
> The VM must boot normally.

Verified on RHV-4.4.7-6:
The checksums match and the VM boots normally.

Moving to 'Verified'.
This bugzilla is included in the oVirt 4.4.7 release, published on July 6th 2021.

Since the problem described in this bug report should be resolved in the oVirt 4.4.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.