Description of problem:

In qemu 6.0.0, qemu-nbd fixed hole reporting for raw images:
https://github.com/qemu/qemu/commit/0da9856851dcca09222a1467e16ddd05dc66e460

But this change broke reporting of zeroed areas in a qcow2 image with a
backing file. Previously such an area was reported as NBD_STATE_ZERO, but now
it is reported as NBD_STATE_ZERO | NBD_STATE_HOLE.

Here is an example reproduction:

# Create an image with a backing file:
$ qemu-img create base.raw $((4*64*1024))
$ qemu-img create -f qcow2 -b base.raw -F raw top.qcow2

# Write data to the first 3 clusters in the base image:
$ qemu-io -f raw -c "write -P 65 0 64k" base.raw
$ qemu-io -f raw -c "write -P 66 64k 64k" base.raw
$ qemu-io -f raw -c "write -P 67 128k 64k" base.raw

# Write data to the second cluster of the top image, hiding the second
# cluster in the backing file:
$ qemu-io -f qcow2 -c "write -P 69 64k 64k" top.qcow2

# Write zeroes to the third cluster of the top image, hiding the third
# cluster in the backing file:
$ qemu-io -f qcow2 -c "write -z 128k 64k" top.qcow2

This creates (A-Z: data, 0: zero, -: hole):

top:  -D0-
base: ABC-

# Query the image via qemu-nbd
$ qemu-nbd -r -t -f qcow2 top.qcow2 &

$ qemu-img map --output json nbd://localhost
[{ "start": 0, "length": 131072, "depth": 0, "zero": false, "data": true, "offset": 0},
{ "start": 131072, "length": 131072, "depth": 0, "zero": true, "data": false, "offset": 131072}]

$ nbdinfo --map nbd://localhost
         0      131072    0  allocated
    131072      131072    3  hole,zero

Both qemu-img map and qemu-nbd report the third cluster as a hole. qemu-img
map includes the depth parameter, so we can tell that the third cluster is
allocated in the top image. qemu-nbd does not report the depth of each
extent, so there is no way to tell that the third cluster must be zeroed in
the top image.

This does not affect a guest using NBD to access the image, but if the caller
wants to copy both images to other storage via qemu-nbd, keeping all the
layers, there is no way to tell the correct contents of the top image.

RHV supports copying snapshots from one system to another via qemu-nbd, using
this pipeline:

    src.img -> qemu-nbd -> imageio server -> imageio client -> qemu-nbd -> dst.img

This can also be used by 3rd party code:

    src.img -> qemu-nbd -> imageio server -> 3rd party client

When RHV is configured to allow copying a single image from an image chain
(disk snapshots in RHV terms), it uses qemu-nbd on the source side like this:

    qemu-nbd .. 'json:{"driver": "qcow2", "backing": null, "file": {"driver": "file", "filename": "top.qcow2"}}'

The imageio client requests image extents from the imageio server, which uses
NBD block status with qemu-nbd to return the extents:

[
  { "start": 0, "length": 131072, "zero": false, "hole": false},
  { "start": 131072, "length": 131072, "zero": true, "hole": true}
]

This is part of the imageio API:
http://ovirt.github.io/ovirt-imageio/images.html#extents

Based on this, the imageio client copies the first extent to the destination
image and skips the second extent, which is a hole (see the sketch at the end
of this description). After copying both images, the result will be an image
with a hole in the third cluster, exposing the third cluster from the base
image. This will corrupt the image.

RHV supports this flow for 3 use cases:
1. Snapshot based backup - this will create a corrupted backup silently
2. Restore backups keeping all snapshots - this will create a corrupted
   image, which will fail once the guest is started.

RHV users may use the imageio client tool, the imageio python library, or the
HTTP API directly.

RHV introduced this feature in 4.4.4, tested with qemu-nbd 4.2.
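Below is a minimal illustrative sketch of the copy-or-skip decision described
above (this is not the actual ovirt-imageio code), using the python-libnbd
bindings against the qemu-nbd export started in the reproduction. The names
and the print-based "copy" are illustration only; the point is that with qemu
6.0 the zeroed third cluster comes back with NBD_STATE_HOLE set, so a client
that skips holes silently drops it.

import nbd

extents = []

def visit(metacontext, offset, entries, err):
    # entries is a flat list of (length, flags) pairs for this context.
    for i in range(0, len(entries), 2):
        length, flags = entries[i], entries[i + 1]
        extents.append((offset, length, flags))
        offset += length
    return 0

h = nbd.NBD()
h.add_meta_context("base:allocation")
h.connect_uri("nbd://localhost")
h.block_status(h.get_size(), 0, visit)

for offset, length, flags in extents:
    if flags & nbd.STATE_HOLE:
        # The client skips holes. With qemu 6.0 this also skips the zeroed
        # third cluster, so the copied top image exposes cluster 3 of the
        # backing file.
        print("skip", offset, length)
    else:
        print("copy", offset, length)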
Version-Release number of selected component (if applicable):
qemu-img-6.0.0-17.module+el8.5.0+11173+c9fce0bb.x86_64

Expected results:

Report the zeroed area in the qcow2 image without the NBD_STATE_HOLE bit.
Example output that would be compatible with existing RHV code is:

$ nbdinfo --map nbd://localhost
         0      131072    0  allocated
    131072       65536    2  zero
    196608       65536    3  hole,zero

RHV will report:

[
  { "start": 0, "length": 131072, "zero": false, "hole": false},
  { "start": 131072, "length": 65536, "zero": true, "hole": false},
  { "start": 196608, "length": 65536, "zero": true, "hole": true}
]

Client code can restore the qcow2 image by copying the first extent and
writing zeroes in the second extent (see the sketch below).

This issue also affects other programs like nbdcopy, if they try to support
copying of single images from a qcow2 chain.
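For completeness, here is a minimal restore sketch under the expected
reporting above, again using python-libnbd handles. This is a hypothetical
helper, not the actual RHV/imageio code: data extents are copied, zero
extents that are not holes are explicitly zeroed, and only real holes are
skipped.

import nbd

def restore_extent(src, dst, offset, length, zero, hole):
    # src and dst are connected nbd.NBD handles; zero/hole are the
    # per-extent fields from the reporting shown above.
    if hole:
        return                          # nothing allocated in the top image
    if zero:
        dst.zero(length, offset)        # zeroed in the top image: must be written
    else:
        dst.pwrite(src.pread(length, offset), offset)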
Eric, can you have a look at this one?
I posted an experimental fix upstream:
https://lists.nongnu.org/archive/html/qemu-block/2021-06/msg00347.html

This fixes the RHV issue, but it is not complete and correct for all cases;
there is ongoing discussion upstream on the right way to solve this issue.
I posted a counter-proposal patch that makes it easier to get at the
already-existing information in qemu:
https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg02446.html

But note that such a patch requires a change in ovirt to consume the new
context, and there is still more work to be done before qemu-img map can
access the new information. nbdinfo can easily list it, but even that project
would need a patch to expose human-readable names of the bits.

So the upstream solution requires quite a few downstream backports across
multiple projects, vs. a simpler choice of a downstream-only revert in just
qemu of the patch that changed the qemu 6.0 behavior. But note that a
downstream revert, while feasible in the short term, will not be perpetuated
indefinitely, so ultimately ovirt will need to start paying attention to
either qemu:allocation-depth or the proposed qemu:joint-allocation; a qemu
revert is only an option for qemu versions that must be used with unpatched
ovirt.
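For reference, here is a short sketch of reading the already-existing
qemu:allocation-depth context with python-libnbd, assuming qemu-nbd is
started with -A/--allocation-depth so that the context is exported. Per
qemu's NBD documentation, the per-extent value in this context is the
allocation depth itself: 0 for unallocated, 1 for the top image, 2 and above
for backing images.

import nbd

maps = {"base:allocation": [], "qemu:allocation-depth": []}

def visit(metacontext, offset, entries, err):
    # One callback per requested context; entries is (length, value) pairs.
    for i in range(0, len(entries), 2):
        length, value = entries[i], entries[i + 1]
        maps[metacontext].append((offset, length, value))
        offset += length
    return 0

h = nbd.NBD()
h.add_meta_context("base:allocation")
h.add_meta_context("qemu:allocation-depth")
h.connect_uri("nbd://localhost")
h.block_status(h.get_size(), 0, visit)

# The zeroed third cluster shows depth 1 (allocated in top.qcow2) even
# though base:allocation reports it as hole,zero.
for offset, length, depth in maps["qemu:allocation-depth"]:
    print(offset, length, "depth", depth)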
(In reply to Eric Blake from comment #3)
> But note that such a patch requires a change in ovirt to consume the new
> context;

This is fine, I started to work on this here:
https://gerrit.ovirt.org/c/ovirt-imageio/+/115197
(a rough sketch of the idea follows below)

We can consume the new context in the next ovirt release - 4.4.7.

> and there is still more work to be done before qemu-img map can
> access the new information. nbdinfo can easily list it, but even that
> project would need a patch to expose human-readable names of the bits.

We don't depend on qemu-img map or nbdinfo, so this is fine.

> So the upstream solution requires quite a few downstream backports across
> multiple projects, vs. a simpler choice of a downstream-only revert in just
> qemu of the patch that changed the qemu 6.0 behavior. But note that a
> downstream revert, while feasible in the short term, will not be perpetuated
> indefinitely, so ultimately ovirt will need to start paying attention to
> either qemu:allocation-depth or the proposed qemu:joint-allocation; a qemu
> revert is only an option for qemu versions that must be used with unpatched
> ovirt.

I think we must revert the original patch downstream regardless of the new
context, since it breaks existing code, and the breakage is the worst kind:
silent data corruption during backup. Reverting the patch will also give us
more time to work on a better long term solution.

We will be happy to consume the new context in qemu 6.0, but we can also wait
for 6.1 if this work needs more time.
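Below is a rough sketch of the mapping a client could apply once it has both
contexts. This is illustrative only, not the implementation in the gerrit
change above: the imageio "zero" field comes from base:allocation, while
"hole" is taken to mean "not allocated in the top image", i.e. allocation
depth 0.

import nbd

def imageio_extent(base_flags, depth):
    # base_flags: NBD_STATE_* bits from "base:allocation"
    # depth: value from "qemu:allocation-depth" for the same range
    return {
        "zero": bool(base_flags & nbd.STATE_ZERO),  # reads as zeroes
        "hole": depth == 0,                         # nothing in the top image
    }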
RHV-4.4.7 will resolve this issue by using qemu:allocation-depth, see bug 1971182.
Can reproduce this issue.

Version:
qemu-kvm-6.0.0-19.module+el8.5.0+11385+6e7d542e.x86_64
kernel-4.18.0-310.el8.x86_64

Steps:

# Create an image with a backing file:
$ qemu-img create base.raw $((4*64*1024))
$ qemu-img create -f qcow2 -b base.raw -F raw top.qcow2

# Write data to the first 3 clusters in the base image:
$ qemu-io -f raw -c "write -P 65 0 64k" base.raw
$ qemu-io -f raw -c "write -P 66 64k 64k" base.raw
$ qemu-io -f raw -c "write -P 67 128k 64k" base.raw

# Write data to the second cluster of the top image, hiding the second
# cluster in the backing file:
$ qemu-io -f qcow2 -c "write -P 69 64k 64k" top.qcow2

# Write zeroes to the third cluster of the top image, hiding the third
# cluster in the backing file:
$ qemu-io -f qcow2 -c "write -z 128k 64k" top.qcow2

# Query the image via qemu-nbd
$ qemu-nbd -r -t -f qcow2 top.qcow2 &

$ qemu-img map --output json nbd://localhost
[{ "start": 0, "length": 131072, "depth": 0, "zero": false, "data": true, "offset": 0},
{ "start": 131072, "length": 131072, "depth": 0, "zero": true, "data": false, "offset": 131072}]

$ nbdinfo --map nbd://localhost
         0      131072    0  allocated
    131072      131072    3  hole,zero

Expected result: should not report a hole.
Seems like the upstream discussion converged? Can we expect a POST soon?
(In reply to Klaus Heinrich Kiwi from comment #7)
> Seems like the upstream discussion converged? Can we expect a POST soon?

Eric, any updates?
(In reply to Nir Soffer from comment #5)
> RHV-4.4.7 will resolve this issue by using qemu:allocation-depth, see bug
> 1971182.

Nir, this fix was merged upstream (commit 8417e1378c) and we will get it in
the 8.6 rebase. We haven't backported it yet for 8.5. Based on the RHV
changes, do we still need it?
(In reply to Klaus Heinrich Kiwi from comment #9)
> (In reply to Nir Soffer from comment #5)
> > RHV-4.4.7 will resolve this issue by using qemu:allocation-depth, see bug
> > 1971182.
> this fix was merged upstream (commit 8417e1378c) and we will get it in the
> 8.6 rebase. We haven't backported it yet for 8.5. Based on the RHV changes,
> do we still need it?

I don't think this bug was fixed. What was fixed upstream is reporting holes
in a better way in qemu-img map, but we don't use qemu-img map, we use
qemu-nbd, and qemu-nbd was not changed since the new behavior is considered a
bug fix.

But RHV now uses the allocation depth reported by qemu-nbd (available since
qemu 5.2.0), so this bug does not affect us any more (bug 1971182). The only
issue is with older RHV versions (< 4.4.7) that may get wrong info from
qemu-nbd >= 6.0.

The only way to avoid the issue is to revert the change in RHEL 8.5, but I'm
not sure it is worth the effort since users can and should upgrade to
RHV >= 4.4.7 anyway.

So I would close this bug as wontfix.
(In reply to Nir Soffer from comment #10)
> So I would close this bug as wontfix.

Thanks for the clarification, Nir.