Bug 1968693 - qemu-nbd block status reports allocated zero area in qcow2 image as a hole
Summary: qemu-nbd block status reports allocated zero area in qcow2 image as a hole
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.5
Assignee: Eric Blake
QA Contact: Tingting Mao
URL:
Whiteboard:
Depends On:
Blocks: 1975688
 
Reported: 2021-06-07 20:02 UTC by Nir Soffer
Modified: 2023-03-14 19:06 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1975688 (view as bug list)
Environment:
Last Closed: 2021-08-17 16:36:37 UTC
Type: Bug
Target Upstream Version: qemu-6.1
Embargoed:



Description Nir Soffer 2021-06-07 20:02:13 UTC
Description of problem:

In qemu 6.0.0, qemu-nbd fixed hole reporting for raw images:
https://github.com/qemu/qemu/commit/0da9856851dcca09222a1467e16ddd05dc66e460

But this change broke reporting of a zeroed area in a qcow2 image with a backing file.
Previously this was reported as NBD_STATE_ZERO, but now it is reported as
NBD_STATE_ZERO | NBD_STATE_HOLE.

Here is an example reproduction:

# Create an image with a backing file:
$ qemu-img create base.raw $((4*64*1024))
$ qemu-img create -f qcow2 -b base.raw -F raw top.qcow2

# Write data to the first 3 clusters in the base image:
$ qemu-io -f raw -c "write -P 65 0 64k" base.raw
$ qemu-io -f raw -c "write -P 66 64k 64k" base.raw
$ qemu-io -f raw -c "write -P 67 128k 64k" base.raw

# Write data to the second cluster of the top image, hiding the second cluster in the backing file:
$ qemu-io -f qcow2 -c "write -P 69 64k 64k" top.qcow2

# Write zeroes to the third cluster of the top image, hiding the third cluster in the backing file:
$ qemu-io -f qcow2 -c "write -z 128k 64k" top.qcow2


This creates (A-Z: data, 0: zero, -: hole):

top:  -D0-
base: ABC-


# Query the image via qemu-nbd

$ qemu-nbd -r -t -f qcow2 top.qcow2 &

$ qemu-img map --output json nbd://localhost
[{ "start": 0, "length": 131072, "depth": 0, "zero": false, "data": true, "offset": 0},
{ "start": 131072, "length": 131072, "depth": 0, "zero": true, "data": false, "offset": 131072}]

$ nbdinfo --map nbd://localhost
         0      131072    0  allocated
    131072      131072    3  hole,zero

Both qemu-img map and nbdinfo report the third cluster as a hole. However qemu-img map
includes the depth parameter, so we can tell that the third cluster is allocated in the top
image. qemu-nbd does not report the depth of each extent, so there is no way to tell that the
third cluster must be zeroed in the top image.
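
As a side note, the qemu:allocation-depth metadata context added in qemu 5.2 can supply the
missing per-layer information. A rough sketch, assuming qemu-nbd 5.2+ and a recent nbdinfo:

$ qemu-nbd -r -t -A -f qcow2 top.qcow2 &
$ nbdinfo --map=qemu:allocation-depth nbd://localhost

In this context 0 means unallocated, 1 means allocated in the top image, and larger values mean
allocated deeper in the backing chain, so a client can combine it with base:allocation to tell
zeroed clusters in the top image from holes.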

This does not affect guests using NBD to access the image, but if the caller wants to copy
both images to other storage via qemu-nbd, keeping all the layers, there is no way to tell
the correct contents of the top image.

RHV supports copying snapshots from one system to another via qemu-nbd, using this pipeline:

    src.img -> qemu-nbd -> imageio server -> imageio-client -> qemu-nbd -> dst.img

This can also be used by 3rd party code:

    src.img -> qemu-nbd -> imageio server -> 3rd party client

When RHV is configured to allow copying a single image from an image chain (disk snapshots
in RHV terms), it uses qemu-nbd on the source side like this:

    qemu-nbd .. 'json:{"driver": "qcow2", "backing": null, "file": {"driver": "file", "filename": "top.qcow2"}}'

The imageio client requests image extents from the imageio server, which uses NBD block status
with qemu-nbd to return the extents:

    [
        { "start": 0, "length": 131072, "zero": false, "hole": false},
        { "start": 131072, "length": 131072, "zero": true, "hole": true}
    ]

This is part of the imageio API:
http://ovirt.github.io/ovirt-imageio/images.html#extents

Based on this, the imageio client copies the first extent to the destination image,
and skips the second extent, which is a hole.

After copying both images, the result will be an image with a hole in the third cluster,
exposing the third cluster from the base image. This will corrupt the image.
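
Spelled out with the notation used above (holes read back as zeroes):

correct chain:  top -D0-  over  base ABC-  ->  guest reads AD00
copied chain:   top -D--  over  base ABC-  ->  guest reads ADC0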

RHV supports this flow for 3 use cases:
1. Snapshot-based backup - this will silently create a corrupted backup
2. Restoring backups keeping all snapshots - this will create a corrupted image, which
   will fail once the guest is started.

RHV users may use the imageio client tool, the imageio Python library, or the HTTP API
directly.

RHV introduced this feature in 4.4.4, tested with qemu-nbd 4.2.

Version-Release number of selected component (if applicable):
qemu-img-6.0.0-17.module+el8.5.0+11173+c9fce0bb.x86_64

Expected results:

Report the zeroed area in the qcow2 image without the NBD_STATE_HOLE bit.

Example output that will be compatible with existing RHV code is:

$ nbdinfo --map nbd://localhost
         0      131072    0  allocated
    131072       65536    2  zero
    196608       65536    3  hole,zero

RHV will report:

    [
        { "start": 0, "length": 131072, "zero": false, "hole": false},
        { "start": 131072, "length": 65536, "zero": true, "hole": false}
        { "start": 196608, "length": 65536, "zero": true, "hole": true}
    ]

Client code can restore the qcow2 image by copying the first extent, and writing
zeroes in the second extent.
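
A rough, hand-run sketch of that restore using the qemu tools and the pattern bytes from the
reproduction above (illustration only, not RHV code):

$ qemu-img create -f qcow2 -b base.raw -F raw restored.qcow2
# first extent (0-128k): data, copied from the export (patterns 65 and 69 in this example)
$ qemu-io -f qcow2 -c "write -P 65 0 64k" -c "write -P 69 64k 64k" restored.qcow2
# second extent (128k-192k): reported zero but not hole, so write explicit zeroes
$ qemu-io -f qcow2 -c "write -z 128k 64k" restored.qcow2
# third extent (192k-256k): reported hole,zero, so it is skipped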


This issue also affects other programs like nbdcopy if they try to support
copying single images from a qcow2 chain.

Comment 1 Klaus Heinrich Kiwi 2021-06-08 18:46:17 UTC
Eric, can you have a look at this one?

Comment 2 Nir Soffer 2021-06-08 20:47:12 UTC
I posted an experimental fix upstream:
https://lists.nongnu.org/archive/html/qemu-block/2021-06/msg00347.html

This fixes the RHV issue, but it is not complete and correct for all
cases; there is discussion on the right way to solve this issue.

Comment 3 Eric Blake 2021-06-09 18:07:58 UTC
I posted a counter-proposal patch that makes it easier to get at the already-existing information in qemu:
https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg02446.html

But note that such a patch requires a change in ovirt to consume the new context; and there is still more work to be done before qemu-img map can access the new information.  nbdinfo can easily list it, but even that project would need a patch to expose human-readable names of the bits.

So the upstream solution requires quite a few downstream backports across multiple projects, vs. a simpler choice of a downstream-only revert in just qemu of the patch that changed qemu 6.0 behavior.  But note that a downstream revert, while feasible in the short term, will not be perpetuated indefinitely, so ultimately ovirt will need to start paying attention to either qemu:allocation-depth or the proposed qemu:joint-allocation, where a qemu revert is only an option for qemu versions that must be used with unpatched ovirt.

Comment 4 Nir Soffer 2021-06-10 13:31:01 UTC
(In reply to Eric Blake from comment #3)
> But note that such a patch requires a change in ovirt to consume the new
> context; 

This is fine, I started to work on this here:
https://gerrit.ovirt.org/c/ovirt-imageio/+/115197

We can consume the new context in the next ovirt release, 4.4.7.

> and there is still more work to be done before qemu-img map can
> access the new information.  nbdinfo can easily list it, but even that
> project would need a patch to expose human-readable names of the bits.

We don't depend on qemu-img map or nbdinfo, so this is fine. 

> So the upstream solution requires quite a few downstream backports across
> multiple projects, vs. a simpler choice of a downstream-only revert in just
> qemu of the patch that changed qemu 6.0 behavior.  But note that a
> downstream revert, while feasible in the short term, will not be perpetuated
> indefinitely, so ultimately ovirt will need to start paying attention to
> either qemu:allocation-depth or the proposed qemu:joint-allocation, where a
> qemu revert is only an option for qemu versions that must be used with
> unpatched ovirt.

I think we must revert the original patch downstream regardless of the new
context, since it breaks existing code, and the breakage is the worst kind:
silent data corruption during backup.

Reverting the patch will also give us more time to work on a better long-term
solution.

We will be happy to consume the new context in qemu 6.0, but we can also
wait for 6.1 if this work needs more time.

Comment 5 Nir Soffer 2021-06-12 18:52:25 UTC
RHV-4.4.7 will resolve this issue by using qemu:allocation-depth, see bug 1971182.

Comment 6 zixchen 2021-06-15 05:34:52 UTC
Can reproduce this issue,
Version:
qemu-kvm-6.0.0-19.module+el8.5.0+11385+6e7d542e.x86_64
kernel-4.18.0-310.el8.x86_64

Steps:
# Create an image with a backing file:
$ qemu-img create base.raw $((4*64*1024))
$ qemu-img create -f qcow2 -b base.raw -F raw top.qcow2

# Write data to the first 3 clusters in the base image:
$ qemu-io -f raw -c "write -P 65 0 64k" base.raw
$ qemu-io -f raw -c "write -P 66 64k 64k" base.raw
$ qemu-io -f raw -c "write -P 67 128k 64k" base.raw

# Write data to the second cluster of the top image, hiding the second cluster in the backing file:
$ qemu-io -f qcow2 -c "write -P 69 64k 64k" top.qcow2

# Write zeroes to the third cluster of the top image, hiding the third cluster in the backing file:
$ qemu-io -f qcow2 -c "write -z 128k 64k" top.qcow2



# Query the image via qemu-nbd

$ qemu-nbd -r -t -f qcow2 top.qcow2 &

$ qemu-img map --output json nbd://localhost
[{ "start": 0, "length": 131072, "depth": 0, "zero": false, "data": true, "offset": 0},
{ "start": 131072, "length": 131072, "depth": 0, "zero": true, "data": false, "offset": 131072}]

$ nbdinfo --map nbd://localhost
         0      131072    0  allocated
    131072      131072    3  hole,zero

Expected result:
The zeroed third cluster should not be reported as a hole.

Comment 7 Klaus Heinrich Kiwi 2021-06-29 18:03:15 UTC
Seems like the upstream discussion converged? Can we expect a POST soon?

Comment 8 Klaus Heinrich Kiwi 2021-07-13 16:45:07 UTC
(In reply to Klaus Heinrich Kiwi from comment #7)
> Seems like the upstream discussion converged? Can we expect a POST soon?

Eric, any updates?

Comment 9 Klaus Heinrich Kiwi 2021-08-11 19:08:08 UTC
(In reply to Nir Soffer from comment #5)
> RHV-4.4.7 will resolve this issue by using qemu:allocation-depth, see bug
> 1971182.

Nir,

This fix was merged upstream (commit 8417e1378c) and we will get it in the 8.6 rebase. We haven't backported it to 8.5 yet. Based on the RHV changes, do we still need it?

Comment 10 Nir Soffer 2021-08-11 19:52:31 UTC
(In reply to Klaus Heinrich Kiwi from comment #9)
> (In reply to Nir Soffer from comment #5)
> > RHV-4.4.7 will resolve this issue by using qemu:allocation-depth, see bug
> > 1971182.
>  this fix was merged upstream (commit 8417e1378c) and will get that on 8.6
> rebase. We haven't backported it yet for 8.5. Based on RHV changes, do we
> still need it?

I don't think this was fixed. What was fixed was reporting holes in a better
way in qemu-img map, but we don't use it; we use qemu-nbd, and its behavior was
not changed since the qemu 6.0 change is considered a bug fix.

But RHV now uses the allocation depth reported by qemu-nbd (available since qemu
5.2.0), so this bug does not affect us now (bug 1971182).

The only issue is with older RHV versions (< 4.4.7) that may get wrong
info from qemu-nbd >= 6.0. The only way to avoid the issue is to revert the
change in RHEL 8.5, but I'm not sure it is worth the effort since users can
and should upgrade to RHV >= 4.4.7 anyway.

So I would close this bug as wontfix.

Comment 11 Klaus Heinrich Kiwi 2021-08-17 16:36:37 UTC
(In reply to Nir Soffer from comment #10)

> So I would close this bug as wontfix.

Thanks for the clarification, Nir.

