Bug 1629056

Summary: qemu NBD server failure with block status of dirty bitmaps
Product: Red Hat Enterprise Linux 7 Reporter: Eric Blake <eblake>
Component: qemu-kvm-rhevAssignee: Eric Blake <eblake>
Status: CLOSED ERRATA QA Contact: aihua liang <aliang>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.6CC: chayang, coli, eblake, jsnow, juzhang, ngu, rjones, virt-maint
Target Milestone: rcKeywords: TestOnly
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.12.0-27.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-08-22 09:18:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Eric Blake 2018-09-14 18:15:34 UTC
Description of problem:
Vladimir discovered that qemu as NBD server for the qemu:dirty-bitmap:FOO context can send invalid extents of length 0, when it splits large (4G) regions of the bitmap to a client that permits the server to send more than one extent:
https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg01804.html

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.12.0-15.el7

How reproducible:
Difficult. qemu as client can't trigger it; it requires a separate client that knows how to request qemu:dirty-bitmap:FOO context (and I don't know of any open source clients other than qemu that currently support that). Found by code inspection.

Steps to Reproduce:
1. The bug only occurs if a dirty bitmap has a 4G or larger contiguous status (all clean or all dirty), where the client requests status for 4G-1 bytes without the NBD_CMD_FLAG_REQ_ONE limitation.
2.
3.

Actual results:
qemu as server will split the 4G range into 3 extents: [0 to 4G-bitmap_granularity), 0-length extent, [4G-bitmap_granularity onwards)

Expected results:
The server should not send the 0-length extent.

Additional info:

Comment 2 Eric Blake 2018-09-14 18:21:49 UTC
An easy workaround for anyone writing a custom NBD client to experiment with incremental block backup via pull over NBD: clamp your NBD_CMD_BLOCK_STATUS requests to something like 2 or 3G per request, rather than trying to get a maximum 4G-1 bytes at a time.

Comment 7 aihua liang 2019-05-24 10:49:55 UTC
Have not get the feedback before the deadline's coming. 
Based on my test result, set this bug's status to "Verified". 
If any feedback come later and new verification needed, i'll append the result then.

Comment 8 Eric Blake 2019-05-24 12:45:09 UTC
(In reply to Eric Blake from comment #0)
> Description of problem:
> Vladimir discovered that qemu as NBD server for the qemu:dirty-bitmap:FOO
> context can send invalid extents of length 0, when it splits large (4G)
> regions of the bitmap to a client that permits the server to send more than
> one extent:
> https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg01804.html
> 
> Version-Release number of selected component (if applicable):
> qemu-kvm-rhev-2.12.0-15.el7
> 
> How reproducible:
> Difficult. qemu as client can't trigger it; it requires a separate client
> that knows how to request qemu:dirty-bitmap:FOO context (and I don't know of
> any open source clients other than qemu that currently support that). Found
> by code inspection.

Rich's libnbd (so far unreleased) can now do that:
https://github.com/libguestfs/libnbd

Comment 10 Richard W.M. Jones 2019-05-24 13:12:36 UTC
Typical test program from Python 3:

-----------------
#!/usr/bin/python3

import nbd

# Change this to the name of the dirty bitmap:
name = "qemu:dirty-bitmap:bitmap0"

h = nbd.NBD ()
h.request_meta_context (name)
# Change these parameters to point to the qemu NBD server:
h.connect_tcp ("localhost", "nbd")

entries = []
def f (data, metacontext, offset, e):
    global entries
    if metacontext != name:
        return
    entries = e

size = h.get_size ()
h.block_status (size, 0, 0, 0, f)
h.shutdown ()

print ("%r" % entries)
-----------------

We don't have a release of libnbd yet, but if you compile from source you can
run this by saving that as "bitmap.py" and running:

  /path/to/libnbd/run ./bitmap.py

Comment 11 Eric Blake 2019-05-24 13:47:22 UTC
(In reply to Eric Blake from comment #8)

> > How reproducible:
> > Difficult. qemu as client can't trigger it; it requires a separate client
> > that knows how to request qemu:dirty-bitmap:FOO context (and I don't know of
> > any open source clients other than qemu that currently support that). Found
> > by code inspection.
> 
> Rich's libnbd (so far unreleased) can now do that:
> https://github.com/libguestfs/libnbd

$ # Create an image and give it an empty persistent dirty bitmap
$ qemu-img create -f qcow2 scratch.qcow2 5g
$ cat <<\EOF | qemu-kvm -nodefaults -nographic -qmp stdio
{'execute':'qmp_capabilities'}
{'execute':'blockdev-add','arguments':{'node-name':'n','driver':'qcow2','file':{'driver':'file','filename':'scratch.qcow2'}}}
{'execute':'block-dirty-bitmap-add','arguments':{'node':'n','name':'bitmap0','persistent':true}}
{'execute':'quit'}
EOF
$ # Serve that image
$ qemu-nbd -t -f qcow2 -B bitmap0 scratch.qcow2 &
$ # And use libnbd to provoke the problem
$ ./run sh/nbdsh
nbd> h.request_meta_context("qemu:dirty-bitmap:bitmap0")
nbd> h.connect_tcp ("localhost", "10809")
nbd> def f (_, m, o, e):
...   print (m, o, e)
...
nbd> h.block_status(4 * 1024 * 1024 * 1024 - 1, 512, 0, None, f)
qemu:dirty-bitmap:bitmap0 512 [4294901760, 0, 1073806848, 0]

With broken qemu, the result is different.  If you are testing upstream qemu 3.0, rather than downstream where 'qemu-nbd -B' was backported, the sequence for exposing an empty bitmap changes to this:
{'execute':'qmp_capabilities'}
{'execute':'blockdev-add','arguments':{'node-name':'n','driver':'qcow2','file':{'driver':'file','filename':'scratch.qcow2'}}}
{'execute':'nbd-server-start','arguments':{'addr':{'type':'inet','data':{'host':'localhost','port':'10809'}}}}
{'execute':'nbd-server-add','arguments':{'device':'n','name':''}}
{'execute':'block-dirty-bitmap-add','arguments':{'node':'n','name':'bitmap0'}}
{'execute':'x-block-dirty-bitmap-disable','arguments':{'node':'n','name':'bitmap0'}}
{'execute':'x-nbd-server-add-bitmap','arguments':{'name':'','bitmap':'bitmap0'}}

and the nbdsh output is:
nbd> h.block_status(4 * 1024 * 1024 * 1024 - 1, 512, 0, None, f)
qemu:dirty-bitmap:b0 512 [4294901760, 0, 0, 1, 1073806848, 0]

where that 0,1 pair in the middle is bogus.

Comment 12 Eric Blake 2019-05-28 02:04:23 UTC
libnbd 0.2 may change the signature of h.block_status() to rearrange flags; see
https://www.redhat.com/archives/libguestfs/2019-May/msg00207.html

Comment 13 Richard W.M. Jones 2019-05-28 09:15:47 UTC
... and it does change request_meta_context -> add_meta_context:

https://github.com/libguestfs/libnbd/commit/b32b8172079a6626560836142149154e3be5bdab

Comment 17 errata-xmlrpc 2019-08-22 09:18:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2553