Bug 1464837

Summary: qemu records an undocumented field "=keyvalue-pairs" in overlay image of a RBD volume after snapshot
Product: Red Hat Enterprise Linux Advanced Virtualization Reporter: Han Han <hhan>
Component: qemu-kvmAssignee: Markus Armbruster <armbru>
qemu-kvm sub component: Ceph QA Contact: Tingting Mao <timao>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: medium    
Priority: medium CC: aliang, armbru, chayang, coli, dyuan, jherrman, juzhang, knoel, pkrempa, virt-maint, xfu, xuzhang
Version: 8.0Keywords: Triaged
Target Milestone: rc   
Target Release: 8.1   
Hardware: All   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-02-26 06:59:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1473046    

Description Han Han 2017-06-26 02:44:30 UTC
Description of problem:
As subject

Version-Release number of selected component (if applicable):
libvirt-3.2.0-15.el7a.x86_64
qemu-kvm-rhev-2.9.0-14.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a running VM with rbd backend
2. Create a snapshot with qemu
# virsh qemu-monitor-command V '{"execute":"blockdev-snapshot-sync", "arguments": { "device": "drive-virtio-disk0", "snapshot
-file":"/tmp/rbd.bak","format":"qcow2"}}'                                                                                                         
{"return":{},"id":"libvirt-16"}

3. Check the snapshot file
# qemu-img info /tmp/rbd.bak 
image: /tmp/rbd.bak
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 196K
cluster_size: 65536
backing file: json:{"driver": "raw", "file": {"pool": "rbd", "image": "V", "driver": "rbd", "=keyvalue-pairs": "[\"auth_supported\", \"none\", \"mon_host\", \"10.73.75.52\"]"}}
backing file format: raw
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

As I konw "=keyvalue-pairs" is not defined in qapi/block-core.json and qapi-schema.json. So it is not standard rbd json format.
The back file json format using raw driver and doesn't take standard rbd json format like:
'{"file": {"pool": "rbd", "image": "V", "driver": "rbd", "server":[{"host":"XX.XX.XX.XX","port":"6789"}]}}'


Actual results:
As above

Expected results:
The back file of rbd should use standard rbd format, without "=keyvalue-pairs".

Additional info:

Comment 2 Han Han 2017-06-26 06:48:21 UTC
*** Bug 1464838 has been marked as a duplicate of this bug. ***

Comment 5 Peter Krempa 2017-06-27 15:02:45 UTC
The impact on libvirt is following:

If you start a VM with the following disk definition:
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw'/>
      <source protocol='rbd' name='rbd/V'>
        <host name='10.73.75.52' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>

This translates to the following command line options:
-drive file=rbd:rbd/V:auth_supported=none:mon_host=10.73.75.52,format=raw,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk

Then if you create an external snapshot on top of that image, qemu records the following backing store string:
json:{"driver": "raw", "file": {"pool": "rbd", "image": "V", "driver": "rbd", "=keyvalue-pairs": "[\"auth_supported\", \"none\", \"mon_host\", \"10.73.75.52\"]"}}

Since it contains non-documented fields, the libvirt backing store parser is not able to parse it and thus certain block jobs which require refresh of the backing store data won't work properly.

After this any other block job (except for snapshots, since I've already improved the backing store tracking so that re-detection is not necessary) won't work properly.

The problem here is that the string recorded by qemu changed to something that isn't the old string which used to be and neither isn't the proper json: pseudo-protocol string.

Comment 9 Suqin Huang 2017-07-03 01:39:39 UTC
no cmd blockdev-snapshot-sync in qemu-kvm

qemu-kvm didn't support storage vm migration

Comment 10 Jeff Cody 2018-06-30 02:25:41 UTC
(In reply to Peter Krempa from comment #5)
> The impact on libvirt is following:
> 
> If you start a VM with the following disk definition:
>     <disk type='network' device='disk'>
>       <driver name='qemu' type='raw'/>
>       <source protocol='rbd' name='rbd/V'>
>         <host name='10.73.75.52' port='6789'/>
>       </source>
>       <target dev='vda' bus='virtio'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x03'
> function='0x0'/>
>     </disk>
> 
> This translates to the following command line options:
> -drive
> file=rbd:rbd/V:auth_supported=none:mon_host=10.73.75.52,format=raw,if=none,
> id=drive-virtio-disk0 -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,
> id=virtio-disk
> 
> Then if you create an external snapshot on top of that image, qemu records
> the following backing store string:
> json:{"driver": "raw", "file": {"pool": "rbd", "image": "V", "driver":
> "rbd", "=keyvalue-pairs": "[\"auth_supported\", \"none\", \"mon_host\",
> \"10.73.75.52\"]"}}
> 
> Since it contains non-documented fields, the libvirt backing store parser is
> not able to parse it and thus certain block jobs which require refresh of
> the backing store data won't work properly.
> 
> After this any other block job (except for snapshots, since I've already
> improved the backing store tracking so that re-detection is not necessary)
> won't work properly.
> 
> The problem here is that the string recorded by qemu changed to something
> that isn't the old string which used to be and neither isn't the proper
> json: pseudo-protocol string.

After the latest rbd configuration series [1] that added configuration options for rbd blockdev-add, is this still a relevant issue for libvirt?

Comment 11 Peter Krempa 2018-07-16 11:43:19 UTC
With the use of -blockdev I presume this should not be a problem any more but I don't plan to add support to the JSON backing store parser for the above format to libvirt so any image created with this format will be broken as libvirt can't parse back the IP adress of the mon host.

Comment 12 Jeff Cody 2018-08-15 16:43:36 UTC
Some upstream discussion:  https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg02676.html

I think it is clear we need to filter the json output for the key value pairs.  We also should likely deprecate that whole part of the API for rbd in QEMU, once we support enough rbd options (and relegate the rest to the rbd conf file).

Comment 13 Peter Krempa 2018-08-29 14:56:35 UTC
That will be probably the best solution. Note that qemu should report the mon_hosts as normal hosts described by the QAPI schema. The problem with the libvirt parser is that it is not able to get the hostnames. We don't specify the keywords on the commandline nor parse them.

Comment 14 Jeff Cody 2018-09-06 17:52:44 UTC
(In reply to Peter Krempa from comment #13)
> That will be probably the best solution. Note that qemu should report the
> mon_hosts as normal hosts described by the QAPI schema. The problem with the
> libvirt parser is that it is not able to get the hostnames. We don't specify
> the keywords on the commandline nor parse them.

This is the solution I have been on, filtering out the json output.  However, in practice this is fairly ugly in QEMU, and I'm not sure if the patches will be accepted upstream.

If we look at the other approach, is it feasible for libvirt to filter out the offending "=keyvalue-pairs"?

Comment 15 Peter Krempa 2018-09-07 06:25:11 UTC
The problem in libvirt is not that =keyvalue-pairs is present, but that the server address is stored in the =keyvalue-pairs rather than the officially documented array of 'server' objects.

When we parse such a 'json:' pseudo-protocol string we dont fill in our server field. When using -blockdev this will translate either to failure or to incomplete specification of the host.

=keyvalue-pairs does not necessarily need to be removed but since there are fields for the 'server'(mon-host) and 'auth-client-required'(auth_supported) fields they should be reported in the documented fields.

Comment 21 Markus Armbruster 2018-10-10 05:23:33 UTC
I just inherited this bug, and I'm not sure I fully understand how exactly it impacts libvirt.  Peter, could you give me an example of how libvirt's use of the rbd driver falls apart due to =keyvalue-pairs?  I'm looking for an updated reproducer with actual and results that break libvirt and proposed results that (could be made to) work.

Comment 22 Peter Krempa 2018-10-10 07:36:50 UTC
(In reply to Markus Armbruster from comment #21)
> I just inherited this bug, and I'm not sure I fully understand how exactly
> it impacts libvirt.  Peter, could you give me an example of how libvirt's
> use of the rbd driver falls apart due to =keyvalue-pairs? 

Note: The example below is rather old and I did not re-try this after the recent changes to the RBD backend which made it possible to use it with -blockdev properly. After -blockdev support is enabled we will stop using the -drive option as seen below and switch to declare it properly via 'BlockdevOptionsRbd'

Well the actual problem is not the =keyvalue-pairs field itself which we thoroughly ignore, but the fact that useful information are encoded _only_ using that field.

In an example above we start a guest on RBD using the following XML:
f you start a VM with the following disk definition:
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw'/>
      <source protocol='rbd' name='rbd/V'>
        <host name='10.73.75.52' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>

This translates to the following command line options:
-drive file=rbd:rbd/V:auth_supported=none:mon_host=10.73.75.52,format=raw,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk

Then if you create an external snapshot on top of that image, qemu records the following backing store string:
json:{"driver": "raw", "file": {"pool": "rbd", "image": "V", "driver": "rbd", "=keyvalue-pairs": "[\"auth_supported\", \"none\", \"mon_host\", \"10.73.75.52\"]"}}

As you can see, the above encodes the "mon_host" part in keyvalue pairs rather than the 'server' field which is documented and libvirt uses it to get the host from.

If this does not happen once -blockdev with 'BlockdevOptionsRbd' is used this BZ can be regarded as fixed regardless if =keyvalue-pairs is still present. We just need the data to be provided in the documented fields. Sice we can't fix retroactively broken backing store definitions there's not much we can do further.

Comment 23 Markus Armbruster 2018-10-10 12:05:52 UTC
Quick experiment with *upstream* QEMU:

0. Hack up the rbd driver to not actually connect (since I don't have a working Ceph setup)

1. Run QEMU with an rbd block device and a QMP monitor

   $ qemu-system-x86_64 -display none -S -blockdev node-name=disk,driver=rbd,pool=rbd,image=convert.raw,server.0.host=10.73.196.181,server.0.port=6789,user=admin,auth-client-required.0=cephx -qmp stdio

2. Create an external snapshot

   {"execute": "qmp_capabilities"}
   {"execute":"blockdev-snapshot-sync", "arguments": {"node-name": "rbd", "snapshot-file":"rbd-snap.qcow2", "format":"qcow2", "snapshot-node-name": "rbd-snap"}}

3. Examine the external snapshot

   $ upstream-qemu-img info rbd-snap.qcow2

Actual output of step 3:

    image: rbd-snap.qcow2
    file format: qcow2
    virtual size: 42M (44040192 bytes)
    disk size: 196K
    cluster_size: 65536
    backing file: json:{"pool": "rbd", "image": "convert.raw", "server.0.host": "10.73.196.181", "driver": "rbd", "server.0.port": "6789", "user": "admin", "auth-client-required.0": "cephx"}
    backing file format: rbd
    Format specific information:
	compat: 1.1
	lazy refcounts: false
	refcount bits: 16
	corrupt: false

Same result with a locally compiled qemu-kvm-rhev-2.12.0-18.el7 similarly patched.

Comment 24 Peter Krempa 2018-10-10 12:55:58 UTC
Awesome, in such case I don't think any further steps are required. We'll be using a very similar command line when using -blockdev (not flattened into the dotted syntax, but rather direct json) so if that produces documented fields for server it's sufficient for our case.

Comment 26 Markus Armbruster 2018-10-10 17:05:04 UTC
One more thing: "=keyvalue-pairs" will go away when we get rid of the deprecated key=value:... part of -drive file=rbd:...:key=value:...  Expected to happen upstream in 3.2.  We can make it happen downstream as soon as libvirt is ready for it.

Comment 31 Peter Krempa 2019-02-15 08:40:25 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1466177 - libvirt cannot parse back file json format of multi-servers with index

Comment 32 Ademar Reis 2019-02-15 16:44:10 UTC
(In reply to Markus Armbruster from comment #26)
> One more thing: "=keyvalue-pairs" will go away when we get rid of the
> deprecated key=value:... part of -drive file=rbd:...:key=value:...  Expected
> to happen upstream in 3.2.  We can make it happen downstream as soon as
> libvirt is ready for it.

So if I understood it correctly, "libvirt being ready for it" is being tracked in bug 1466177.

Comment 33 John Ferlan 2019-05-02 20:46:41 UTC
Move to next major release, from depends on bug.

Comment 35 Ademar Reis 2020-02-05 22:44:12 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 36 Peter Krempa 2020-02-19 12:12:14 UTC
With the introduction of blockdev-create and libvirt's use of it libvirtd now fully controls what's written into the overlays while doing snapshots. This means that no undocumented or weird values are written into the overlay.

So from libvirt's point of view this BZ is now fixed.

Comment 37 Markus Armbruster 2020-02-20 10:14:08 UTC
The reported bug is about the odd "=keyvalue-pairs" in an external snapshot of an rbd block backend configured with -drive's pseudo-filename file=rbd:POOL/DEV:KEY=VALUE:...

The odd "=keyvalue-pairs" was gone in comment#23.  The "json:" pseudo-filename was cleaned up further upstream commit 97e2f021f8 "block: Generically refresh runtime options", v4.0.0:

    With this patch, some superfluous information (that should never have
    been there) may be removed from some JSON filenames, [...]

We rebased to v4.0.0 for RHEL 8.1.

Additional concerns were raised about libvirt: recording the KEY=VALUE:... part in "=keyvalue-pairs" or not at all defeats libvirt's backing store parser, "and thus certain block jobs which require refresh of the backing store data won't work properly" (comment#5).

Libvirt does not use -drive anymore.  Possible improvements to deal with snapshots from its -drive era are tracked in bug 1466177.

I believe there is nothing more left to for this bug other than having QE verify "=keyvalue-pairs" is indeed gone.  Setting status to MODIFIED.

Comment 38 Tingting Mao 2020-02-26 03:44:55 UTC
Tried in latest rhel7


Tested with:
qemu-kvm-rhev-2.12.0-44.el7
kernel-3.10.0-1125.el7.x86_64


Steps:
1. Boot a guest from the OS disk over rbd.
# /usr/libexec/qemu-kvm \
        -name 'guest-rhel7' \
        -machine q35 \
        -nodefaults \
        -vga qxl \
        -drive id=drive_image2,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=rbd:rbd/rhel78-64-virtio-scsi.qcow2 \
        -device virtio-blk-pci,id=virtio_blk_pci1,drive=drive_image2,bus=pcie.0,addr=06 \
        -vnc :1 \
        -monitor stdio \
        -m 8192 \
        -smp 8 \
        -device virtio-net-pci,mac=9a:b5:b6:b1:b2:b3,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0,addr=0x9  \
        -netdev tap,id=idxgXAlm \
        -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/timao/monitor-qmpmonitor1-20180220-094308-h9I6hRsI,server,nowait \
        -mon chardev=qmp_id_qmpmonitor1,mode=control  \
	-device pcie-root-port,id=pcie.0-root-port-8,slot=3,chassis=3,addr=0x3,bus=pcie.0 \

2. Make snapshot via QMP
{"execute": "qmp_capabilities"}
{"return": {}}

{"execute":"blockdev-snapshot-sync", "arguments": { "device": "drive_image2", "snapshot-file":"/tmp/rbd.bak","format":"qcow2"}}
{"return": {}}

3. Check the info of snapshot file.
# qemu-img info /tmp/rbd.bak --backing-chain
image: /tmp/rbd.bak
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 196K
cluster_size: 65536
backing file: json:{"driver": "qcow2", "file": {"pool": "rbd", "image": "rhel78-64-virtio-scsi.qcow2", "driver": "rbd"}}
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: json:{"driver": "qcow2", "file": {"pool": "rbd", "image": "rhel78-64-virtio-scsi.qcow2", "driver": "rbd"}}
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: unavailable
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# qemu-img info /tmp/rbd.bak
image: /tmp/rbd.bak
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 196K
cluster_size: 65536
backing file: json:{"driver": "qcow2", "file": {"pool": "rbd", "image": "rhel78-64-virtio-scsi.qcow2", "driver": "rbd"}}
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

Comment 39 Tingting Mao 2020-02-26 05:53:21 UTC
Tried in latest rhel8.2-av, there is no the issue.


Tested with:
qemu-kvm-4.2.0-12.module+el8.2.0+5858+afd073bc
kernel-4.18.0-175.el8.x86_64


Steps:
1. Boot up a guest from OS disk over RBD.
# /usr/libexec/qemu-kvm \
    -name 'guest-rhel7.7' \
    -machine q35 \
    -nodefaults \
    -vga qxl \
    -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
    -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
    -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
    -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
    -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
    -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
    -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.2 \
    -blockdev driver=rbd,cache.direct=on,cache.no-flush=off,node-name=my_file1,pool=rbd,image=rhel78-64-virtio-scsi.qcow2 \
    -blockdev driver=qcow2,node-name=my1,file=my_file1 \
    -device scsi-hd,drive=my1 \
    -vnc :0 \
    -m 8192 \
    -smp 4 \
    -netdev tap,id=hostnet0,vhost=on \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:56:00:00:00:07,bus=pci.3,addr=0x0 \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/home/qmp-sock2-mirror,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -enable-kvm \
    -monitor stdio

2. Create snapshot file via QMP.
{'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn2','size':21474836480},'job-id':'job1'}}
{'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn2','filename':'/root/sn2'}}

{'execute':'blockdev-create','arguments':{'options': {'driver':'qcow2','file':'drive_sn2','size':21474836480,'backing-file':'json:{"driver": "qcow2", "file": {"pool": "rbd", "image": "rhel78-64-virtio-scsi.qcow2", "driver": "rbd"}}','backing-fmt':'qcow2'},'job-id':'job2'}}
{'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn2','file':'drive_sn2','backing':null}}
{'execute':'job-dismiss','arguments':{'id':'job1'}}
{'execute':'job-dismiss','arguments':{'id':'job2'}}

{"execute":"blockdev-snapshot","arguments":{"node":"my1","overlay":"sn2"}}


3. Check the info of the snapshot file.
# qemu-img info /root/sn2 
image: /root/sn2
file format: qcow2
virtual size: 20 GiB (21474836480 bytes)
disk size: 196 KiB
cluster_size: 65536
backing file: json:{"driver": "qcow2", "file": {"pool": "rbd", "image": "rhel78-64-virtio-scsi.qcow2", "driver": "rbd"}}
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# qemu-img info /root/sn2 --backing-chain 
image: /root/sn2
file format: qcow2
virtual size: 20 GiB (21474836480 bytes)
disk size: 196 KiB
cluster_size: 65536
backing file: json:{"driver": "qcow2", "file": {"pool": "rbd", "image": "rhel78-64-virtio-scsi.qcow2", "driver": "rbd"}}
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: json:{"driver": "qcow2", "file": {"pool": "rbd", "image": "rhel78-64-virtio-scsi.qcow2", "driver": "rbd"}}
file format: qcow2
virtual size: 20 GiB (21474836480 bytes)
disk size: unavailable
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false