Bug 1464837
Summary: | qemu records an undocumented field "=keyvalue-pairs" in overlay image of a RBD volume after snapshot | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Han Han <hhan> |
Component: | qemu-kvm | Assignee: | Markus Armbruster <armbru> |
qemu-kvm sub component: | Ceph | QA Contact: | Tingting Mao <timao> |
Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
Severity: | medium | ||
Priority: | medium | CC: | aliang, armbru, chayang, coli, dyuan, jherrman, juzhang, knoel, pkrempa, virt-maint, xfu, xuzhang |
Version: | 8.0 | Keywords: | Triaged |
Target Milestone: | rc | ||
Target Release: | 8.1 | ||
Hardware: | All | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-02-26 06:59:46 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1473046 |
Description
Han Han
2017-06-26 02:44:30 UTC
*** Bug 1464838 has been marked as a duplicate of this bug. *** The impact on libvirt is following: If you start a VM with the following disk definition: <disk type='network' device='disk'> <driver name='qemu' type='raw'/> <source protocol='rbd' name='rbd/V'> <host name='10.73.75.52' port='6789'/> </source> <target dev='vda' bus='virtio'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </disk> This translates to the following command line options: -drive file=rbd:rbd/V:auth_supported=none:mon_host=10.73.75.52,format=raw,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk Then if you create an external snapshot on top of that image, qemu records the following backing store string: json:{"driver": "raw", "file": {"pool": "rbd", "image": "V", "driver": "rbd", "=keyvalue-pairs": "[\"auth_supported\", \"none\", \"mon_host\", \"10.73.75.52\"]"}} Since it contains non-documented fields, the libvirt backing store parser is not able to parse it and thus certain block jobs which require refresh of the backing store data won't work properly. After this any other block job (except for snapshots, since I've already improved the backing store tracking so that re-detection is not necessary) won't work properly. The problem here is that the string recorded by qemu changed to something that isn't the old string which used to be and neither isn't the proper json: pseudo-protocol string. no cmd blockdev-snapshot-sync in qemu-kvm qemu-kvm didn't support storage vm migration (In reply to Peter Krempa from comment #5) > The impact on libvirt is following: > > If you start a VM with the following disk definition: > <disk type='network' device='disk'> > <driver name='qemu' type='raw'/> > <source protocol='rbd' name='rbd/V'> > <host name='10.73.75.52' port='6789'/> > </source> > <target dev='vda' bus='virtio'/> > <address type='pci' domain='0x0000' bus='0x00' slot='0x03' > function='0x0'/> > </disk> > > This translates to the following command line options: > -drive > file=rbd:rbd/V:auth_supported=none:mon_host=10.73.75.52,format=raw,if=none, > id=drive-virtio-disk0 -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0, > id=virtio-disk > > Then if you create an external snapshot on top of that image, qemu records > the following backing store string: > json:{"driver": "raw", "file": {"pool": "rbd", "image": "V", "driver": > "rbd", "=keyvalue-pairs": "[\"auth_supported\", \"none\", \"mon_host\", > \"10.73.75.52\"]"}} > > Since it contains non-documented fields, the libvirt backing store parser is > not able to parse it and thus certain block jobs which require refresh of > the backing store data won't work properly. > > After this any other block job (except for snapshots, since I've already > improved the backing store tracking so that re-detection is not necessary) > won't work properly. > > The problem here is that the string recorded by qemu changed to something > that isn't the old string which used to be and neither isn't the proper > json: pseudo-protocol string. After the latest rbd configuration series [1] that added configuration options for rbd blockdev-add, is this still a relevant issue for libvirt? With the use of -blockdev I presume this should not be a problem any more but I don't plan to add support to the JSON backing store parser for the above format to libvirt so any image created with this format will be broken as libvirt can't parse back the IP adress of the mon host. Some upstream discussion: https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg02676.html I think it is clear we need to filter the json output for the key value pairs. We also should likely deprecate that whole part of the API for rbd in QEMU, once we support enough rbd options (and relegate the rest to the rbd conf file). That will be probably the best solution. Note that qemu should report the mon_hosts as normal hosts described by the QAPI schema. The problem with the libvirt parser is that it is not able to get the hostnames. We don't specify the keywords on the commandline nor parse them. (In reply to Peter Krempa from comment #13) > That will be probably the best solution. Note that qemu should report the > mon_hosts as normal hosts described by the QAPI schema. The problem with the > libvirt parser is that it is not able to get the hostnames. We don't specify > the keywords on the commandline nor parse them. This is the solution I have been on, filtering out the json output. However, in practice this is fairly ugly in QEMU, and I'm not sure if the patches will be accepted upstream. If we look at the other approach, is it feasible for libvirt to filter out the offending "=keyvalue-pairs"? The problem in libvirt is not that =keyvalue-pairs is present, but that the server address is stored in the =keyvalue-pairs rather than the officially documented array of 'server' objects. When we parse such a 'json:' pseudo-protocol string we dont fill in our server field. When using -blockdev this will translate either to failure or to incomplete specification of the host. =keyvalue-pairs does not necessarily need to be removed but since there are fields for the 'server'(mon-host) and 'auth-client-required'(auth_supported) fields they should be reported in the documented fields. I just inherited this bug, and I'm not sure I fully understand how exactly it impacts libvirt. Peter, could you give me an example of how libvirt's use of the rbd driver falls apart due to =keyvalue-pairs? I'm looking for an updated reproducer with actual and results that break libvirt and proposed results that (could be made to) work. (In reply to Markus Armbruster from comment #21) > I just inherited this bug, and I'm not sure I fully understand how exactly > it impacts libvirt. Peter, could you give me an example of how libvirt's > use of the rbd driver falls apart due to =keyvalue-pairs? Note: The example below is rather old and I did not re-try this after the recent changes to the RBD backend which made it possible to use it with -blockdev properly. After -blockdev support is enabled we will stop using the -drive option as seen below and switch to declare it properly via 'BlockdevOptionsRbd' Well the actual problem is not the =keyvalue-pairs field itself which we thoroughly ignore, but the fact that useful information are encoded _only_ using that field. In an example above we start a guest on RBD using the following XML: f you start a VM with the following disk definition: <disk type='network' device='disk'> <driver name='qemu' type='raw'/> <source protocol='rbd' name='rbd/V'> <host name='10.73.75.52' port='6789'/> </source> <target dev='vda' bus='virtio'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </disk> This translates to the following command line options: -drive file=rbd:rbd/V:auth_supported=none:mon_host=10.73.75.52,format=raw,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk Then if you create an external snapshot on top of that image, qemu records the following backing store string: json:{"driver": "raw", "file": {"pool": "rbd", "image": "V", "driver": "rbd", "=keyvalue-pairs": "[\"auth_supported\", \"none\", \"mon_host\", \"10.73.75.52\"]"}} As you can see, the above encodes the "mon_host" part in keyvalue pairs rather than the 'server' field which is documented and libvirt uses it to get the host from. If this does not happen once -blockdev with 'BlockdevOptionsRbd' is used this BZ can be regarded as fixed regardless if =keyvalue-pairs is still present. We just need the data to be provided in the documented fields. Sice we can't fix retroactively broken backing store definitions there's not much we can do further. Quick experiment with *upstream* QEMU: 0. Hack up the rbd driver to not actually connect (since I don't have a working Ceph setup) 1. Run QEMU with an rbd block device and a QMP monitor $ qemu-system-x86_64 -display none -S -blockdev node-name=disk,driver=rbd,pool=rbd,image=convert.raw,server.0.host=10.73.196.181,server.0.port=6789,user=admin,auth-client-required.0=cephx -qmp stdio 2. Create an external snapshot {"execute": "qmp_capabilities"} {"execute":"blockdev-snapshot-sync", "arguments": {"node-name": "rbd", "snapshot-file":"rbd-snap.qcow2", "format":"qcow2", "snapshot-node-name": "rbd-snap"}} 3. Examine the external snapshot $ upstream-qemu-img info rbd-snap.qcow2 Actual output of step 3: image: rbd-snap.qcow2 file format: qcow2 virtual size: 42M (44040192 bytes) disk size: 196K cluster_size: 65536 backing file: json:{"pool": "rbd", "image": "convert.raw", "server.0.host": "10.73.196.181", "driver": "rbd", "server.0.port": "6789", "user": "admin", "auth-client-required.0": "cephx"} backing file format: rbd Format specific information: compat: 1.1 lazy refcounts: false refcount bits: 16 corrupt: false Same result with a locally compiled qemu-kvm-rhev-2.12.0-18.el7 similarly patched. Awesome, in such case I don't think any further steps are required. We'll be using a very similar command line when using -blockdev (not flattened into the dotted syntax, but rather direct json) so if that produces documented fields for server it's sufficient for our case. One more thing: "=keyvalue-pairs" will go away when we get rid of the deprecated key=value:... part of -drive file=rbd:...:key=value:... Expected to happen upstream in 3.2. We can make it happen downstream as soon as libvirt is ready for it. https://bugzilla.redhat.com/show_bug.cgi?id=1466177 - libvirt cannot parse back file json format of multi-servers with index (In reply to Markus Armbruster from comment #26) > One more thing: "=keyvalue-pairs" will go away when we get rid of the > deprecated key=value:... part of -drive file=rbd:...:key=value:... Expected > to happen upstream in 3.2. We can make it happen downstream as soon as > libvirt is ready for it. So if I understood it correctly, "libvirt being ready for it" is being tracked in bug 1466177. Move to next major release, from depends on bug. QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks With the introduction of blockdev-create and libvirt's use of it libvirtd now fully controls what's written into the overlays while doing snapshots. This means that no undocumented or weird values are written into the overlay. So from libvirt's point of view this BZ is now fixed. The reported bug is about the odd "=keyvalue-pairs" in an external snapshot of an rbd block backend configured with -drive's pseudo-filename file=rbd:POOL/DEV:KEY=VALUE:... The odd "=keyvalue-pairs" was gone in comment#23. The "json:" pseudo-filename was cleaned up further upstream commit 97e2f021f8 "block: Generically refresh runtime options", v4.0.0: With this patch, some superfluous information (that should never have been there) may be removed from some JSON filenames, [...] We rebased to v4.0.0 for RHEL 8.1. Additional concerns were raised about libvirt: recording the KEY=VALUE:... part in "=keyvalue-pairs" or not at all defeats libvirt's backing store parser, "and thus certain block jobs which require refresh of the backing store data won't work properly" (comment#5). Libvirt does not use -drive anymore. Possible improvements to deal with snapshots from its -drive era are tracked in bug 1466177. I believe there is nothing more left to for this bug other than having QE verify "=keyvalue-pairs" is indeed gone. Setting status to MODIFIED. Tried in latest rhel7 Tested with: qemu-kvm-rhev-2.12.0-44.el7 kernel-3.10.0-1125.el7.x86_64 Steps: 1. Boot a guest from the OS disk over rbd. # /usr/libexec/qemu-kvm \ -name 'guest-rhel7' \ -machine q35 \ -nodefaults \ -vga qxl \ -drive id=drive_image2,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=rbd:rbd/rhel78-64-virtio-scsi.qcow2 \ -device virtio-blk-pci,id=virtio_blk_pci1,drive=drive_image2,bus=pcie.0,addr=06 \ -vnc :1 \ -monitor stdio \ -m 8192 \ -smp 8 \ -device virtio-net-pci,mac=9a:b5:b6:b1:b2:b3,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0,addr=0x9 \ -netdev tap,id=idxgXAlm \ -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/timao/monitor-qmpmonitor1-20180220-094308-h9I6hRsI,server,nowait \ -mon chardev=qmp_id_qmpmonitor1,mode=control \ -device pcie-root-port,id=pcie.0-root-port-8,slot=3,chassis=3,addr=0x3,bus=pcie.0 \ 2. Make snapshot via QMP {"execute": "qmp_capabilities"} {"return": {}} {"execute":"blockdev-snapshot-sync", "arguments": { "device": "drive_image2", "snapshot-file":"/tmp/rbd.bak","format":"qcow2"}} {"return": {}} 3. Check the info of snapshot file. # qemu-img info /tmp/rbd.bak --backing-chain image: /tmp/rbd.bak file format: qcow2 virtual size: 20G (21474836480 bytes) disk size: 196K cluster_size: 65536 backing file: json:{"driver": "qcow2", "file": {"pool": "rbd", "image": "rhel78-64-virtio-scsi.qcow2", "driver": "rbd"}} backing file format: qcow2 Format specific information: compat: 1.1 lazy refcounts: false refcount bits: 16 corrupt: false image: json:{"driver": "qcow2", "file": {"pool": "rbd", "image": "rhel78-64-virtio-scsi.qcow2", "driver": "rbd"}} file format: qcow2 virtual size: 20G (21474836480 bytes) disk size: unavailable cluster_size: 65536 Format specific information: compat: 1.1 lazy refcounts: false refcount bits: 16 corrupt: false # qemu-img info /tmp/rbd.bak image: /tmp/rbd.bak file format: qcow2 virtual size: 20G (21474836480 bytes) disk size: 196K cluster_size: 65536 backing file: json:{"driver": "qcow2", "file": {"pool": "rbd", "image": "rhel78-64-virtio-scsi.qcow2", "driver": "rbd"}} backing file format: qcow2 Format specific information: compat: 1.1 lazy refcounts: false refcount bits: 16 corrupt: false Tried in latest rhel8.2-av, there is no the issue. Tested with: qemu-kvm-4.2.0-12.module+el8.2.0+5858+afd073bc kernel-4.18.0-175.el8.x86_64 Steps: 1. Boot up a guest from OS disk over RBD. # /usr/libexec/qemu-kvm \ -name 'guest-rhel7.7' \ -machine q35 \ -nodefaults \ -vga qxl \ -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \ -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \ -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \ -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \ -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \ -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \ -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \ -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.2 \ -blockdev driver=rbd,cache.direct=on,cache.no-flush=off,node-name=my_file1,pool=rbd,image=rhel78-64-virtio-scsi.qcow2 \ -blockdev driver=qcow2,node-name=my1,file=my_file1 \ -device scsi-hd,drive=my1 \ -vnc :0 \ -m 8192 \ -smp 4 \ -netdev tap,id=hostnet0,vhost=on \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:56:00:00:00:07,bus=pci.3,addr=0x0 \ -chardev socket,id=qmp_id_qmpmonitor1,path=/home/qmp-sock2-mirror,server,nowait \ -mon chardev=qmp_id_qmpmonitor1,mode=control \ -enable-kvm \ -monitor stdio 2. Create snapshot file via QMP. {'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn2','size':21474836480},'job-id':'job1'}} {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn2','filename':'/root/sn2'}} {'execute':'blockdev-create','arguments':{'options': {'driver':'qcow2','file':'drive_sn2','size':21474836480,'backing-file':'json:{"driver": "qcow2", "file": {"pool": "rbd", "image": "rhel78-64-virtio-scsi.qcow2", "driver": "rbd"}}','backing-fmt':'qcow2'},'job-id':'job2'}} {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn2','file':'drive_sn2','backing':null}} {'execute':'job-dismiss','arguments':{'id':'job1'}} {'execute':'job-dismiss','arguments':{'id':'job2'}} {"execute":"blockdev-snapshot","arguments":{"node":"my1","overlay":"sn2"}} 3. Check the info of the snapshot file. # qemu-img info /root/sn2 image: /root/sn2 file format: qcow2 virtual size: 20 GiB (21474836480 bytes) disk size: 196 KiB cluster_size: 65536 backing file: json:{"driver": "qcow2", "file": {"pool": "rbd", "image": "rhel78-64-virtio-scsi.qcow2", "driver": "rbd"}} backing file format: qcow2 Format specific information: compat: 1.1 lazy refcounts: false refcount bits: 16 corrupt: false # qemu-img info /root/sn2 --backing-chain image: /root/sn2 file format: qcow2 virtual size: 20 GiB (21474836480 bytes) disk size: 196 KiB cluster_size: 65536 backing file: json:{"driver": "qcow2", "file": {"pool": "rbd", "image": "rhel78-64-virtio-scsi.qcow2", "driver": "rbd"}} backing file format: qcow2 Format specific information: compat: 1.1 lazy refcounts: false refcount bits: 16 corrupt: false image: json:{"driver": "qcow2", "file": {"pool": "rbd", "image": "rhel78-64-virtio-scsi.qcow2", "driver": "rbd"}} file format: qcow2 virtual size: 20 GiB (21474836480 bytes) disk size: unavailable cluster_size: 65536 Format specific information: compat: 1.1 lazy refcounts: false refcount bits: 16 corrupt: false |