Bug 1542445
Summary: | qemu-img commit snapshot images whose backing file is specified with JSON fails over GlusterFS. | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Longxiang Lyu <lolyu> |
Component: | qemu-kvm | Assignee: | Stefano Garzarella <sgarzare> |
Status: | CLOSED ERRATA | QA Contact: | Tingting Mao <timao> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | --- | CC: | aliang, berrange, chayang, coli, ddepaula, juzhang, knoel, ngu, qzhang, rbalakri, virt-maint |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-11-06 07:11:30 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Longxiang Lyu
2018-02-06 11:24:35 UTC
snapshot-check succeeds.
a. # qemu-img check --object secret,id=sec0,data=redhat --object secret,id=sec1,data=kvmautotest --image-opts driver=qcow2,encrypt.key-secret=sec1,backing.encrypt.key-secret=sec0,file.driver=gluster,file.volume=gv0,file.path=sn1.qcow2,file.server.0.type=tcp,file.server.0.host=10.73.199.197,file.server.0.port=24007,file.server.1.type=tcp,file.server.1.host=10.73.199.200,file.server.1.port=24007
No errors were found on the image.
Image end offset: 2359296
[2018-02-06 11:25:23.966430] E [MSGID: 108006] [afr-common.c:5090:__afr_handle_child_down_event] 0-gv0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
b. # qemu-img check --object secret,id=sec0,data=redhat --object secret,id=sec1,data=kvmautotest 'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"gv0","path":"sn1.qcow2","server":[{"type":"tcp","host":"10.73.199.197","port":"24007"}]},"encrypt.format":"luks","encrypt.key-secret":"sec1","backing.encrypt.key-secret":"sec0"}'
No errors were found on the image.
Image end offset: 2359296
[2018-02-06 11:26:39.910808] E [MSGID: 108006] [afr-common.c:5090:__afr_handle_child_down_event] 0-gv0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.

> qemu-img: invalid URI
> Usage: file=gluster[+transport]://[host[:port]]volume/path[?socket=...][,file.debug=N][,file.logfile=/path/filename.log]

The error message here is useless, so I improved it upstream:

https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg01208.html

With that change you can see it is complaining about receiving a JSON format filename:

qemu-img: invalid URI json:{"server.0.host": "10.73.199.197", "driver": "gluster", "path": "luks.qcow2", "server.0.type": "tcp", "server.0.port": "24007", "volume": "gv0"}

This is when qemu-img tries to reopen the base image after the commit task is complete.

Test with the following steps:
1. create a plain qcow2 on gluster
# qemu-img create -f qcow2 gluster://10.73.199.197/gv0/base.qcow2 10G
2. create a snapshot of base image on gluster
# qemu-img create -f qcow2 -b gluster://10.73.199.197/gv0/base.qcow2 gluster://10.73.199.197/gv0/sn1.qcow2
3. commit the snapshot.
a. # qemu-img commit 'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"gv0","path":"sn1.qcow2","server":[{"type":"tcp","host":"10.73.199.197","port":"24007"}]}}' -p
(100.00/100%)
[2018-02-06 14:07:21.838010] E [MSGID: 108006] [afr-common.c:5090:__afr_handle_child_down_event] 0-gv0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Image committed.
b. # qemu-img commit --image-opts driver=qcow2,file.driver=gluster,file.volume=gv0,file.path=sn1.qcow2,file.server.0.type=tcp,file.server.0.host=10.73.199.197,file.server.0.port=24007,file.server.1.type=tcp,file.server.1.host=10.73.199.200,file.server.1.port=24007 -p
(100.00/100%)
[2018-02-06 14:12:39.024912] E [MSGID: 108006] [afr-common.c:5090:__afr_handle_child_down_event] 0-gv0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Image committed.

There is a typo in the creation of luks.qcow2: the secret is "redhat" instead of "backing".

Still hit this issue in 'qemu-kvm-3.1.0-18.module+el8+2834+fa8bb6e2'.

Tested packages:
qemu-kvm-3.1.0-18.module+el8+2834+fa8bb6e2
kernel-4.18.0-67.el8

gluster server:
glusterfs-server-3.12.2-43.el7rhgs.x86_64
# gluster volume info
Volume Name: vol0
Type: Replicate
Volume ID: f506da21-23a2-410c-a4b7-41efedebd4dc
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: dhcp-8-206.nay.redhat.com:/data/brick1/gv0
Brick2: gluster-virt-qe-01.lab.eng.pek2.redhat.com:/data/brick1/gv0
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on

Steps in client:
1. Create snapshot(luks-inside-qcow2) based on the backing file with libgfapi
# qemu-img create -f qcow2 --object secret,id=sec0,data=snapshot -b 'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"vol0","path":"base.qcow2","server":[{"type":"tcp","host":"10.73.196.181","port":"24007"}]}}' -o encrypt.format=luks,encrypt.key-secret=sec0 gluster://10.73.196.181/vol0/sn_luks.qcow2
2. Commit the snapshot file with json format and --image-opts
# qemu-img commit --object secret,id=sec0,data=snapshot 'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"vol0","path":"sn_luks.qcow2","server":[{"type":"tcp","host":"10.73.196.181","port":"24007"}]}, "encrypt.format": "luks", "encrypt.key-secret": "sec0"}' -p
(0.00/100%)
[2019-02-27 06:43:58.275184] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
qemu-img: invalid URI json:{"server.0.host": "10.73.196.181", "driver": "gluster", "path": "base.qcow2", "server.0.type": "tcp", "server.0.port": "24007", "volume": "vol0"}
Usage: file=gluster[+transport]://[host[:port]]volume/path[?socket=...][,file.debug=N][,file.logfile=/path/filename.log]

# qemu-img commit --object secret,id=sec0,data=snapshot --image-opts file.driver=gluster,file.volume=vol0,file.path=sn_luks.qcow2,file.server.0.type=inet,file.server.0.host=10.73.196.181,file.server.0.port=24007,driver=qcow2,encrypt.format=luks,encrypt.key-secret=sec0 -p
(0.00/100%)
[2019-02-27 06:45:17.376226] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
qemu-img: invalid URI json:{"server.0.host": "10.73.196.181", "driver": "gluster", "path": "base.qcow2", "server.0.type": "tcp", "server.0.port": "24007", "volume": "vol0"}
Usage: file=gluster[+transport]://[host[:port]]volume/path[?socket=...][,file.debug=N][,file.logfile=/path/filename.log]

For plain qcow2 (not luks-inside-qcow2), the issue is also hit when committing to a specified backing file.

# qemu-img info 'json:{"driver": "qcow2", "file": {"server.0.host": "10.73.196.181", "driver": "gluster", "path": "sn2.qcow2", "server.0.type": "tcp", "server.0.port": "24007", "volume": "vol0"}}' --backing-chain
[2019-03-01 08:00:52.022015] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2019-03-01 08:00:54.021365] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2019-03-01 08:00:56.025609] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
image: json:{"driver": "qcow2", "file": {"server.0.host": "10.73.196.181", "driver": "gluster", "path": "sn2.qcow2", "server.0.type": "tcp", "server.0.port": "24007", "volume": "vol0"}}
file format: qcow2
virtual size: 15G (16106127360 bytes)
disk size: 574M
cluster_size: 65536
backing file: json:{"driver":"qcow2","file":{"driver":"gluster","volume":"vol0","path":"sn1.qcow2","server":[{"type":"tcp","host":"10.73.196.181","port":"24007"}]}}
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: json:{"driver": "qcow2", "file": {"server.0.host": "10.73.196.181", "driver": "gluster", "path": "sn1.qcow2", "server.0.type": "tcp", "server.0.port": "24007", "volume": "vol0"}}
file format: qcow2
virtual size: 15G (16106127360 bytes)
disk size: 193K
cluster_size: 65536
backing file: json:{"driver":"qcow2","file":{"driver":"gluster","volume":"vol0","path":"base.qcow2","server":[{"type":"tcp","host":"10.73.196.181","port":"24007"}]}}
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: json:{"driver": "qcow2", "file": {"server.0.host": "10.73.196.181", "driver": "gluster", "path": "base.qcow2", "server.0.type": "tcp", "server.0.port": "24007", "volume": "vol0"}}
file format: qcow2
virtual size: 15G (16106127360 bytes)
disk size: 10.0G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# qemu-img commit -b 'json:{"driver": "qcow2", "file": {"server.0.host": "10.73.196.181", "driver": "gluster", "path": "base.qcow2", "server.0.type": "tcp", "server.0.port": "24007", "volume": "vol0"}}' 'json:{"driver": "qcow2", "file": {"server.0.host": "10.73.196.181", "driver": "gluster", "path": "sn2.qcow2", "server.0.type": "tcp", "server.0.port": "24007", "volume": "vol0"}}' -p
(0.00/100%)
[2019-03-01 07:51:33.749543] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
qemu-img: invalid URI json:{"server.0.host": "10.73.196.181", "driver": "gluster", "path": "base.qcow2", "server.0.type": "tcp", "server.0.port": "24007", "volume": "vol0"}
Usage: file=gluster[+transport]://[host[:port]]volume/path[?socket=...][,file.debug=N][,file.logfile=/path/filename.log]

# qemu-img commit -b 'json:{"driver": "qcow2", "file": {"server.0.host": "10.73.196.181", "driver": "gluster", "path": "base.qcow2", "server.0.type": "tcp", "server.0.port": "24007", "volume": "vol0"}}' --image-opts file.driver=gluster,file.volume=vol0,file.path=sn2.qcow2,file.server.0.type=inet,file.server.0.host=10.73.196.181,file.server.0.port=24007,driver=qcow2 -p
(0.00/100%)
[2019-03-01 07:48:37.854788] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
qemu-img: invalid URI json:{"server.0.host": "10.73.196.181", "driver": "gluster", "path": "base.qcow2", "server.0.type": "tcp", "server.0.port": "24007", "volume": "vol0"}
Usage: file=gluster[+transport]://[host[:port]]volume/path[?socket=...][,file.debug=N][,file.logfile=/path/filename.log]

After a deeper investigation, I noticed that the root problem is the format used to specify the backing file when creating snapshots.

For example, when you create a snapshot whose backing file is given as a URL, committing works:

# qemu-img create -f qcow2 -b gluster://10.73.196.181/vol0/base.qcow2 gluster://10.73.196.181/vol0/sn1.qcow2
[2019-03-01 08:26:35.585723] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Formatting 'gluster://10.73.196.181/vol0/sn1.qcow2', fmt=qcow2 size=16106127360 backing_file=gluster://10.73.196.181/vol0/base.qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
[2019-03-01 08:26:36.685763] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2019-03-01 08:26:37.865136] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.

# qemu-img commit -f qcow2 gluster://10.73.196.181/vol0/sn1.qcow2 -p
(100.00/100%)
[2019-03-01 08:26:56.161917] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Image committed.

But when you create a snapshot whose backing file is given as JSON, committing then fails:

# qemu-img create -f qcow2 -b 'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"vol0","path":"base.qcow2","server":[{"type":"tcp","host":"10.73.196.181","port":"24007"}]}}' gluster://10.73.196.181/vol0/sn1.qcow2
[2019-03-01 08:25:36.578677] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Formatting 'gluster://10.73.196.181/vol0/sn1.qcow2', fmt=qcow2 size=16106127360 backing_file=json:{"driver":"qcow2",,"file":{"driver":"gluster",,"volume":"vol0",,"path":"base.qcow2",,"server":[{"type":"tcp",,"host":"10.73.196.181",,"port":"24007"}]}} cluster_size=65536 lazy_refcounts=off refcount_bits=16
[2019-03-01 08:25:37.401296] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2019-03-01 08:25:38.563899] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.

# qemu-img commit -f qcow2 gluster://10.73.196.181/vol0/sn1.qcow2
[2019-03-01 08:25:43.906234] E [MSGID: 108006] [afr-common.c:5114:__afr_handle_child_down_event] 0-vol0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
qemu-img: invalid URI json:{"server.0.host": "10.73.196.181", "driver": "gluster", "path": "base.qcow2", "server.0.type": "tcp", "server.0.port": "24007", "volume": "vol0"}
Usage: file=gluster[+transport]://[host[:port]]volume/path[?socket=...][,file.debug=N][,file.logfile=/path/filename.log]

I proposed a patch upstream to fix this issue:
https://patchew.org/QEMU/20190712104617.94707-1-sgarzare@redhat.com/

The fix is upstream and it will be released with QEMU v4.1:

commit 0b1847bbc2b4f50e7497cb05c4540bf7b016c9c6
Author: Stefano Garzarella <sgarzare>
Date: Mon Jul 15 15:28:44 2019 +0200

    gluster: fix .bdrv_reopen_prepare when backing file is a JSON object

    When the backing_file is specified as a JSON object, the
    qemu_gluster_reopen_prepare() fails with this message:
        invalid URI json:{"server.0.host": ...}

    In this case, we should call qemu_gluster_init() using the QDict
    'state->options' that contains the JSON parameters already parsed.

    Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1542445
    Signed-off-by: Stefano Garzarella <sgarzare>
    Message-id: 20190715132844.506584-1-sgarzare
    Signed-off-by: Max Reitz <mreitz>

Verified this bug as below; committing works now.

Tested with:
qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1
kernel-4.18.0-134.el8

Steps:
1. Create the base image over glusterfs
# qemu-img create -f raw gluster://gluster-virt-qe-01.lab.eng.pek2.redhat.com/vol/basett.img 5G
2. Create snapshot file based on the image over glusterfs with JSON
# qemu-img create -f qcow2 -b 'json:{"driver": "raw","file":{"driver":"gluster","volume":"vol","path":"basett.img","server":[{"type":"tcp","host":"gluster-virt-qe-01.lab.eng.pek2.redhat.com","port":"24007"}]}}' gluster://gluster-virt-qe-01.lab.eng.pek2.redhat.com/vol/sntt.qcow2
3. Commit the snapshot file
# qemu-img commit -f qcow2 gluster://gluster-virt-qe-01.lab.eng.pek2.redhat.com/vol/sntt.qcow2
Image committed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723
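
As a rough illustration of the reopen behaviour described in the commit message above, the following Python sketch models the failure and the fix. It is a toy model only, not QEMU code: the function names `parse_gluster_uri`, `reopen_prepare_before_fix` and `reopen_prepare_after_fix` are hypothetical, the parser handles only plain `gluster://host[:port]/volume/path` names over TCP, and the default port 24007 is simply the one used throughout this report.

```python
import json
from urllib.parse import urlsplit

DEFAULT_PORT = "24007"  # assumption: the gluster port used throughout this report


def parse_gluster_uri(filename):
    """Parse a plain gluster://host[:port]/volume/path name into an options dict."""
    parts = urlsplit(filename)
    volume, _, path = parts.path.lstrip("/").partition("/")
    if parts.scheme != "gluster" or not parts.hostname or not volume or not path:
        # Same wording as the error quoted in this report.
        raise ValueError("invalid URI " + filename)
    server = {"type": "tcp", "host": parts.hostname,
              "port": str(parts.port or DEFAULT_PORT)}
    return {"volume": volume, "path": path, "server": [server]}


def reopen_prepare_before_fix(filename, options):
    """Model of the failing behaviour: always re-parse the filename as a URI."""
    return parse_gluster_uri(filename)


def reopen_prepare_after_fix(filename, options):
    """Model of the fixed behaviour: reuse the already parsed options
    when the filename is a json: pseudo-filename."""
    if filename.startswith("json:"):
        return dict(options)
    return parse_gluster_uri(filename)


if __name__ == "__main__":
    # The json: name that appears in the error messages above.
    json_name = ('json:{"server.0.host": "10.73.196.181", "driver": "gluster", '
                 '"path": "base.qcow2", "server.0.type": "tcp", '
                 '"server.0.port": "24007", "volume": "vol0"}')
    parsed = json.loads(json_name[len("json:"):])
    try:
        reopen_prepare_before_fix(json_name, parsed)
    except ValueError as err:
        print("before fix:", err)
    print("after fix:", reopen_prepare_after_fix(json_name, parsed))
```

In this model a URL-style backing file works with either function, which matches the observation that snapshots created with a `gluster://` backing file could always be committed, while a `json:` backing file only works once the parsed options are reused.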