Bug 1017289
Summary: Snapshots on GlusterFS w/ libgfapi enabled

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 6 |
| Reporter | Dave Allan <dallan> |
| Component | libvirt |
| Assignee | Peter Krempa <pkrempa> |
| Status | CLOSED WONTFIX |
| QA Contact | Virtualization Bugs <virt-bugs> |
| Severity | unspecified |
| Priority | unspecified |
| Version | 6.4 |
| CC | aberezin, bsarathy, cpelland, deepakcs, dmaley, dshetty, dyuan, eblake, eharney, fdeutsch, gfidente, grajaiya, howey.vernon, info, jdenemar, jentrena, josh, jraju, juzhang, lyarwood, mzhan, ndipanov, perfbz, pkrempa, rbalakri, rbryant, rcyriac, rlandman, sasundar, scohen, shyu, smanjara, xuzhang, yanyang, yeylon |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| Story Points | --- |
| Clone Of | 1017288 |
| Clones | 1032370 (view as bug list) |
| Last Closed | 2014-12-02 16:54:11 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Category | --- |
| oVirt Team | --- |
| Cloudforms Team | --- |
| Bug Blocks | 1002699, 1017288, 1040172, 1040649, 1045047, 1045196 |

Doc Text:
Previously, libvirt treated all storage with network addresses (including images accessed via gluster) as RAW images. Since RAW files cannot have a backing file, gluster could not be used as the destination for an external snapshot. The workaround was to use a FUSE mount, which resulted in slower performance.
New code has been added to link libvirt with libgfapi, and enhanced XML now allows the use of QCOW2 images on gluster as a snapshot destination. It is no longer necessary to use a FUSE mount, and libvirt can now take an external snapshot with a gluster destination.
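The FUSE-mount workaround mentioned in the Doc Text amounts to mounting the gluster volume as a local filesystem so that libvirt sees an ordinary file and external snapshots work with regular-file semantics. A minimal sketch, assuming a reachable gluster server; the server name, volume name, paths, and domain name below are placeholders, not values from this bug:

```sh
# Mount the gluster volume via FUSE so libvirt sees ordinary files
mount -t glusterfs gluster-server:/myvol /mnt/gluster

# The domain then uses a plain file-backed disk instead of <disk type='network'>:
#   <disk type='file' device='disk'>
#     <driver name='qemu' type='qcow2'/>
#     <source file='/mnt/gluster/vm.img'/>
#     <target dev='vda' bus='virtio'/>
#   </disk>

# External snapshots now succeed, at the cost of FUSE overhead
virsh snapshot-create-as mydomain snap1 --disk-only
```

This is exactly the slower path that linking libvirt against libgfapi was meant to make unnecessary.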
Description (Dave Allan, 2013-10-09 15:04:14 UTC)
Update on my test results here: snapshot --disk-only is not supported for glusterfs volumes.

Tested with packages:
libvirt-0.10.2-29.el6.x86_64
qemu-kvm-0.12.1.2-2.414.el6.x86_64

Test steps:

1. Start a domain with a glusterfs-volume qcow2 image, with 'transport=tcp':

<disk type='network' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source protocol='gluster' name='gluster-vol1/rh6-qcow2.img'>
    <host name='10.66.7.108' port='24007' transport='tcp'/>
  </source>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>

# virsh define r6-qcow2.xml
Domain r6-qcow2 defined from r6-qcow2.xml

# virsh start r6-qcow2
Domain r6-qcow2 started

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 14    r6-qcow2                       running

2. Create snapshots of the domain:

# virsh snapshot-create r6-qcow2
Domain snapshot 1381993164 created

# virsh snapshot-list r6-qcow2
 Name                 Creation Time             State
------------------------------------------------------------
 1381993164           2013-10-17 14:59:24 +0800 running

# virsh snapshot-create-as r6-qcow2 snap1 --disk-only
error: unsupported configuration: source for disk 'vda' is not a regular file; refusing to generate external snapshot name

# virsh snapshot-create-as r6-qcow2 snap2 --memspec file=snap3,snapshot=external
error: unsupported configuration: source for disk 'vda' is not a regular file; refusing to generate external snapshot name

(In reply to chhu from comment #3)
> Update my test results here, snapshot --disk-only is not supported for
> glusterfs volumes

Also tested with the rhev packages; hit the same error:
qemu-img-rhev-0.12.1.2-2.414.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.414.el6.x86_64

Opened bug 1022301 for the snapshot issue mentioned in comment 4:
Bug 1022301 - snapshot-create-as --disk-only is not supported with glusterfs volumes

(In reply to chhu from comment #6)
> Opened bug 1022301 for the snapshot issue mentioned in comment 4.

I don't know why you created a new bug - it's one and the same issue. Libvirt will add support for snapshot-create-as with glusterfs at the same time it adds support for qcow2 glusterfs images with backing files.

*** Bug 1022301 has been marked as a duplicate of this bug. ***
Part 1 of the backports is posted: http://post-office.corp.redhat.com/archives/rhvirt-patches/2013-November/msg01123.html but more patches are still needed (both upstream and backported) before moving this bug to POST.

Peter, I got the RPMs installed; however, I am not able to get the snapshot working. Here is what I did:

1. Installed the new RPMs.
2. Enabled libgfapi.
3. Created an instance and a volume.
4. Attached the volume to the instance.
5. Tried to create a snapshot of the attached volume using a snapshot XML file and virsh commands as indicated in comment #18.

Libvirt packages installed:
# rpm -qa | grep libvirt
libvirt-client-0.10.2-1.el6.x86_64
libvirt-devel-0.10.2-1.el6.x86_64
libvirt-debuginfo-0.10.2-1.el6.x86_64
libvirt-0.10.2-1.el6.x86_64
libvirt-python-0.10.2-1.el6.x86_64
libvirt-lock-sanlock-0.10.2-1.el6.x86_64

# vi snap.xml
<domainsnapshot>
  <name>snap1-disk-only</name>
  <disks>
    <disk name='vdb' type='network'>
      <driver type='qcow2'/>
      <source protocol='gluster' name='cinder-vol/14710d15-5c9b-4666-a21f-c15d4f12dedd.snap'>
        <host name='10.70.37.134' port='24007'/>
      </source>
    </disk>
  </disks>
</domainsnapshot>

# virsh snapshot-create instance-00000015 snap.xml --disk-only
error: internal error unable to execute QEMU command 'transaction': gluster+tcp://10.70.37.134:24007/cinder-vol/volume-14710d15-5c9b-4666-a21f-c15d4f12dedd.SNAP: error while creating qcow2: Permission denied

So that's the error I get. The cinder volume on the gluster server has permissions set to 165:165.
Gluster cinder volume parameters:

Options Reconfigured:
diagnostics.brick-log-level: DEBUG
nfs.rpc-auth-allow: on
server.allow-insecure: on
storage.owner-uid: 165
storage.owner-gid: 165
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

(In reply to shilpa from comment #27)
> # virsh snapshot-create instance-00000015 snap.xml --disk-only
> error: internal error unable to execute QEMU command 'transaction':
> gluster+tcp://10.70.37.134:24007/cinder-vol/volume-14710d15-5c9b-4666-a21f-
> c15d4f12dedd.SNAP: error while creating qcow2: Permission denied

Is the directory containing the image writable by the UID running the qemu process? You may also try creating an empty file on the gluster volume, setting its permissions to 666 (rw-rw-rw-), and using it as the snapshot target along with the "--reuse-external" flag for virsh.

> So that's the error I get. The cinder volume on the gluster server has
> permissions set to 165:165.

Did some further testing. This time I tried to create a volume snapshot using the cinder API:

1. # cinder snapshot-create --force True --display-name snap 65a0400e-0e92-40c4-a7f8-c1a89860f546
+---------------------+--------------------------------------+
| Property            | Value                                |
+---------------------+--------------------------------------+
| created_at          | 2014-03-26T12:20:23.981216           |
| display_description | None                                 |
| display_name        | snap                                 |
| id                  | a1620281-34e6-44f8-ab88-24eb1decd8e0 |
| metadata            | {}                                   |
| size                | 1                                    |
| status              | creating                             |
| volume_id           | 65a0400e-0e92-40c4-a7f8-c1a89860f546 |
+---------------------+--------------------------------------+

[root@rhs-client8 ~(keystone_admin)]# cinder snapshot-list
+--------------------------------------+--------------------------------------+--------+--------------+------+
| ID                                   | Volume ID                            | Status | Display Name | Size |
+--------------------------------------+--------------------------------------+--------+--------------+------+
| a1620281-34e6-44f8-ab88-24eb1decd8e0 | 65a0400e-0e92-40c4-a7f8-c1a89860f546 | error  | snap         | 1    |
+--------------------------------------+--------------------------------------+--------+--------------+------+

I do see a snap file being created by cinder:
# ls -l /var/lib/cinder/volumes/db41b6b4932c1698ea706ff23e675efd
-rw-r--r--. 1 root root 197120 Mar 26 17:36 volume-65a0400e-0e92-40c4-a7f8-c1a89860f546.a1620281-34e6-44f8-ab88-24eb1decd8e0

Error logs from cinder/volume.log:

2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp Traceback (most recent call last):
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py", line 441, in _process_data
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp     **args)
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/dispatcher.py", line 148, in dispatch
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp     return getattr(proxyobj, method)(ctxt, **kwargs)
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/volume/manager.py", line 378, in create_snapshot
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp     {'status': 'error'})
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp     self.gen.next()
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/volume/manager.py", line 369, in create_snapshot
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp     model_update = self.driver.create_snapshot(snapshot_ref)
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/openstack/common/lockutils.py", line 247, in inner
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp     retval = f(*args, **kwargs)
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/glusterfs.py", line 295, in create_snapshot
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp     return self._create_snapshot(snapshot)
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/cinder/volume/drivers/glusterfs.py", line 435, in _create_snapshot
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp     raise exception.GlusterfsException(msg)
2014-03-26 17:50:26.675 22427 TRACE cinder.openstack.common.rpc.amqp GlusterfsException: Nova returned "error" status while creating snapshot.

According to the last line, nova returns an error while creating the snapshot. Deepak will update findings from his investigation.

Here's my analysis of the issue (cinder snapshot not working). From cinder/volume.log and looking into the cinder-vol mount point:

-rw-r--r--. 1 root root     197120 Mar 26 17:36 volume-65a0400e-0e92-40c4-a7f8-c1a89860f546.a1620281-34e6-44f8-ab88-24eb1decd8e0
-rw-rw-rw-. 1 root root 1073741824 Mar 26 17:35 volume-65a0400e-0e92-40c4-a7f8-c1a89860f546

2014-03-26 17:50:26.675 22427 ERROR cinder.openstack.common.rpc.amqp [req-b3681a31-794d-4589-9889-5e71c87d1931 3632ea88c6d04a439a0671ef0bd29870 1bd72dd353524d0eb3c072829d744cbc] Exception during message handling
(followed by the same GlusterfsException traceback as quoted above)

** The above shows that cinder does create the snap file (but with rw-r--r-- permissions, which I feel is wrong; I think it should have rw-rw-rw-) and then errors out because Nova returns an error.

Looking into nova/compute.log:

2014-03-26 17:50:25.609 14799 ERROR nova.virt.libvirt.driver [req-6aa5dbe8-2e43-476b-a12d-13560eba20dd 3632ea88c6d04a439a0671ef0bd29870 1bd72dd353524d0eb3c072829d744cbc] Error occurred during volume_snapshot_create, sending error status to Cinder.
2014-03-26 17:50:25.609 14799 TRACE nova.virt.libvirt.driver Traceback (most recent call last):
2014-03-26 17:50:25.609 14799 TRACE nova.virt.libvirt.driver   File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 1675, in volume_snapshot_create
2014-03-26 17:50:25.609 14799 TRACE nova.virt.libvirt.driver     create_info['new_file'])
2014-03-26 17:50:25.609 14799 TRACE nova.virt.libvirt.driver   File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 1584, in _volume_snapshot_create
2014-03-26 17:50:25.609 14799 TRACE nova.virt.libvirt.driver     new_file_path = os.path.join(os.path.dirname(current_file),
2014-03-26 17:50:25.609 14799 TRACE nova.virt.libvirt.driver   File "/usr/lib64/python2.6/posixpath.py", line 119, in dirname
2014-03-26 17:50:25.609 14799 TRACE nova.virt.libvirt.driver     i = p.rfind('/') + 1
2014-03-26 17:50:25.609 14799 TRACE nova.virt.libvirt.driver AttributeError: 'NoneType' object has no attribute 'rfind'

** In analysing nova/virt/libvirt.py, it seems it doesn't have the code/support to take snapshots of network disks. It assumes all disks_to_snap are type=file and also references disk.source_path, which isn't correct for the gluster case. So cinder snapshot-create won't work until Nova gains support for network disks.

I tested snapshot creation in a qemu/kvm + gluster environment.
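The nova failure mode described in the traceback can be reproduced in isolation. The sketch below is a hypothetical paraphrase of the path handling in nova's `_volume_snapshot_create`, not the actual nova source: for a gluster network disk there is no local source path, so the dirname call receives None.

```python
import os

def new_snapshot_path(current_file, new_filename):
    # Nova assumed current_file is a local filesystem path. For a gluster
    # network disk, libvirt reports no local source path, so current_file
    # arrives as None and os.path.dirname() fails.
    return os.path.join(os.path.dirname(current_file), new_filename)

# Works for a file-backed disk:
print(new_snapshot_path("/var/lib/nova/instances/disk", "snap.qcow2"))
# -> /var/lib/nova/instances/snap.qcow2

# Fails for a network disk (source path is None). Python 2.6 raised
# AttributeError from p.rfind('/'), as in the traceback above; Python 3
# raises TypeError instead.
try:
    new_snapshot_path(None, "snap.qcow2")
except (AttributeError, TypeError) as exc:
    print("snapshot path lookup failed:", type(exc).__name__)
```

This is why the fix had to teach nova (and libvirt) about network disk sources rather than treating every disk as a local file.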
Just thought I could add the results to this bug; they may help someone diagnosing the earlier failures.

Versions of the components
--------------------------
qemu:
[root@rhs-client10 test]# rpm -qa | grep qemu
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.7.x86_64
gpxe-roms-qemu-0.9.7-6.10.el6.noarch
qemu-img-rhev-0.12.1.2-2.415.el6_5.7.x86_64

These packages were obtained from http://download.lab.bos.redhat.com/rel-eng/repos/rhevh-rhel-6.5-candidate/x86_64

libvirt, as available in comment 22:
libvirt-client-0.10.2-1.el6.x86_64
libvirt-python-0.10.2-1.el6.x86_64
libvirt-0.10.2-1.el6.x86_64
libvirt-debuginfo-0.10.2-1.el6.x86_64
libvirt-lock-sanlock-0.10.2-1.el6.x86_64
libvirt-devel-0.10.2-1.el6.x86_64

glusterfs-3.4.0.59rhs-1.el6rhs [RHSS 2.1 Update 2, codenamed 'Corbett']

Steps
=====
RHSS side:
0. Created a 2-node cluster running RHSS 2.1 Update 2.
1. Created a 2x2 distributed-replicate volume.
2. Started the volume.
3. Set the option server.allow-insecure on, i.e. gluster volume set <vol-name> server.allow-insecure on
4. Restarted the volume.
5. Edited the glusterd vol file (/etc/glusterfs/glusterd.vol) to include the option "option rpc-auth-allow-insecure on".
6. Restarted glusterd.

RHEL 6.5 side:
1. Installed qemu-kvm (the RHEV version of qemu), which also pulled in the glusterfs dependency.
2. Created an image file using qemu-img, i.e. qemu-img create gluster://<server>/<vol>/<image> <size>
3. Created a libvirt XML for defining a VM using the image:

<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source protocol='gluster' name='distrepvol/test.img'>
    <host name='10.70.37.106' port='24007'/>
  </source>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>

4. Defined a VM using the XML.
5. NFS-mounted the gluster volume and changed the image ownership to qemu:qemu.
6. Started the VM, i.e. virsh start <DOMAIN>
7. Created an XML defining the snapshot:

<domainsnapshot>
  <name>snap1</name>
  <disks>
    <disk name='vda' type='network'>
      <driver type='qcow2'/>
      <source protocol='gluster' name='distrepvol/snap1-vm2.img'>
        <host name='10.70.37.106' port='24007'/>
      </source>
    </disk>
  </disks>
</domainsnapshot>

8. Created a snapshot:

[root@rhs-client10 ~]# virsh snapshot-create --domain vm2 --xmlfile snap.xml --disk-only
Domain snapshot snap1 created from 'snap.xml'

9. Listed the snapshot:

[root@rhs-client10 ~]# virsh snapshot-list vm2
 Name                 Creation Time             State
------------------------------------------------------------
 snap1                2014-04-02 12:47:30 +0530 disk-snapshot

10. Verified the same by NFS-mounting the gluster volume:

[root@rhs-client10 test]# qemu-img info snap1-vm2.img
image: snap1-vm2.img
file format: qcow2
virtual size: 5.0G (5368709120 bytes)
disk size: 193K
cluster_size: 65536
backing file: gluster+tcp://10.70.37.106:24007/distrepvol/vm2.img
backing file format: raw

I haven't tried the blockpull or revert cases.

(In reply to SATHEESARAN from comment #34)
> 3. Created a libvirt XML for defining a VM using the image:
> <source protocol='gluster' name='distrepvol/test.img'>

Typo here: the VM's disk image file is vm2.img (name='distrepvol/vm2.img'). I modified this file later to test.img and mistakenly copied that version here. Thanks, Deepak, for pointing that out.

On Deepak's request, I retried snapshot creation with the --reuse-external flag, as Nova does the same in an OpenStack environment. The setup remains the same as in comment 34. Steps follow:

1. Created an image file, of size equal to or greater than the VM's disk image, in qcow2 format.
2. Changed the ownership of the image file to qemu:qemu.
3.
Took a snapshot of the VM, with --reuse-external pointing to the new disk image file in the XML file.

CLI logs from the hypervisor
============================
[root@rhs-client10 ~]# qemu-img create -f qcow2 gluster://10.70.37.106/distrepvol/snap2-vm2.img 6G
Formatting 'gluster://10.70.37.106/distrepvol/snap2-vm2.img', fmt=qcow2 size=6442450944 encryption=off cluster_size=65536
[2014-04-02 10:47:04.499081] I [client.c:2103:client_rpc_notify] 0-distrepvol-client-0: disconnected from 10.70.37.106:49155. Client process will keep trying to connect to glusterd until brick's port is available.
[2014-04-02 10:47:04.499136] E [afr-common.c:4025:afr_notify] 0-distrepvol-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2014-04-02 10:47:04.499163] E [afr-common.c:4025:afr_notify] 0-distrepvol-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
(the same afr_notify message repeats for replicate-0 and replicate-1 through 10:47:06)

[root@rhs-client10 ~]# qemu-img info /mnt/test/snap2-vm2.img
image: /mnt/test/snap2-vm2.img
file format: qcow2
virtual size: 6.0G (6442450944 bytes)
disk size: 193K
cluster_size: 65536

[root@rhs-client10 ~]# ls /mnt/test/
snap1-vm2.img  snap2-vm2.img  vm1.img  vm2.img
[root@rhs-client10 ~]# ls /mnt/test/ -lh
total 3.8G
-rw-------. 1 qemu qemu 193K Apr  2  2014 snap1-vm2.img
-rw-------. 1 root root 193K Apr  2  2014 snap2-vm2.img
-rw-------. 1 qemu qemu  10G Apr  2  2014 vm1.img
-rw-------. 1 qemu qemu 5.0G Apr  2 06:15 vm2.img
[root@rhs-client10 ~]# chown qemu:qemu /mnt/test/snap2-vm2.img
[root@rhs-client10 ~]# ls /mnt/test/ -lh
total 3.8G
-rw-------. 1 qemu qemu 193K Apr  2  2014 snap1-vm2.img
-rw-------. 1 qemu qemu 193K Apr  2  2014 snap2-vm2.img
-rw-------. 1 qemu qemu  10G Apr  2  2014 vm1.img
-rw-------. 1 qemu qemu 5.0G Apr  2 06:15 vm2.img

[root@rhs-client10 ~]# virsh snapshot-create --domain vm2 --xmlfile snap.xml --reuse-external --disk-only
Domain snapshot snap2 created from 'snap.xml'

[root@rhs-client10 ~]# virsh snapshot-list vm2
 Name                 Creation Time             State
------------------------------------------------------------
 snap1                2014-04-02 12:47:30 +0530 disk-snapshot
 snap2                2014-04-02 16:19:16 +0530 disk-snapshot

[root@rhs-client10 ~]# qemu-img info /mnt/test/snap2-vm2.img
image: /mnt/test/snap2-vm2.img
file format: qcow2
virtual size: 6.0G (6442450944 bytes)
disk size: 193K
cluster_size: 65536

If the ownership of the newly created image for the snapshot was not set to qemu:qemu, the snapshot failed with a 'permission denied' message. I describe the same below:

[root@rhs-client10 ~]# qemu-img create -f qcow2 gluster://10.70.37.106/distrepvol/snap3-vm2.img 5G
Formatting 'gluster://10.70.37.106/distrepvol/snap3-vm2.img', fmt=qcow2 size=5368709120 encryption=off cluster_size=65536
[2014-04-02 10:50:58.029597] E [afr-common.c:4025:afr_notify] 0-distrepvol-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2014-04-02 10:50:58.029662] E [afr-common.c:4025:afr_notify] 0-distrepvol-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
(the same afr_notify message repeats for replicate-0 and replicate-1 through 10:50:59)

[root@rhs-client10 ~]# virsh snapshot-create --domain vm2 --xmlfile snap.xml --reuse-external --disk-only
error: internal error unable to execute QEMU command 'transaction': Could not open 'gluster+tcp://10.70.37.106:24007/distrepvol/snap3-vm2.img': Permission denied
[root@rhs-client10 ~]# chown qemu:qemu /mnt/test/snap3-vm2.img
[root@rhs-client10 ~]# virsh snapshot-create --domain vm2 --xmlfile snap.xml --reuse-external --disk-only
Domain snapshot snap3 created from 'snap.xml'
[root@rhs-client10 ~]# qemu-img info /mnt/test/snap3-vm2.img
image: /mnt/test/snap3-vm2.img
file format: qcow2
virtual size: 5.0G (5368709120 bytes)
disk size: 193K
cluster_size: 65536
[root@rhs-client10 ~]# virsh snapshot-list vm2
 Name                 Creation Time             State
------------------------------------------------------------
 snap1                2014-04-02 12:47:30 +0530 disk-snapshot
 snap2                2014-04-02 16:19:16 +0530 disk-snapshot
 snap3                2014-04-02 16:22:37 +0530 disk-snapshot

Peter, per Satheesaran's experiment in comment #36, it seems --reuse-external isn't working as expected: virsh thinks it was successful, but the qemu-img info of the snap file doesn't have a backing-file field. Can you validate whether the steps Satheesaran used are correct?

(In reply to Deepak C Shetty from comment #37)
> Peter, per Satheesaran's experiment in comment #36, it seems --reuse-external
> isn't working as expected: virsh thinks it was successful, but the qemu-img
> info of the snap file doesn't have a backing-file field. Can you validate
> whether the steps Satheesaran used are correct?

Unfortunately, from the steps described above it's not clear whether the "snap.xml" file was modified correctly so that the snapshot target would be stored in snap2-vm2.img, as the file is re-used multiple times.
SATHEESARAN, please provide the output of "virsh snapshot-dumpxml vm2 snap2".

Peter, this is the content of snap.xml:

<domainsnapshot>
  <name>snap3</name>
  <disks>
    <disk name='vda' type='network'>
      <driver type='qcow2'/>
      <source protocol='gluster' name='distrepvol/snap3-vm2.img'>
        <host name='10.70.37.106' port='24007'/>
      </source>
    </disk>
  </disks>
</domainsnapshot>

I could see that the VM started using the new snapshot file:

[root@rhs-client10 ~]# virsh domblklist vm2
Target     Source
------------------------------------------------
vda        distrepvol/snap3-vm2.img
hdc        -

I have created a new VM that uses the qemu driver for glusterfs and again took a snapshot afresh (with Deepak looking at all the steps); the qemu-img info on the file created using --reuse-external doesn't have 'backing file' info. I double-checked it.

When using --reuse-external, you need to pre-create the image with the correct backing image; otherwise qemu won't modify it to be correct. This is designed to work this way, unfortunately. Use qemu-img create -f qcow2 -o backing_file=gluster://...,backing_fmt=qcow2 ... to create the image with the correct backing path.

Peter, hmm, that's interesting and a bit awkward too. Is this specific to libgfapi, or applicable to any disk type?

(In reply to Deepak C Shetty from comment #42)
> Peter, hmm, that's interesting and a bit awkward too. Is this specific to
> libgfapi, or applicable to any disk type?

No, that's apparently the desired behavior for every backing type of disk snapshot, but unfortunately it isn't documented properly.

> Also, FWIW, when we don't use --reuse-external, qemu creates the file and
> also sets the backing correctly, so it's unclear why it can't set it when a
> file is supplied to begin with.

The original idea was that the mgmt app pre-creates the metadata in the way it desires, including the backing path. Thus qemu does not change it.

> I also need to verify whether the cinder-gluster driver will work correctly
> if this is the case. The last time I saw the code, I think it just creates a
> new volume (doesn't set the backing file, etc.) and passes it on to nova,
> which then uses the REUSE_EXT snapshot flag... if this is correct, then the
> cinder/nova flow is broken too!

If it works with regular disk snapshots, it's at least partially doing the right thing. The one thing to verify here is whether the gluster volume URI in the case of an external disk is used correctly.

(In reply to Peter Krempa from comment #43)
> The original idea was that the mgmt app pre-creates the metadata in the way
> it desires, including the backing path. Thus qemu does not change it.

Got it now.

> If it works with regular disk snapshots, it's at least partially doing the
> right thing. The one thing to verify here is whether the gluster volume URI
> in the case of an external disk is used correctly.

I saw the cinder code; it does create a qcow2 backed by the base image and then passes it to Nova, so we are good there.