Red Hat Bugzilla – Bug 1156288
libvirtd crashed on disk snapshot with rdma glusterfs image
Last modified: 2015-03-05 02:46:33 EST
Created attachment 950232 [details]
libvirtd back trace file

Description of problem:
when doing disk snapshot with rdma glusterfs disk, libvirtd crashed

Version-Release number of selected component (if applicable):
libvirt-1.2.8-5.el7.x86_64
qemu-kvm-rhev-2.1.2-4.el7.x86_64
kernel-3.10.0-123.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. create and start a vm with glusterfs rdma disk
...
<disk type='network' device='disk'>
  <driver name='qemu' type='qcow2' cache='none'/>
  <source protocol='gluster' name='gluster-vol1/gluster.qcow2'>
    <host name='xx.xx.xx.xx' port='24007' transport='rdma'/>
  </source>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
...

# virsh start virt-tests-vm1
Domain virt-tests-vm1 started

# virsh list
 Id    Name                           State
----------------------------------------------------
 256   virt-tests-vm1                 running

2. do external disk snapshot
# virsh snapshot-create-as virt-tests-vm1 snapshot1 snap1-desc --disk-only --atomic --no-metadata vda,snapshot=external,file=/home/wayne/work/virt-test/tmp/gluster.snap1
2014-10-24 05:00:43.244+0000: 3937: info : libvirt version: 1.2.8, package: 5.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-10-09-08:52:56, x86-030.build.eng.bos.redhat.com)
2014-10-24 05:00:43.244+0000: 3937: warning : virKeepAliveTimerInternal:143 : No response from client 0x7f61e324edd0 after 6 keepalive messages in 35 seconds
2014-10-24 05:00:43.244+0000: 3935: warning : virKeepAliveTimerInternal:143 : No response from client 0x7f61e324edd0 after 6 keepalive messages in 35 seconds
error: internal error: received hangup / error event on socket

3.

Actual results:
libvirtd crashed

Expected results:
succeed

Additional info:
back trace attached
snapshot works fine with transport as tcp glusterfs image
Another way to reproduce this crash:

1) Prepare a dir pool named default

# virsh pool-list
 Name                 State      Autostart
-------------------------------------------
 default              active     yes

# virsh pool-dumpxml default
<pool type='dir'>
  <name>default</name>
  <uuid>fd9db86b-ab31-44fe-02b2-9ccdf1a2c52a</uuid>
  <capacity unit='bytes'>42141450240</capacity>
  <allocation unit='bytes'>7179096064</allocation>
  <available unit='bytes'>34962354176</available>
  <source>
  </source>
  <target>
    <path>/var/lib/libvirt/images</path>
    <permissions>
      <mode>0755</mode>
      <owner>-1</owner>
      <group>-1</group>
    </permissions>
  </target>
</pool>

2) Create a volume with a gluster+tcp or gluster+rdma backing file

# qemu-img info gluster+rdma://10.66.6.111/gluster-vol1/rhel65.img
image: gluster+rdma://10.66.6.111/gluster-vol1/rhel65.img
file format: raw
virtual size: 8.0G (8589934592 bytes)
disk size: 8.0G

# qemu-img create -f qcow2 /var/lib/libvirt/images/test.img -b gluster+rdma://10.66.6.111/gluster-vol1/rhel65.img -o backing_fmt=raw
Formatting '/var/lib/libvirt/images/test.img', fmt=qcow2 size=8589934592 backing_file='gluster+rdma://10.66.6.111/gluster-vol1/rhel65.img' backing_fmt='raw' encryption=off cluster_size=65536 lazy_refcounts=off

3) Refreshing the default pool crashes libvirtd.

4) Same story with "gluster+tcp".

Remember to delete the volume you created in step 2) or 4); otherwise you can't start the libvirtd daemon anymore.
Fixed upstream. The problem will happen with any file that has a backing file containing a transport protocol as a part of the URI schema.

commit 98784369fd52ed6aa9bab2a9a9d213c52019e6ee
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Wed Oct 29 10:55:23 2014 +0100

    storage: Fix crash when parsing backing store URI with schema

    The code that parses the schema from the URI touches the "hosts[0]"
    member of the storage file source structure in case the URI contains a
    schema. The hosts array was not yet allocated at the point in the code
    where the transport protocol was parsed and set. This lead to a crash of
    libvirtd.

    Fix the code by allocating the "hosts" array upfront and add a test case
    to verify this scenario. (Unfortunately this requires shuffling the test
    case numbers too).
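For readers who want to see the ordering problem from the commit message in isolation, here is a minimal Python sketch. It is not libvirt's code (libvirt is written in C), and the names Host, StorageSource and parse_backing_uri are hypothetical, chosen only for illustration. It splits a schema such as "gluster+rdma" into a protocol and a transport, and creates the hosts[0] entry before the transport is written into it, which is the ordering the fix describes.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlsplit


@dataclass
class Host:
    # Hypothetical stand-in for one entry of the "hosts" array.
    name: Optional[str] = None
    port: Optional[int] = None
    transport: str = "tcp"  # tcp is the default transport


@dataclass
class StorageSource:
    # Hypothetical stand-in for the storage file source structure.
    protocol: str = ""
    path: str = ""
    hosts: List[Host] = field(default_factory=list)


def parse_backing_uri(uri: str) -> StorageSource:
    """Parse a backing store URI such as gluster+rdma://host/vol/file."""
    parts = urlsplit(uri)
    if not parts.scheme:
        raise ValueError("not a URI with a schema: %r" % uri)

    src = StorageSource()
    # Allocate the host entry up front. Writing to src.hosts[0] before
    # this line would raise IndexError here; in C the equivalent access
    # touches unallocated memory.
    src.hosts.append(Host())

    # "gluster+rdma" -> protocol "gluster", transport "rdma".
    protocol, plus, transport = parts.scheme.partition("+")
    src.protocol = protocol
    if plus:
        src.hosts[0].transport = transport

    src.hosts[0].name = parts.hostname
    src.hosts[0].port = parts.port
    src.path = parts.path.lstrip("/")
    return src
```

Calling parse_backing_uri("gluster+rdma://10.66.6.111/gluster-vol1/rhel65.img") yields protocol "gluster" and a single host with transport "rdma". The same URI with a plain "gluster://" schema keeps the default "tcp" transport.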
transport='tcp' is the default, so it is omitted from the XML. If you use RDMA for the backing file, it should be recorded in the XML.
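As an illustration of that rule only, the following Python sketch formats a <host> element and leaves out the transport attribute when it equals the default. The function name format_host and the DEFAULT_TRANSPORT constant are hypothetical; this is not how libvirt generates its XML internally.

```python
import xml.etree.ElementTree as ET

DEFAULT_TRANSPORT = "tcp"  # assumed default, as stated in the comment above


def format_host(name, port=None, transport=DEFAULT_TRANSPORT):
    """Return a <host> element string, omitting a default transport."""
    attrs = {"name": name}
    if port is not None:
        attrs["port"] = str(port)
    # Only a non-default transport (for example rdma) is written out.
    if transport != DEFAULT_TRANSPORT:
        attrs["transport"] = transport
    return ET.tostring(ET.Element("host", attrs), encoding="unicode")
```

With these assumptions, format_host("10.66.6.111", transport="rdma") produces an element that carries transport="rdma", while format_host("10.66.6.111") produces one with no transport attribute at all.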
Hi, what I want to ask is: from the libvirt side, are "gluster+tcp://ip/gluster-volume/file" and "gluster://ip/gluster-volume/file" treated differently when they appear in a backing chain? When rh6.s1's backing file is "gluster+tcp://ip/gluster-volume/file", libvirt discards the backing file; when rh6.s1's backing file is "gluster://ip/gluster-volume/file", libvirt recognises it.
Reproduced this with libvirt-1.2.8-5.el7.x86_64 and verified the fix with libvirt-1.2.8-6.el7.x86_64, following the steps in comment 0.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0323.html