Bug 1156288

Summary: libvirtd crashed on disk snapshot with rdma glusterfs image
Product: Red Hat Enterprise Linux 7
Component: libvirt
Version: 7.2
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Wayne Sun <gsun>
Assignee: Peter Krempa <pkrempa>
QA Contact: Virtualization Bugs <virt-bugs>
CC: dyuan, mzhan, pkrempa, pzhang, rbalakri, shyu, xuzhang, yanyang
Target Milestone: rc
Fixed In Version: libvirt-1.2.8-6.el7
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-03-05 07:46:33 UTC

Attachments: libvirtd back trace file

Description Wayne Sun 2014-10-24 05:23:01 UTC
Created attachment 950232: libvirtd back trace file

Description of problem:
When doing a disk snapshot with an rdma glusterfs disk, libvirtd crashes.

Version-Release number of selected component (if applicable):
libvirt-1.2.8-5.el7.x86_64
qemu-kvm-rhev-2.1.2-4.el7.x86_64
kernel-3.10.0-123.el7.x86_64


How reproducible:
always

Steps to Reproduce:
1. create and start a vm with glusterfs rdma disk
...
    <disk type='network' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source protocol='gluster' name='gluster-vol1/gluster.qcow2'>
        <host name='xx.xx.xx.xx' port='24007' transport='rdma'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
...

# virsh start virt-tests-vm1
Domain virt-tests-vm1 started

# virsh list
 Id    Name                           State
----------------------------------------------------
 256   virt-tests-vm1                 running

2. do external disk snapshot
# virsh snapshot-create-as virt-tests-vm1 snapshot1 snap1-desc --disk-only --atomic --no-metadata vda,snapshot=external,file=/home/wayne/work/virt-test/tmp/gluster.snap1
2014-10-24 05:00:43.244+0000: 3937: info : libvirt version: 1.2.8, package: 5.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-10-09-08:52:56, x86-030.build.eng.bos.redhat.com)
2014-10-24 05:00:43.244+0000: 3937: warning : virKeepAliveTimerInternal:143 : No response from client 0x7f61e324edd0 after 6 keepalive messages in 35 seconds
2014-10-24 05:00:43.244+0000: 3935: warning : virKeepAliveTimerInternal:143 : No response from client 0x7f61e324edd0 after 6 keepalive messages in 35 seconds
error: internal error: received hangup / error event on socket

Actual results:
libvirtd crashed

Expected results:
The snapshot succeeds.

Additional info:
Back trace attached.
The snapshot works fine when the glusterfs image uses the tcp transport.

Comment 2 Shanzhi Yu 2014-10-27 07:41:07 UTC
Another way to reproduce this crash:

1) Prepare a dir pool named default
# virsh pool-list
  Name                 State      Autostart
-------------------------------------------
  default              active     yes
# virsh pool-dumpxml default
<pool type='dir'>
   <name>default</name>
   <uuid>fd9db86b-ab31-44fe-02b2-9ccdf1a2c52a</uuid>
   <capacity unit='bytes'>42141450240</capacity>
   <allocation unit='bytes'>7179096064</allocation>
   <available unit='bytes'>34962354176</available>
   <source>
   </source>
   <target>
     <path>/var/lib/libvirt/images</path>
     <permissions>
       <mode>0755</mode>
       <owner>-1</owner>
       <group>-1</group>
     </permissions>
   </target>
</pool>

2) Create a volume with a gluster+tcp or gluster+rdma URI as the backing file
# qemu-img info gluster+rdma://10.66.6.111/gluster-vol1/rhel65.img
image: gluster+rdma://10.66.6.111/gluster-vol1/rhel65.img
file format: raw
virtual size: 8.0G (8589934592 bytes)
disk size: 8.0G

# qemu-img create -f qcow2 /var/lib/libvirt/images/test.img -b gluster+rdma://10.66.6.111/gluster-vol1/rhel65.img -o backing_fmt=raw
Formatting '/var/lib/libvirt/images/test.img', fmt=qcow2 size=8589934592 backing_file='gluster+rdma://10.66.6.111/gluster-vol1/rhel65.img' backing_fmt='raw' encryption=off cluster_size=65536 lazy_refcounts=off

3) Refreshing the default pool will crash libvirtd (command shown below)
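
For reference, the refresh command (the pool name is the one from step 1):

# virsh pool-refresh default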

4) Same story with "gluster+tcp".

Remember to delete the volume you created in step 2) or 4); otherwise
you can't start the libvirtd daemon anymore.

Comment 3 Peter Krempa 2014-10-29 16:30:37 UTC
Fixed upstream. The problem occurs with any file whose backing file contains a transport protocol as part of the URI scheme.

commit 98784369fd52ed6aa9bab2a9a9d213c52019e6ee
Author: Peter Krempa <pkrempa>
Date:   Wed Oct 29 10:55:23 2014 +0100

    storage: Fix crash when parsing backing store URI with schema
    
    The code that parses the schema from the URI touches the "hosts[0]"
    member of the storage file source structure in case the URI contains a
    schema. The hosts array was not yet allocated at the point in the code
    where the transport protocol was parsed and set. This lead to a crash of
    libvirtd.
    
    Fix the code by allocating the "hosts" array upfront and add a test case
    to verify this scenario. (Unfortunately this requires shuffling the test
    case numbers too).
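
For illustration, a minimal C sketch of the crash pattern the commit describes; the types and function names below are simplified stand-ins, not libvirt's actual code:

#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for libvirt's storage file source structures. */
typedef struct {
    char *name;
    char *transport;            /* e.g. "rdma" parsed from "gluster+rdma" */
} HostDef;

typedef struct {
    HostDef *hosts;             /* NULL until allocated */
    size_t nhosts;
} StorageSource;

/* Buggy order of operations: the scheme "gluster+rdma" is split and the
 * transport written into hosts[0] before the hosts array is allocated,
 * so src->hosts is still NULL and the write dereferences a NULL pointer. */
int parseBackingURIBuggy(StorageSource *src, const char *scheme)
{
    const char *plus = strchr(scheme, '+');
    if (plus)
        src->hosts[0].transport = strdup(plus + 1);  /* CRASH: hosts == NULL */
    return 0;
}

/* Fixed order: allocate the hosts array up front, then parse into it. */
int parseBackingURIFixed(StorageSource *src, const char *scheme)
{
    if (!(src->hosts = calloc(1, sizeof(*src->hosts))))
        return -1;
    src->nhosts = 1;

    const char *plus = strchr(scheme, '+');
    if (plus)
        src->hosts[0].transport = strdup(plus + 1);  /* safe: hosts[0] exists */
    return 0;
}

The fix mirrors the commit: allocate the "hosts" array before the scheme parsing touches hosts[0].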

Comment 6 Peter Krempa 2014-10-30 06:14:06 UTC
Transport='tcp' is the default, so it is omitted from the XML. If you use RDMA for the backing file, it should be recorded in the XML.
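
For example (matching the disk XML in comment 0), an RDMA host is recorded explicitly:

      <host name='xx.xx.xx.xx' port='24007' transport='rdma'/>

while a TCP host would simply omit the transport attribute:

      <host name='xx.xx.xx.xx' port='24007'/>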

Comment 7 Shanzhi Yu 2014-10-30 07:47:12 UTC
Hi,

What I want to ask is:
from libvirt's side, are "gluster+tcp://ip/gluster-volume/file" and "gluster://ip/gluster-volume/file" different when they show up in backing chains?

When rh6.s1's backing file is "gluster+tcp://ip/gluster-volume/file", libvirt will discard the backing file; when rh6.s1's backing file is "gluster://ip/gluster-volume/file", libvirt will recognise it.

Comment 9 Shanzhi Yu 2014-11-14 09:10:46 UTC
Reproduced with libvirt-1.2.8-5.el7.x86_64; verified with libvirt-1.2.8-6.el7.x86_64.

Steps as in comment 0.

Comment 11 errata-xmlrpc 2015-03-05 07:46:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html