Bug 1377663

Summary: the second of multiple gluster volume hosts is not tried when the first one is unreachable
Product: Red Hat Enterprise Linux 7
Reporter: lijuan men <lmen>
Component: libvirt
Assignee: Peter Krempa <pkrempa>
Status: CLOSED NOTABUG
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.4
CC: dyuan, lmen, pkrempa, rbalakri, sabose, sasundar, xuzhang
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-12-08 15:43:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description lijuan men 2016-09-20 10:31:54 UTC
Description of problem:
The second of multiple gluster volume hosts is not tried when the first one is unreachable.

Version-Release number of selected component (if applicable):
1. test host:
libvirt-2.0.0-9.el7.x86_64
qemu-kvm-rhev-2.6.0-25.el7.x86_64

2. glusterfs server:
glusterfs-server-3.7.9-12.el7rhgs.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare the glusterfs environment on the gluster servers:
[root@localhost ~]# gluster volume status
Status of volume: test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.66.4.213:/opt/test1                49154     0          Y       2545 
Brick 10.66.4.105:/opt/test2                49152     0          Y       24386
Brick 10.66.4.148:/opt/test3                49153     0          Y       20994

2. On the test host (10.66.70.107), create the image:
[root@localhost ~]# qemu-img create -f qcow2 gluster://10.66.4.213/test/test1.img 200M

3. Start a guest with the following disk XML:
    <disk type='network' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source protocol='gluster' name='test/test1.img'>
        <host name='10.66.4.2'/>      <!-- this host does not exist -->
        <host name='10.66.4.105'/>    <!-- one of the gluster servers above -->
        <host name='10.66.4.148'/>    <!-- one of the gluster servers above -->
      </source>
      <target dev='vda' bus='virtio'/>
      <boot order='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </disk>

[root@localhost ~]# virsh start bios
error: Failed to start domain bios
error: failed to initialize gluster connection (src=0x7f1be48b7130 priv=0x7f1bec254f10): Transport endpoint is not connected


I have confirmed with QEMU QE that if the first gluster server IP specified in the VM does not exist (or cannot be connected to) while the second and third IPs are valid, the guest boots up successfully when started directly with the QEMU command line.

Do we need to improve this?
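
A quick way to confirm from the client that the image itself is fine and that only the first host is unreachable (a sketch only; it reuses the addresses from the volume status above and assumes qemu-img has gluster support, as already used in step 2):

# first host, expected to fail
qemu-img info gluster://10.66.4.2/test/test1.img
# second and third hosts, expected to succeed
qemu-img info gluster://10.66.4.105/test/test1.img
qemu-img info gluster://10.66.4.148/test/test1.img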

Actual results:
The guest fails to start: the connection attempt stops at the first (unreachable) host with "failed to initialize gluster connection ... Transport endpoint is not connected", and the remaining hosts are never tried.

Expected results:
When the first host cannot be reached, the second and third hosts are tried and the guest starts.

Additional info:

Comment 1 SATHEESARAN 2016-11-11 11:53:05 UTC
Hi All,

I have tested with an RHGS 3.2.0 interim build (glusterfs-3.8.4-4.el7rhgs) on RHEL 7.3, installing qemu-kvm-rhev from rhel-7-server-rhv-4-mgmt-agent-rpms.

[root@ ~]# rpm -qa | grep libvirt
libvirt-daemon-driver-qemu-2.0.0-10.el7.x86_64
libvirt-python-2.0.0-2.el7.x86_64
libvirt-daemon-driver-network-2.0.0-10.el7.x86_64
libvirt-daemon-driver-nodedev-2.0.0-10.el7.x86_64
libvirt-daemon-driver-nwfilter-2.0.0-10.el7.x86_64
libvirt-daemon-config-network-2.0.0-10.el7.x86_64
libvirt-daemon-driver-secret-2.0.0-10.el7.x86_64
libvirt-client-2.0.0-10.el7.x86_64
libvirt-daemon-driver-storage-2.0.0-10.el7.x86_64
libvirt-daemon-driver-lxc-2.0.0-10.el7.x86_64
libvirt-2.0.0-10.el7.x86_64
libvirt-daemon-2.0.0-10.el7.x86_64
libvirt-daemon-driver-interface-2.0.0-10.el7.x86_64
libvirt-daemon-config-nwfilter-2.0.0-10.el7.x86_64

[root@ ~]# rpm -qa | grep qemu
qemu-kvm-rhev-2.6.0-27.el7.x86_64
libvirt-daemon-driver-qemu-2.0.0-10.el7.x86_64
ipxe-roms-qemu-20160127-5.git6366fa7a.el7.noarch
qemu-img-rhev-2.6.0-27.el7.x86_64
qemu-kvm-tools-rhev-2.6.0-27.el7.x86_64
qemu-kvm-common-rhev-2.6.0-27.el7.x86_64
qemu-guest-agent-2.5.0-3.el7.x86_64

I was able to use multiple gluster server hosts.

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' error_policy='stop'/>
      <source protocol='gluster' name='rep3vol/vm1.img'>
        <host name='dhcp37-86.lab.eng.blr.redhat.com' port='24007'/>
        <host name='dhcp37-146.lab.eng.blr.redhat.com' port='24007'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>

Here the first server (dhcp37-86.lab.eng.blr.redhat.com) does not exist at all, so the volfile server is obtained from the other host, and the VM could boot from the image.
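
For reference, whether each configured volfile server is reachable from the client can be checked like this (a minimal sketch; the hostnames are the ones from the XML above and 24007 is the standard gluster management port):

for h in dhcp37-86.lab.eng.blr.redhat.com dhcp37-146.lab.eng.blr.redhat.com; do
    # probe the gluster management port the volfile is fetched from
    timeout 3 bash -c "</dev/tcp/$h/24007" && echo "$h reachable" || echo "$h unreachable"
done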

What is the actual issue reported in this bug?

Comment 3 lijuan men 2016-11-15 05:32:08 UTC
(In reply to SATHEESARAN from comment #1)
> What is the actual issue reported in this bug?

I tested the scenario again.
Perhaps it is related to the glusterfs version.

Summary:
1. If the glusterfs server runs glusterfs-server-3.8.4-5.el7rhgs.x86_64 and the client host has glusterfs-3.7.9-12.el7.x86_64 (installed by default in the RHEL 7.3 release), the guest on the **client host** fails to start.
2. If the glusterfs server runs glusterfs-server-3.8.4-5.el7rhgs.x86_64 and the client host has glusterfs-3.8.4-5.el7.x86_64, the guest starts successfully.
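
Which libgfapi the client is actually using can be checked via the gluster client packages (a sketch; the package names are the RHEL 7 ones, and qemu-kvm-rhev loads libgfapi from glusterfs-api):

# 3.7.x on the client reproduces the failure in summary item 1,
# 3.8.x falls through to the remaining hosts as in item 2
rpm -q glusterfs glusterfs-libs glusterfs-api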

And I have another question.
When I use the newest glusterfs version on both the server host and the client host (as in summary item 2), the guest starts successfully. But when I destroy the guest, virsh reports the following errors:
[root@localhost ~]# virsh destroy bios
error: Disconnected from qemu:///system due to keepalive timeout
error: Failed to destroy domain bios
error: internal error: connection closed due to keepalive timeout

Is this normal?

Comment 4 Peter Krempa 2016-12-08 15:43:54 UTC
It indeed depends on the version of libgfapi on the client. With a recent enough version, all servers are tried in order.

As for the keepalive timeout when destroying the VM, the problem is that libvirt executes qemuProcessStop from the event loop. That is wrong, since qemuProcessStop can take a long time. I'll file a separate bug for that.
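
Until that is fixed, a client-side workaround is to turn off virsh's keepalive probing so the connection is not dropped while the long-running qemuProcessStop finishes (a sketch; -k/--keepalive-interval is a global virsh option, and a value of 0 disables keepalive for that invocation):

# disable keepalive for this virsh invocation only
virsh -k 0 destroy bios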