Bug 1377663 - the second of multiple gluster volume hosts is not tried if the first one is invalid
Summary: the second of multiple gluster volume hosts is not tried if the first one is ...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Peter Krempa
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-20 10:31 UTC by lijuan men
Modified: 2016-12-08 15:43 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-08 15:43:54 UTC
Target Upstream Version:
Embargoed:



Description lijuan men 2016-09-20 10:31:54 UTC
Description of problem:
When multiple gluster volume hosts are specified, the second host is not tried if the first one is invalid.

Version-Release number of selected component (if applicable):
1. test host:
libvirt-2.0.0-9.el7.x86_64
qemu-kvm-rhev-2.6.0-25.el7.x86_64

2. glusterfs server:
glusterfs-server-3.7.9-12.el7rhgs.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare the glusterfs environment on the glusterfs servers:
[root@localhost ~]# gluster volume status
Status of volume: test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.66.4.213:/opt/test1                49154     0          Y       2545 
Brick 10.66.4.105:/opt/test2                49152     0          Y       24386
Brick 10.66.4.148:/opt/test3                49153     0          Y       20994

2. On the test host (10.66.70.107), create the image:
[root@localhost ~]# qemu-img create -f qcow2 gluster://10.66.4.213/test/test1.img 200M

3. Start a guest whose disk is defined with the following XML:
    <disk type='network' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source protocol='gluster' name='test/test1.img'>
        <host name='10.66.4.2'/>      <!-- does not exist -->
        <host name='10.66.4.105'/>    <!-- one of the glusterfs servers above -->
        <host name='10.66.4.148'/>    <!-- one of the glusterfs servers above -->
      </source>
      <target dev='vda' bus='virtio'/>
      <boot order='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </disk>

[root@localhost ~]# virsh start bios
error: Failed to start domain bios
error: failed to initialize gluster connection (src=0x7f1be48b7130 priv=0x7f1bec254f10): Transport endpoint is not connected


I have confirmed with qemu QE that if the first glusterfs server IP specified in the VM does not exist (or cannot be connected to) while the second and third IPs are correct, the guest can boot up successfully when started directly with the qemu command line.

Do we need to improve this?

Actual results:


Expected results:


Additional info:

Comment 1 SATHEESARAN 2016-11-11 11:53:05 UTC
Hi All,

I have tested with an RHGS 3.2.0 interim build (glusterfs-3.8.4-4.el7rhgs) on RHEL 7.3, installing qemu-kvm-rhev from rhel-7-server-rhv-4-mgmt-agent-rpms.

[root@ ~]# rpm -qa | grep libvirt
libvirt-daemon-driver-qemu-2.0.0-10.el7.x86_64
libvirt-python-2.0.0-2.el7.x86_64
libvirt-daemon-driver-network-2.0.0-10.el7.x86_64
libvirt-daemon-driver-nodedev-2.0.0-10.el7.x86_64
libvirt-daemon-driver-nwfilter-2.0.0-10.el7.x86_64
libvirt-daemon-config-network-2.0.0-10.el7.x86_64
libvirt-daemon-driver-secret-2.0.0-10.el7.x86_64
libvirt-client-2.0.0-10.el7.x86_64
libvirt-daemon-driver-storage-2.0.0-10.el7.x86_64
libvirt-daemon-driver-lxc-2.0.0-10.el7.x86_64
libvirt-2.0.0-10.el7.x86_64
libvirt-daemon-2.0.0-10.el7.x86_64
libvirt-daemon-driver-interface-2.0.0-10.el7.x86_64
libvirt-daemon-config-nwfilter-2.0.0-10.el7.x86_64

[root@ ~]# rpm -qa | grep qemu
qemu-kvm-rhev-2.6.0-27.el7.x86_64
libvirt-daemon-driver-qemu-2.0.0-10.el7.x86_64
ipxe-roms-qemu-20160127-5.git6366fa7a.el7.noarch
qemu-img-rhev-2.6.0-27.el7.x86_64
qemu-kvm-tools-rhev-2.6.0-27.el7.x86_64
qemu-kvm-common-rhev-2.6.0-27.el7.x86_64
qemu-guest-agent-2.5.0-3.el7.x86_64

I was able to use multiple gluster server hosts.

<disk type='network' device='disk'>
      <driver name='qemu' type='raw' error_policy='stop'/>
      <source protocol='gluster' name='rep3vol/vm1.img'>
        <host name='dhcp37-86.lab.eng.blr.redhat.com' port='24007'/>
        <host name='dhcp37-146.lab.eng.blr.redhat.com' port='24007'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>

Here the first server (dhcp37-86.lab.eng.blr.redhat.com) does not exist at all, yet the volfile server is obtained from the other host and the VM could boot from the image.

What is the actual issue reported in this bug?

Comment 3 lijuan men 2016-11-15 05:32:08 UTC
(In reply to SATHEESARAN from comment #1)
> What is the actual issue reported in this bug?


I tested the scenario again; perhaps it is related to the glusterfs version.

Summary:
1. If the glusterfs server is glusterfs-server-3.8.4-5.el7rhgs.x86_64 and the client host has glusterfs-3.7.9-12.el7.x86_64 (installed by default with the RHEL 7.3 release), the guest on the **client host** fails to start.
2. If the glusterfs server is glusterfs-server-3.8.4-5.el7rhgs.x86_64 and the client host has glusterfs-3.8.4-5.el7.x86_64, the guest starts successfully.

I have another question.
When I use the newest glusterfs version on both the server host and the client host (as in summary item 2), the guest starts successfully. But when I destroy the guest, virsh reports the following errors:
[root@localhost ~]# virsh destroy bios
error: Disconnected from qemu:///system due to keepalive timeout
error: Failed to destroy domain bios
error: internal error: connection closed due to keepalive timeout

Is this normal?

Comment 4 Peter Krempa 2016-12-08 15:43:54 UTC
It indeed depends on the version of libgfapi on the client. With a recent enough version, all servers are tried in order.
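
For illustration (a minimal sketch, not libvirt's actual code): each <host> element from the disk XML ends up registered as a volfile server through libgfapi before the connection is initialized, roughly as below. Whether the later servers are actually tried as fallbacks when the first one is unreachable depends on the libgfapi version installed on the client. The hosts, volume name, and default port 24007 are taken from this bug report; everything else is assumed.

/* Sketch: register several volfile servers for one gluster volume via libgfapi.
 * Error handling is kept minimal. Build with: gcc gfapi-demo.c -o gfapi-demo -lgfapi */
#include <stdio.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
    /* Hosts as listed in the <source> element of the bug description. */
    const char *hosts[] = { "10.66.4.2", "10.66.4.105", "10.66.4.148" };

    glfs_t *fs = glfs_new("test");   /* volume name from name='test/test1.img' */
    if (!fs)
        return 1;

    /* Register every configured host as a potential volfile server. */
    for (size_t i = 0; i < sizeof(hosts) / sizeof(hosts[0]); i++)
        glfs_set_volfile_server(fs, "tcp", hosts[i], 24007);

    /* A recent libgfapi walks the registered servers in order; an older one
     * effectively uses only the first host and fails as in the description. */
    if (glfs_init(fs) < 0) {
        fprintf(stderr, "glfs_init failed\n");
        glfs_fini(fs);
        return 1;
    }

    printf("connected to volume 'test'\n");
    glfs_fini(fs);
    return 0;
}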

As for the keepalive timeout when destroying the VM, the problem is that libvirt executes qemuProcessStop from the event loop. That is wrong, since qemuProcessStop takes a long time. I'll file a separate bug for that.

