Bug 1559722

Summary:	Libvirt is stuck waiting on libgfapi
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	WenhanShi <wenshi>
Component:	libgfapi	Assignee:	Poornima G <pgurusid>
Status:	CLOSED WONTFIX	QA Contact:	Vivek Das <vdas>
Severity:	urgent	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.3	CC:	abhishku, amukherj, bkunal, dchaplyg, dm, gveitmic, jcoscia, milee, pgurusid, prasanna.kalever, qguo, rhs-bugs, sabose, sasundar, sheggodu, storage-qa-internal, vdas
Target Milestone:	---	Keywords:	ZStream
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-04-09 12:13:22 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1657798

Description WenhanShi 2018-03-23 06:31:58 UTC

Description of problem:
libvirt is trying to collect some statistics through the Gluster API. An API call (pub_glfs_init) is stuck, which hangs libvirt.

Version-Release number of selected component (if applicable):
RHEL 7.4
glusterfs-3.8.4-52.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.5.x86_64

How reproducible:
N/A

Steps to Reproduce:
1. N/A
2.
3.

Actual results:
Libvirt is stuck waiting on gluster api

Expected results:
Libvirt should not be stuck 

Additional info:

Comment 6 Sahina Bose 2018-04-02 09:22:40 UTC

According to the case, this is an RHHI deployment, and libgfapi is not supported there. Can you share how the customer enabled this? Can they move back to the supported FUSE access?

Comment 11 Poornima G 2018-04-03 03:56:48 UTC

Meanwhile, it could be because all the trusted ports are exhausted. Could you please try the following:

# gluster volume set VOLNAME server.allow-insecure on 
I see that this option is set only on the data volume; could you enable it on the other volumes as well?

Also, edit /etc/glusterfs/glusterd.vol on each Red Hat Gluster Storage node and add the following setting:

option rpc-auth-allow-insecure on

This allows gfapi clients to communicate with glusterd even from untrusted ports.

This requires a glusterd restart on all the nodes, executed one node after the other.
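
The two settings described above could be applied with a small shell sketch like the one below. It is only an illustration under assumptions: the function names ensure_insecure_option and allow_insecure_on_all_volumes are hypothetical, the file edit assumes GNU sed and a glusterd.vol whose closing "end-volume" line starts in the first column, and "gluster volume list" is used to enumerate the volumes. The glusterd restart is deliberately left as a manual step on each node.

```shell
# Hypothetical helper: add "option rpc-auth-allow-insecure on" to a
# glusterd.vol file unless an uncommented copy is already present.
ensure_insecure_option() {
    local volfile="${1:-/etc/glusterfs/glusterd.vol}"
    # Nothing to do if the option is already set (commented lines are ignored).
    if grep -Eq '^[[:space:]]*option[[:space:]]+rpc-auth-allow-insecure[[:space:]]+on' "$volfile"; then
        return 0
    fi
    # Keep a backup, then insert the option just before "end-volume".
    # Assumption: GNU sed, which accepts \n in the replacement text.
    cp -p "$volfile" "$volfile.bak" || return 1
    sed -i 's/^end-volume/    option rpc-auth-allow-insecure on\n&/' "$volfile"
}

# Hypothetical helper: set server.allow-insecure on every volume,
# not only on the data volume. Stops at the first failure.
allow_insecure_on_all_volumes() {
    local vol
    for vol in $(gluster volume list); do
        gluster volume set "$vol" server.allow-insecure on || return 1
    done
}
```

A possible order of use: run allow_insecure_on_all_volumes once from any node, then on each node in turn run ensure_insecure_option and restart glusterd before moving on to the next node.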

Comment 12 Poornima G 2018-04-03 03:58:12 UTC

Needinfo for comment 10

Comment 27 Dmitry Melekhov 2018-04-10 12:10:45 UTC

We just had a similar problem: the gluster cluster had trouble due to network problems, so healing started.
And libvirtd got stuck on all servers.

cat /etc/glusterfs/glusterd.vol
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option ping-timeout 0
    option event-threads 1
#   option transport.address-family inet6
#   option base-port 49152
option rpc-auth-allow-insecure on
end-volume


Thank you!

Comment 28 Sahina Bose 2018-04-11 06:11:26 UTC

(In reply to Need Real Name from comment #27)
> We just had a similar problem: the gluster cluster had trouble due to
> network problems, so healing started.
> And libvirtd got stuck on all servers.
> 
> cat /etc/glusterfs/glusterd.vol
> volume management
>     type mgmt/glusterd
>     option working-directory /var/lib/glusterd
>     option transport-type socket,rdma
>     option transport.socket.keepalive-time 10
>     option transport.socket.keepalive-interval 2
>     option transport.socket.read-fail-log off
>     option ping-timeout 0
>     option event-threads 1
> #   option transport.address-family inet6
> #   option base-port 49152
> option rpc-auth-allow-insecure on
> end-volume
> 
> 
> Thank you!

Are you using gfapi to access the disks on gluster volume?

Comment 29 Dmitry Melekhov 2018-04-11 06:16:02 UTC

Yes, we use it from kvm/libvirt.

Comment 30 Dmitry Melekhov 2018-04-15 12:43:32 UTC

By the way, something that may be interesting here: we run gluster and kvm with libvirt on the same hosts, so even when the network was down libvirt should still have been able to talk to gluster, yet libvirt got stuck. We also run pacemaker, which runs the libvirt CLI quite often to check the VMs' state; I guess this can contribute to libvirt getting stuck too.

Thank you!
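
Since a monitor that polls libvirt often can itself block on a hung libvirtd, a bounded probe may help tell "hung" apart from "slow". The following is a minimal sketch, assuming GNU coreutils timeout is installed; the function names probe_with_timeout and libvirtd_responsive and the 10-second limit are illustrative only and are not taken from this setup.

```shell
# Hypothetical helper: run a command, but give up after LIMIT seconds.
# Returns 0 if the command succeeded, 124 if it timed out (the exit
# status GNU timeout uses), or the command's own status otherwise.
probe_with_timeout() {
    local limit="$1"
    shift
    timeout "$limit" "$@"
}

# Example: treat libvirtd as unresponsive if "virsh list" fails or
# does not return within 10 seconds.
libvirtd_responsive() {
    probe_with_timeout 10 virsh -c qemu:///system list --all >/dev/null 2>&1
}
```

A return value of 124 from probe_with_timeout points at a hang rather than an ordinary virsh error, which is a reasonable moment to capture stack traces of libvirtd for this bug.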

Comment 34 Dmitry Melekhov 2018-10-09 11:36:45 UTC

Still have this problem...

Comment 35 Dmitry Melekhov 2018-10-10 05:15:06 UTC

By the way, it looks like this is more likely to be triggered on a 2-brick setup with an arbiter than on 2 or 3 bricks...

Comment 37 Poornima G 2018-11-19 04:14:18 UTC

Is it reproducible on a local setup, either QE or GSS? If so, can you share the setup for debugging?

Comment 44 Dmitry Melekhov 2019-04-09 10:32:10 UTC

Hello!

Looks like we don't have this problem while running gluster 4.1.

Thank you!