
Bug 1435170

Summary: Gluster-client no failover
Product: [Community] GlusterFS
Reporter: Attila Pinter <apinter.it>
Component: glusterd
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED EOL
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.10
CC: apinter.it, bugs, rtalur
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-09 07:47:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Attila Pinter 2017-03-23 10:20:23 UTC
Description of problem:
Running glusterfs-server on 4 KVM virtual machines with CentOS 7.3 Core, installed using the CentOS Storage SIG packages (v3.10), and connecting from Fedora 25 with client version 3.10 as well.

When I turn off the server the client originally connected to, there is no failover; instead the volume is unmounted.
The log shows name resolution errors, but all 4 servers are present in the hosts file on the servers and on the Fedora client as well.


Version-Release number of selected component (if applicable):
Installed Gluster packages on CentOS server side:

centos-release-gluster310-1.0-1.el7.centos.noarch
nfs-ganesha-gluster-2.4.3-1.el7.x86_64
glusterfs-api-3.10.0-1.el7.x86_64
glusterfs-cli-3.10.0-1.el7.x86_64
glusterfs-server-3.10.0-1.el7.x86_64
glusterfs-libs-3.10.0-1.el7.x86_64
glusterfs-3.10.0-1.el7.x86_64
glusterfs-fuse-3.10.0-1.el7.x86_64
glusterfs-ganesha-3.10.0-1.el7.x86_64
glusterfs-client-xlators-3.10.0-1.el7.x86_64

Installed glusterfs packages on Fedora side:

glusterfs-3.10.0-1.fc25.x86_64
glusterfs-api-3.10.0-1.fc25.x86_64
glusterfs-libs-3.10.0-1.fc25.x86_64
glusterfs-fuse-3.10.0-1.fc25.x86_64
glusterfs-client-xlators-3.10.0-1.fc25.x86_64


How reproducible: 
100%, always, with CentOS, Fedora and Ubuntu clients (Ubuntu was used as a reference to make sure it is not an issue with my F25 installation).


Steps to Reproduce:
1. Install glusterfs-server 3.10 (and the packages listed above) from the CentOS Storage SIG
2. Add servers to /etc/hosts (on every server and client)
3. Enable unrestricted communication between the servers (on all servers):
firewall-cmd --permanent --add-source=192.168.0.204
firewall-cmd --permanent --add-source=192.168.0.205
firewall-cmd --permanent --add-source=192.168.0.206
firewall-cmd --permanent --add-source=192.168.0.207
4. Create a trusted pool between the 4 servers (see the verification sketch after these steps)
5. Create volume: gluster volume create gfs replica 2 transport tcp gfs1:/tank/avalon/gfs gfs2:/tank/avalon/gfs gfs3:/tank/avalon/gfs gfs4:/tank/avalon/gfs
6. Label the bricks for SELinux:
semanage fcontext -a -t glusterd_brick_t "/tank/avalon/gfs(/.*)?"
restorecon -Rv /tank/avalon/gfs
7. Connect from F25 with the glusterfs client:
sudo mount -t glusterfs -o backupvolfile-server=volfile_bk,transport=tcp gfs1:/gfs /mnt/bitWafl/
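
As a sanity check for steps 4-5 (not part of the original report), the pool and volume state can be verified with the stock gluster CLI before mounting; gfs is the volume name from step 5, and the volume has to be started before mounting, a step presumably performed but not listed above:

gluster peer status
gluster volume start gfs
gluster volume info gfs
gluster volume status gfs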


Actual results:
After shutting down the server, the client does not connect to another server.
Mount log is long, available here: https://paste.fedoraproject.org/paste/OnhVTc-PcNBBDnEvIftddF5M1UNdIGYhyRLivL9gydE=

Command log from server: https://paste.fedoraproject.org/paste/bPmz63R3VozHI7XHZE9dnl5M1UNdIGYhyRLivL9gydE=

Expected results:
The client should connect to another server in the pool where the volume is present, without downtime.

Additional info:
SELinux on the servers is not giving AVC messages and the bricks are labelled. There was a name_bind AVC message, but after creating a policy package (pp) for it and rebooting, it no longer shows up.
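
For reference (an illustration, not the reporter's exact commands), a local policy package for such a name_bind denial is typically generated and loaded with the standard audit2allow workflow; the module name my-glusterd is made up:

ausearch -c glusterd --raw | audit2allow -M my-glusterd
semodule -i my-glusterd.pp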

AVC Report
========================================================
# date time comm subj syscall class permission obj event
========================================================
1. 03/22/2017 19:46:07 ? system_u:system_r:init_t:s0 0 (null) (null) (null) unset 608
2. 03/23/2017 09:40:54 glusterd system_u:system_r:glusterd_t:s0 49 tcp_socket name_bind system_u:object_r:ephemeral_port_t:s0 denied 1649
3. 03/23/2017 11:50:01 ? system_u:system_r:init_t:s0 0 (null) (null) (null) unset 2157
4. 03/23/2017 15:15:30 glusterd system_u:system_r:glusterd_t:s0 49 tcp_socket name_bind system_u:object_r:ephemeral_port_t:s0 denied 83
5. 03/23/2017 15:40:01 ? system_u:system_r:init_t:s0 0 (null) (null) (null) unset 132

Comment 1 Attila Pinter 2017-03-23 10:41:03 UTC
OK, so I made a little mistake mounting on the client side.
So the correct mount is:  sudo mount -t glusterfs -o backupvolfile-server=gfs2,transport=tcp gfs1:/gfs /mnt/bitWafl/

Now this works and the client connects to the 2nd server, gfs2. Now the question remains: what happens if I want to mount the volume during boot? By the look of it, the volfile is not being transferred properly?

Comment 2 Attila Pinter 2017-03-23 11:15:16 UTC
Interestingly enough, after bringing all the other servers back up and then disabling gfs2 (the server the client was connected to), the connection still broke :/
What am I doing wrong? Or is this really a bug in 3.10?

Comment 3 Attila Pinter 2017-03-23 12:29:05 UTC
I moved forward on the issue and started to play with heal. SHD is not running and is marked as not available, no idea why, but I got an error saying: Not able to fetch volfile from glusterd.
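
For reference (not from the original comment), the SHD and heal state described above is usually inspected with the stock gluster CLI; gfs is the volume name from the reproduction steps:

gluster volume status gfs      # lists the Self-heal Daemon per node and whether it is online
gluster volume heal gfs info   # may fail with the "Not able to fetch volfile from glusterd" error seen above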

Comment 4 Attila Pinter 2017-04-05 10:27:23 UTC
The original topic, the lack of failover, is solved by mounting via the actual volfile in fstab. Mounting from only one node does transfer the volfile, but there is no failover once that server goes down. I think this issue is glusterfs related.

On the other hand, regarding the SHD failing: should I open a new ticket for that, or can it be handled here? I think it is related to the volfile issue though.
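
For illustration (not the reporter's actual command), mounting directly from a saved client volfile can be done with the glusterfs binary itself; the volfile path below is hypothetical:

glusterfs --volfile=/etc/glusterfs/gfs-client.vol /mnt/bitWafl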

Comment 6 Raghavendra Talur 2017-05-31 22:35:29 UTC
Refer to https://github.com/gluster/glusterfs/blob/master/extras/hook-scripts/start/post/S29CTDBsetup.sh#L54 for an example of how to specify multiple volfile servers in fstab for boot-time mounts.
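
For illustration (a sketch, not taken from the comment above), an fstab entry along these lines uses the backup-volfile-servers mount option so the client can fetch the volfile from another node at boot; the hostnames and mount point are the ones used in this report:

gfs1:/gfs  /mnt/bitWafl  glusterfs  defaults,_netdev,backup-volfile-servers=gfs2:gfs3:gfs4  0 0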


Failover to a different volfile server should happen if the current volfile server goes down. The patch for that was merged before the 3.10 release: https://review.gluster.org/#/c/13002/


Regarding SHD, it could be related to the same issue. Please share the logs as attachments once again. The previous pastebin logs have been deleted.

Comment 7 Atin Mukherjee 2017-08-09 07:47:48 UTC
Since the needinfo hasn't been addressed for more than a month now, closing this bug. Please reopen if the issue persists.