Bug 1597821 - glusterfs client mount point fails with transport endpoint is not connected.
Summary: glusterfs client mount point fails with transport endpoint is not connected.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: arbiter
Version: 4.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-03 16:33 UTC by toma.todorov
Modified: 2018-07-04 15:29 UTC
CC List: 1 user

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-07-04 15:29:29 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
glusterfs-client log file (/var/log/glusterfs) (184.27 KB, image/jpeg)
2018-07-03 16:33 UTC, toma.todorov
[IMPORTANT] Refer to this attachment for the glusterfs-client log instead. (162.48 KB, image/jpeg)
2018-07-04 08:00 UTC, toma.todorov

Description toma.todorov 2018-07-03 16:33:00 UTC
Created attachment 1456280 [details]
glusterfs-client log file (/var/log/glusterfs)

Description of problem:
Assume a basic replica 3 arbiter 1 configuration with glusterfs-server 4.0.2 and glusterfs-client 4.0.2.
The glusterfs-client is installed on Ubuntu 18.04.

The volume is created with the following command:

gluster volume create brick01 replica 3 arbiter 1 \
proxmoxVE-1:/mnt/gluster/bricks/brick01 \
proxmoxVE-2:/mnt/gluster/bricks/brick01 \
arbiter01:/mnt/gluster/bricks/brick01

Gluster volume info:

Volume Name: brick01
Type: Replicate
Volume ID: 2310c6f4-f83d-4691-97a7-cbebc01b3cf7
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: proxmoxVE-1:/mnt/gluster/bricks/brick01
Brick2: proxmoxVE-2:/mnt/gluster/bricks/brick01
Brick3: arbiter01:/mnt/gluster/bricks/brick01 (arbiter)

PROBLEM: While verifying that write/read ops are permitted when one storage node is down, as stated in the docs (https://docs.gluster.org/en/v3/Administrator%20Guide/arbiter-volumes-and-quorum/), an unexpected result occurs. After killing the gluster processes on one of the non-arbiter nodes (using pkill ^gluster*), the client mount point fails with 'Transport endpoint is not connected.' (see attachment).
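
For reference, a rough sketch of how the brick and peer state can be confirmed right after the kill, assuming proxmoxVE-2 is the surviving data node:

# run on the surviving data node, e.g. proxmoxVE-2
gluster volume status brick01    # the killed node's brick should show N in the Online column, the arbiter should stay Y
gluster peer status              # the killed peer should show up as Disconnected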

Even if the following additional options are set, the same result occurs:
gluster volume set brick01 cluster.quorum-reads false
gluster volume set brick01 cluster.quorum-count 1

Version-Release number of selected component (if applicable): 4.0.2


How reproducible: Always.


Steps to Reproduce:
1. Set up a replica 3 arbiter 1 configuration (glusterfs-server 4.0.2) where the storage nodes are Debian-based (Proxmox) physical nodes and the arbiter is installed on an Ubuntu 18.04 VM.
2. Set up the gluster client on an Ubuntu 18.04 VM (glusterfs-client 4.0.2).
3. Create a mount point on the client (mount -t glusterfs proxmoxVE-1:/brick01 /home/<user>/brick01).
4. On proxmoxVE-1 or proxmoxVE-2, execute 'pkill ^gluster*'.
5. Operations on the client side fail with 'Transport endpoint is not connected.' (consolidated command sketch below).
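
A consolidated sketch of steps 3-5 as seen from the client VM (assuming a simple ls as the client-side operation; <user> as in step 3):

# on the client VM
mount -t glusterfs proxmoxVE-1:/brick01 /home/<user>/brick01
ls /home/<user>/brick01    # succeeds while all bricks are up

# on proxmoxVE-1 or proxmoxVE-2
pkill ^gluster*

# back on the client VM
ls /home/<user>/brick01    # fails: Transport endpoint is not connected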

Actual results:
Operations on the client side fail with 'Transport endpoint is not connected.'

Expected results:
Operations on the client side should succeed, as stated in the docs (https://docs.gluster.org/en/v3/Administrator%20Guide/arbiter-volumes-and-quorum/).

Additional info: See the attachment; it is the glusterfs-client log file.

Comment 1 toma.todorov 2018-07-04 08:00:48 UTC
Created attachment 1456404 [details]
[IMPORTANT] Refer to this attachment for the glusterfs-client log instead.

Comment 2 toma.todorov 2018-07-04 08:04:07 UTC
EDIT:
As stated in the docs, 'cluster.quorum-type' is set to auto for arbiter configurations, and 'cluster.quorum-count' is ignored. Please ignore the additional settings

gluster volume set brick01 cluster.quorum-reads false
gluster volume set brick01 cluster.quorum-count 1

as well as attachment 1456280 [details], which doesn't give clear information.
Attachment 1456404 [details] gives correct information about the client-side logs for volume brick01 with no reconfigured options.
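
A quick sketch of how the effective quorum options can be checked with 'gluster volume get' (the expected values follow from the docs cited above):

# on any server node
gluster volume get brick01 cluster.quorum-type     # should report auto for an arbiter volume
gluster volume get brick01 cluster.quorum-count    # ignored while quorum-type is auto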

Comment 3 toma.todorov 2018-07-04 15:29:08 UTC
'gluster volume heal brick01 enable' resolved the issue.
In effect, this added the reconfigured option 'cluster.self-heal-daemon: enable' to the volume. It seems that, by default, the arbiter brick fails to heal (sync) at the same time a file operation occurs.
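
For anyone hitting the same issue, a sketch of the fix plus a quick verification (exact 'info' output may vary):

# enable the self-heal daemon for the volume
gluster volume heal brick01 enable

# 'Options Reconfigured' should now list cluster.self-heal-daemon: enable
gluster volume info brick01

# pending heals should drain once the killed node is back
gluster volume heal brick01 info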

