Bug 1597821

Summary: glusterfs client mount point fails with transport endpoint is not connected.
Product: [Community] GlusterFS
Reporter: toma.todorov
Component: arbiter
Assignee: bugs <bugs>
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Version: 4.1
CC: bugs
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2018-07-04 15:29:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Attachments:
glusterfs-client log file (/var/log/glusterfs) (flags: none)
[IMPORTANT] Refer to this attachment for the glusterfs-client log instead. (flags: none)

Description toma.todorov 2018-07-03 16:33:00 UTC
Created attachment 1456280 [details]
glusterfs-client log file (/var/log/glusterfs)

Description of problem:
Assume a basic replica 3 arbiter 1 configuration with glusterfs-server 4.0.2 and glusterfs-client 4.0.2.
The glusterfs-client is installed on Ubuntu 18.04.

The volume was created with the following command:

gluster volume create brick01 replica 3 arbiter 1 \
    proxmoxVE-1:/mnt/gluster/bricks/brick01 \
    proxmoxVE-2:/mnt/gluster/bricks/brick01 \
    arbiter01:/mnt/gluster/bricks/brick01
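
The gluster volume info output below shows the volume in the Started state; presumably it was brought up with the standard start command (listed here only for completeness, not quoted from the report):

gluster volume start brick01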

Gluster volume info:

Volume Name: brick01
Type: Replicate
Volume ID: 2310c6f4-f83d-4691-97a7-cbebc01b3cf7
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: proxmoxVE-1:/mnt/gluster/bricks/brick01
Brick2: proxmoxVE-2:/mnt/gluster/bricks/brick01
Brick3: arbiter01:/mnt/gluster/bricks/brick01 (arbiter)

PROBLEM: While verifying that read/write operations are still permitted when one storage node is down, as stated in the docs (https://docs.gluster.org/en/v3/Administrator%20Guide/arbiter-volumes-and-quorum/), an unexpected result occurs. After killing the gluster processes on one of the non-arbiter nodes (using pkill ^gluster*), the client mount point fails with 'Transport endpoint is not connected.' (see attachment).
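
For reference, a minimal way to observe the state described above, using the node, volume, and mount names from this report (these verification commands are standard gluster CLI and are not part of the original report):

# On a surviving server node (e.g. proxmoxVE-2): the killed node's brick should be listed as offline (N/A)
gluster volume status brick01

# Pending heal entries accumulate while one data brick is down
gluster volume heal brick01 info

# On the client: any operation on the mount reproduces the error
ls /home/<user>/brick01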

Even if the following additional options are set, the same result occurs:
gluster volume set brick01 cluster.quorum-reads false
gluster volume set brick01 cluster.quorum-count 1
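
A quick way to confirm which quorum-related settings are actually in effect on the volume (standard gluster CLI, shown as a verification sketch rather than something run in the original report):

gluster volume get brick01 cluster.quorum-type
gluster volume get brick01 cluster.quorum-reads
gluster volume get brick01 cluster.quorum-count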

Version-Release number of selected component (if applicable): 4.0.2


How reproducible: Always.


Steps to Reproduce:
1. Set up a replica 3 arbiter 1 configuration (glusterfs-server 4.0.2) where the storage nodes are Debian-based (Proxmox) physical nodes and the arbiter is installed on an Ubuntu 18.04 VM.
2. Set up the gluster client on an Ubuntu 18.04 VM (glusterfs-client 4.0.2).
3. Mount the volume on the client (mount -t glusterfs proxmoxVE-1:/brick01 /home/<user>/brick01).
4. On proxmoxVE-1 or proxmoxVE-2, execute 'pkill ^gluster*'.
5. Operations on the client side fail with 'Transport endpoint is not connected.'.

Actual results:
Operations on the client side fail with 'Transport endpoint is not connected.'.

Expected results:
Operations on the client side should continue to work, as stated in the docs (https://docs.gluster.org/en/v3/Administrator%20Guide/arbiter-volumes-and-quorum/).

Additional info: See the attachment; it is the glusterfs-client log file.

Comment 1 toma.todorov 2018-07-04 08:00:48 UTC
Created attachment 1456404 [details]
[IMPORTANT] Refer to this attachment for the glusterfs-client log instead.

Comment 2 toma.todorov 2018-07-04 08:04:07 UTC
EDIT:
As stated in the docs, 'cluster.quorum-type' is set to auto for arbiter configurations and 'cluster.quorum-count' is ignored. Please ignore the additional settings

gluster volume set brick01 cluster.quorum-reads false
gluster volume set brick01 cluster.quorum-count 1

as well as attachment 1456280 [details], which does not give clear information.
Attachment 1456404 [details] gives correct information about the client-side logs for volume brick01 with no reconfigured options.
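
For anyone following along, the two options above can be returned to their defaults with the standard reset command (a sketch using the volume name from this report, not quoted from the original comment):

gluster volume reset brick01 cluster.quorum-reads
gluster volume reset brick01 cluster.quorum-count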

Comment 3 toma.todorov 2018-07-04 15:29:08 UTC
'gluster volume heal brick01 enable' resolved the issue.
This added the reconfigured option 'cluster.self-heal-daemon: enable' to the volume. It seems that, by default, the arbiter brick fails to heal (sync) at the same time as file operations occur.
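
A minimal sketch of the resolution plus a way to confirm the resulting option; the verification commands are standard gluster CLI and are assumptions rather than quotes from this comment:

# Enable the self-heal daemon for the volume (the command that resolved the issue)
gluster volume heal brick01 enable

# Confirm the option now shows as enabled on the volume
gluster volume get brick01 cluster.self-heal-daemon

# Optionally, watch pending heal entries drain once the daemon is running
gluster volume heal brick01 info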