Created attachment 1456280 [details]
glusterfs-client log file (/var/log/glusterfs)

Description of problem:
Assume a basic replica 3 arbiter 1 configuration, glusterfs-server 4.0.2 and glusterfs-client 4.0.2. The glusterfs-client is installed on Ubuntu 18.04.

The volume is created with the following command:
gluster volume create brick01 replica 3 arbiter 1 proxmoxVE-1:/mnt/gluster/bricks/brick01 proxmoxVE-2:/mnt/gluster/bricks/brick01 arbiter01:/mnt/gluster/bricks/brick01

Gluster volume info:
Volume Name: brick01
Type: Replicate
Volume ID: 2310c6f4-f83d-4691-97a7-cbebc01b3cf7
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: proxmoxVE-1:/mnt/gluster/bricks/brick01
Brick2: proxmoxVE-2:/mnt/gluster/bricks/brick01
Brick3: arbiter01:/mnt/gluster/bricks/brick01 (arbiter)

PROBLEM:
While verifying that write/read operations are permitted when one storage node is down, as stated in the docs (https://docs.gluster.org/en/v3/Administrator%20Guide/arbiter-volumes-and-quorum/), an unexpected result occurs. After killing the gluster processes on one of the non-arbiter nodes (using pkill ^gluster*), the client mount point fails with 'Transport endpoint is not connected.' (see attachment)

The same result occurs even if the following additional options are set:
gluster volume set brick01 cluster.quorum-reads false
gluster volume set brick01 cluster.quorum-count 1

Version-Release number of selected component (if applicable):
4.0.2

How reproducible:
Always.

Steps to Reproduce:
1. Set up a replica 3 arbiter 1 configuration (glusterfs-server 4.0.2) where the storage nodes are Debian-based (Proxmox) physical nodes and the arbiter is installed on an Ubuntu 18.04 VM.
2. Set up the gluster client on an Ubuntu 18.04 VM (glusterfs-client 4.0.2).
3. Create a mount point on the client (mount -t glusterfs proxmoxVE-1:/brick01 /home/<user>/brick01).
4. On proxmoxVE-1 or proxmoxVE-2, execute 'pkill ^gluster*'.
5. Operations on the client side fail with 'Transport endpoint is not connected.'.

Actual results:
Operations on the client side fail with 'Transport endpoint is not connected.'.

Expected results:
Operations on the client side should be allowed, as stated in the docs (https://docs.gluster.org/en/v3/Administrator%20Guide/arbiter-volumes-and-quorum/).

Additional info:
See attachment; it is the glusterfs-client log file.
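For reference, a minimal check that can be run from any server node after step 4, to confirm that only the killed node's brick is offline and that the quorum options are at their defaults. This is only a sketch using standard gluster CLI commands and stock option names; none of the expected values below come from this particular setup:

  gluster volume status brick01                       # brick/process status; only the killed node should show offline
  gluster volume get brick01 cluster.quorum-type      # expected 'auto' for arbiter volumes per the docs
  gluster volume get brick01 cluster.server-quorum-type   # server-side quorum, normally 'off' unless explicitly changed
  gluster volume heal brick01 info                    # pending heal entries per brick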
Created attachment 1456404 [details]
[IMPORTANT] Refer to this attachment for the glusterfs-client log instead.
EDIT: As stated in the docs, 'cluster.quorum-type' is set to auto for arbiter configurations, and cluster.quorum-count is ignored. Please ignore the additional settings

gluster volume set brick01 cluster.quorum-reads false
gluster volume set brick01 cluster.quorum-count 1

as well as attachment 1456280 [details], which does not give clear information. Attachment 1456404 [details] gives the correct client-side log for volume brick01 with no reconfigured options.
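For completeness, one possible way to revert those two options and confirm the effective quorum type, sketched with standard gluster CLI commands (the expected 'auto' value is taken from the docs cited above, not from this installation):

  gluster volume reset brick01 cluster.quorum-reads
  gluster volume reset brick01 cluster.quorum-count
  gluster volume get brick01 cluster.quorum-type      # should report 'auto' for a replica 3 arbiter 1 volume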
'gluster volume heal brick01 enable' resolved the issue. This added the reconfigured option 'cluster.self-heal-daemon: enable' to the volume. It seems that, by default, the arbiter brick fails to heal (sync) as file operations occur.
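For anyone hitting the same behaviour, a sketch of the fix as applied here, with two follow-up checks using standard gluster CLI commands (exact output will depend on the installation):

  gluster volume heal brick01 enable      # re-enables the self-heal daemon for the volume
  gluster volume info brick01             # 'Options Reconfigured' should now list cluster.self-heal-daemon: enable
  gluster volume heal brick01 info        # pending entries should drain once healing runs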