Description of problem:
In GCE, I have 4 instances, each with one 10 GB brick. Two instances are in the US and the other two are in Asia (with the hope that this drives up I/O latency sufficiently). The bricks make up a Replica-4 volume. Before I enable halo, I can mount the volume and read/write files. However, when I set `cluster.halo-enabled yes`, I can no longer write to the volume:

[root@jcope-rhs-g2fn vol]# touch /mnt/vol/test1
touch: setting times of ‘test1’: Read-only file system

Thanks to a helpful user on the mailing list, setting these volume values resolves the issue:

cluster.quorum-type fixed
cluster.quorum-count 2

Version-Release number of selected component (if applicable):
glusterfs-client-xlators-3.12.1-2.el7.x86_64
glusterfs-libs-3.12.1-2.el7.x86_64
glusterfs-api-3.12.1-2.el7.x86_64
glusterfs-server-3.12.1-2.el7.x86_64
glusterfs-3.12.1-2.el7.x86_64
glusterfs-cli-3.12.1-2.el7.x86_64
glusterfs-fuse-3.12.1-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Set up a replica volume.
2. Enable halo (gluster volume set gv0 cluster.halo-enabled yes).
3. Write to the volume.

Actual results:
touch: setting times of ‘test1’: Read-only file system

Expected results:
File written to volume

Additional info:
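For completeness, the workaround above corresponds to the following gluster CLI commands; this is a sketch that assumes the gv0 volume name used later in this report:

# gluster volume set gv0 cluster.quorum-type fixed
# gluster volume set gv0 cluster.quorum-count 2

With quorum-type set to fixed and quorum-count set to 2, client-side quorum is satisfied as long as any 2 of the 4 replica bricks are reachable, which avoids the read-only failure seen above.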
Hi Jon, could you please share the volume info, volume status, brick logs, client logs, and self-heal daemon logs from all the nodes?
Created attachment 1350594 [details] Node1-glusterd.log
Here is volume info and status. I'll attach the logs individually as text.

# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: 24831bec-32bb-46a6-9507-6b4c8a8dd14f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: gce-node1:/data/brick/gv0
Brick2: gce-node2:/data/brick/gv0
Brick3: gce-node3:/data/brick/gv0
Brick4: gce-node4:/data/brick/gv0
Options Reconfigured:
cluster.halo-enabled: yes
transport.address-family: inet
nfs.disable: on

# gluster volume status
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gce-node1:/data/brick/gv0             49152     0          Y       19918
Brick gce-node2:/data/brick/gv0             49152     0          Y       14428
Brick gce-node3:/data/brick/gv0             49152     0          Y       2676
Brick gce-node4:/data/brick/gv0             49152     0          Y       2518
Self-heal Daemon on localhost               N/A       N/A        Y       19939
Self-heal Daemon on gce-node2               N/A       N/A        Y       14449
Self-heal Daemon on gce-node4               N/A       N/A        Y       2539
Self-heal Daemon on gce-node3               N/A       N/A        Y       2697

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
Created attachment 1350597 [details] Node2-glusterd.log
Created attachment 1350598 [details] Node3-glusterd.log
Created attachment 1350599 [details] Node4-glusterd.log
Created attachment 1350600 [details] Node1-glustershd.log
Created attachment 1350601 [details] Node2-glustershd.log
Created attachment 1350602 [details] Node3-glustershd.log
Created attachment 1350604 [details] Node4-glustershd.log
Created attachment 1350605 [details] Node1-data-brick-gv0.log
Created attachment 1350606 [details] Node2-data-brick-gv0.log
Created attachment 1350607 [details] Node3-data-brick-gv0.log
Created attachment 1350608 [details] Node4-data-brick-gv0.log
To add to the previous comment, here are the relevant config values when I reproduced the bug for the attached logs:

cluster.quorum-type              none
cluster.quorum-count             (null)
cluster.server-quorum-type       off
cluster.server-quorum-ratio      0
cluster.quorum-reads             no
cluster.halo-enabled             yes    # set by me
cluster.halo-shd-max-latency     99999
cluster.halo-nfsd-max-latency    5
cluster.halo-max-latency         5
cluster.halo-max-replicas        99999
cluster.halo-min-replicas        2
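In case it helps when re-checking these settings, the same options can be listed directly from the CLI (a sketch, again assuming the gv0 volume name):

# gluster volume get gv0 all | grep -E 'quorum|halo'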
This bug is being moved to mainline, as there has been no analysis of the data presented and 3.12 has reached EOL. Requesting the reporter (@JonCope) to attempt reproduction on a later release and refresh the provided data. Requesting @Rafi to update any findings and/or updates here.
This bug has been moved to https://github.com/gluster/glusterfs/issues/918 and will be tracked there from now on. Visit the GitHub issue URL for further details.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days