Description of problem:
While running automation runs, found that gluster-NFS is not honoring the quorum count.

Version-Release number of selected component (if applicable):
glusterfs-3.12.2-23.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
0) Create a 1 x 2 volume and mount it using the gluster-NFS protocol
1) Disable the self-heal daemon
2) Set cluster.quorum-type to fixed
3) Start I/O (write and read) from the mount point - must succeed
4) Bring down brick1
5) Start I/O (write and read) - must succeed
6) Set cluster.quorum-count to 1
7) Start I/O (write and read) - must succeed
8) Set cluster.quorum-count to 2
9) Start I/O (write and read) - read must pass, write will fail
10) Bring brick1 back online
11) Start I/O (write and read) - must succeed
12) Bring down brick2
13) Start I/O (write and read) - read must pass, write will fail
14) Set cluster.quorum-count to 1
15) Start I/O (write and read) - must succeed

Actual results:
After step 15, the mount is a read-only file system.

Expected results:
Since quorum-type is fixed and quorum-count is 1, writes must succeed as long as 1 brick is up and running.

Additional info:

[root@rhsauto055 ~]# gluster vol status
Status of volume: testvol_replicated
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsauto055.lab.eng.blr.redhat.com:/br
icks/brick0/testvol_replicated_brick0       49153     0          Y       9398
Brick rhsauto056.lab.eng.blr.redhat.com:/br
icks/brick0/testvol_replicated_brick1       N/A       N/A        N       N/A
NFS Server on localhost                     2049      0          Y       9377
NFS Server on rhsauto056.lab.eng.blr.redhat
.com                                        2049      0          Y       5805
NFS Server on rhsauto057.lab.eng.blr.redhat
.com                                        2049      0          Y       32594
NFS Server on rhsauto052.lab.eng.blr.redhat
.com                                        2049      0          Y       26217
NFS Server on rhsauto049.lab.eng.blr.redhat
.com                                        2049      0          Y       422
NFS Server on rhsauto053.lab.eng.blr.redhat
.com                                        2049      0          Y       545

Task Status of Volume testvol_replicated
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhsauto055 ~]#
[root@rhsauto055 ~]# gluster vol info

Volume Name: testvol_replicated
Type: Replicate
Volume ID: 6a5ba380-4bcf-428c-a68e-a4f4684ca9ba
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: rhsauto055.lab.eng.blr.redhat.com:/bricks/brick0/testvol_replicated_brick0
Brick2: rhsauto056.lab.eng.blr.redhat.com:/bricks/brick0/testvol_replicated_brick1
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: off
transport.address-family: inet
cluster.self-heal-daemon: off
cluster.quorum-type: fixed
cluster.quorum-count: 1
cluster.brick-multiplex: disable
cluster.server-quorum-ratio: 51
[root@rhsauto055 ~]#

> Error from mount point

[root@rhsauto051 testvol_replicated_nfs]# echo "TEST" >test
-bash: test: Read-only file system
[root@rhsauto051 testvol_replicated_nfs]# touch file
touch: cannot touch ‘file’: Read-only file system
[root@rhsauto051 testvol_replicated_nfs]#

SOS Reports: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/vavuthu/qourum_nfs_issue/
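For quick reproduction, the commands below condense the steps above into the corresponding CLI calls. This is a minimal sketch that reuses the volume and brick names from this report; the NFSv3 mount point /mnt/nfs and the file name file1 are assumptions for illustration, and "bring down a brick" means killing its brick process (PID taken from gluster vol status), as in the detailed run in the comments below.

# Step 0: create a 1 x 2 replica volume and mount it over gluster-NFS (NFSv3)
gluster volume create testvol_replicated replica 2 \
    rhsauto055.lab.eng.blr.redhat.com:/bricks/brick0/testvol_replicated_brick0 \
    rhsauto056.lab.eng.blr.redhat.com:/bricks/brick0/testvol_replicated_brick1
gluster volume set testvol_replicated nfs.disable off
gluster volume start testvol_replicated
mount -t nfs -o vers=3 rhsauto055.lab.eng.blr.redhat.com:/testvol_replicated /mnt/nfs

# Steps 1-2: disable the self-heal daemon and switch to fixed quorum
gluster volume set testvol_replicated cluster.self-heal-daemon off
gluster volume set testvol_replicated cluster.quorum-type fixed

# Steps 4-9: kill the brick1 process, then vary the quorum count while testing writes
gluster volume set testvol_replicated cluster.quorum-count 1
gluster volume set testvol_replicated cluster.quorum-count 2

# Steps 12-15: with only brick1 up and quorum-count set back to 1,
# a write from the NFS mount should succeed but is observed to fail with EROFS
gluster volume set testvol_replicated cluster.quorum-count 1
echo "TEST" > /mnt/nfs/file1   # expected: success; observed: Read-only file system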
Hi Varsha, could you take a look at this bug? Feel free to reach out to me if you need any help.
Following the steps, step 9 does not fail. The rest of the results are reproducible.

0) Create a 1 x 2 volume and mount it using the gluster-NFS protocol

gluster v create testvol replica 2 127.0.0.2:/bricks/brick1 127.0.0.2:/bricks/brick2 force

1) Disable the self-heal daemon

gluster volume set testvol self-heal-daemon off
gluster v start testvol

[root@localhost ~]# gluster v info

Volume Name: testvol
Type: Replicate
Volume ID: e5f748b4-4122-42cc-bb3a-47cbf6e4c4f8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 127.0.0.2:/bricks/brick1
Brick2: 127.0.0.2:/bricks/brick2
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: off
transport.address-family: inet
cluster.self-heal-daemon: off

[root@localhost ~]# mount -t nfs -o vers=3 127.0.0.2:/testvol /mnt/

[root@localhost ~]# gluster v status
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 127.0.0.2:/bricks/brick1              49152     0          Y       18896
Brick 127.0.0.2:/bricks/brick2              49153     0          Y       18903
NFS Server on localhost                     2049      0          Y       18968

Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks

2) Set cluster.quorum-type to fixed

[root@localhost ~]# gluster volume set testvol cluster.quorum-type fixed
volume set: success

3) Start I/O (write and read) from the mount point - must succeed

[root@localhost ~]# cd /mnt/
[root@localhost mnt]# echo "TEST" >test
[root@localhost mnt]# ls
test
[root@localhost mnt]# cd ..
[root@localhost /]# ls /bricks/brick1
test
[root@localhost /]# ls /bricks/brick2
test

4) Bring down brick1

[root@localhost /]# kill -15 18896
[root@localhost /]# gluster v status
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 127.0.0.2:/bricks/brick1              N/A       N/A        N       N/A
Brick 127.0.0.2:/bricks/brick2              49153     0          Y       18903
NFS Server on localhost                     2049      0          Y       18968

Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks

5) Start I/O (write and read) - must succeed

[root@localhost /]# cd mnt
[root@localhost mnt]# echo "TEST" >test1

6) Set cluster.quorum-count to 1

[root@localhost mnt]# gluster volume set testvol cluster.quorum-count 1
volume set: success

7) Start I/O (write and read) - must succeed

[root@localhost mnt]# echo "TEST" >test2

8) Set cluster.quorum-count to 2

[root@localhost mnt]# gluster volume set testvol cluster.quorum-count 2
volume set: success

9) Start I/O (write and read) - read must pass, write will fail (this does not fail)

[root@localhost mnt]# echo "TEST" >test3
[root@localhost mnt]# cd ..
[root@localhost /]# ls /mnt
test  test1  test2  test3
[root@localhost /]# ls /bricks/brick1
test
[root@localhost /]# ls /bricks/brick2
test  test1  test2  test3

10) Bring brick1 back online

[root@localhost /]# service glusterd restart
Restarting glusterd (via systemctl):                       [  OK  ]
[root@localhost /]# systemctl daemon-reload
[root@localhost /]# gluster v status
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 127.0.0.2:/bricks/brick1              49152     0          Y       19262
Brick 127.0.0.2:/bricks/brick2              49153     0          Y       18903
NFS Server on localhost                     2049      0          Y       19244

Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks

11) Start I/O (write and read) - must succeed

[root@localhost /]# cd mnt
[root@localhost mnt]# echo "TEST" >test4
[root@localhost mnt]# ls
test  test1  test2  test3  test4

12) Bring down brick2

[root@localhost mnt]# kill -15 18903
[root@localhost mnt]# gluster v status
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 127.0.0.2:/bricks/brick1              49152     0          Y       19262
Brick 127.0.0.2:/bricks/brick2              N/A       N/A        N       N/A
NFS Server on localhost                     2049      0          Y       19244

Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks

[root@localhost mnt]# ls /bricks/brick1
test  test1  test2  test3  test4
[root@localhost mnt]# ls /bricks/brick2
test  test1  test2  test3  test4

13) Start I/O (write and read) - read must pass, write will fail

[root@localhost mnt]# echo "TEST" >test5
-bash: test5: Read-only file system

14) Set cluster.quorum-count to 1

[root@localhost mnt]# gluster volume set testvol cluster.quorum-count 1
volume set: success

15) Start I/O (write and read) - must succeed (this fails)

[root@localhost mnt]# echo "TEST" >test5
-bash: test5: Read-only file system
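As a sanity check after step 14, the effective quorum options can be queried from the CLI to rule out a stale setting before blaming gluster-NFS. A minimal sketch, assuming the testvol volume from this run; the log path /var/log/glusterfs/nfs.log is the usual gluster-NFS server log and is an assumption here:

gluster volume get testvol cluster.quorum-type
gluster volume get testvol cluster.quorum-count
gluster volume status testvol
# look for quorum / read-only messages logged by the gNFS server
grep -Ei "read-only|quorum" /var/log/glusterfs/nfs.log | tail -20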
Reverting status from Modified as there's no downstream patch posted
Patch [1] for this bug is already merged upstream. However, that patch causes a peer-rejection issue, which is resolved by another patch [2]. For backporting to downstream, refer to patch [2].

[1] https://review.gluster.org/#/c/glusterfs/+/21838/
[2] https://review.gluster.org/#/c/glusterfs/+/22297/
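Once a build carrying the backport is installed, a quick verification is to re-run steps 12-15 of the reproducer. A minimal sketch, assuming the testvol volume from comment 3 mounted over NFSv3 at /mnt with only one brick up; the file name verify_fix is an assumption:

rpm -q glusterfs                                   # confirm the installed build
gluster volume set testvol cluster.quorum-count 1
echo "TEST" > /mnt/verify_fix                      # on a fixed build this should succeed instead of returning EROFS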
Doc text looks good to me, Laura.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249