+++ This bug was initially created as a clone of Bug #1756938 +++

Description of problem:

http://post-office.corp.redhat.com/archives/gluster-tech-list/2019-September/msg00137.html

I want to propose this for rhgs-3.5.1, as it can be really valuable for GSS to be able to use the existing CLI commands to fix corner split-brain cases even in replica 3.

--------------------------------------------------------------------------------

Ever since we added quorum checks for lookups in afr via commit bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4, the split-brain resolution commands have not worked for replica 3 because there would be no readables for the lookup fop. The argument was that split-brains do not occur in replica 3, but we do see (data/metadata) split-brain cases once in a while, which indicates that there are a few bugs/corner cases yet to be discovered and fixed.

Fortunately, commit 8016d51a3bbd410b0b927ed66be50a09574b7982 added GF_CLIENT_PID_GLFS_HEALD as the pid for all fops made by glfsheal. If we leverage this and allow lookups when the pid is GF_CLIENT_PID_GLFS_HEALD, the split-brain resolution commands will work for replica 3 volumes too. Attempting a patch which does this.

--------------------------------------------------------------------------------
[root@dhcp47-141 ~]# gluster vol info

Volume Name: testvol_replicated
Type: Replicate
Volume ID: 5a182a14-bf47-42c1-809a-973bf1e133cc
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.141:/bricks/brick0/testvol_replicated_brick0
Brick2: 10.70.47.41:/bricks/brick0/testvol_replicated_brick1
Brick3: 10.70.47.178:/bricks/brick0/testvol_replicated_brick2
Options Reconfigured:
cluster.quorum-type: none
cluster.self-heal-daemon: off
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

========================================

[root@dhcp47-141 ~]# gluster volume heal testvol_replicated split-brain source-brick 10.70.47.141:/bricks/brick0/testvol_replicated_brick0 /file1
Healed /file1.

[root@dhcp47-141 ~]# gluster volume heal testvol_replicated info split-brain
Brick 10.70.47.141:/bricks/brick0/testvol_replicated_brick0
/file2
<gfid:cbd90af1-ff06-4cbf-9e03-369b905c1fa8>
<gfid:814d5f4a-1ba9-42b6-91a5-dc5c2e8a27b0>
<gfid:17dc170d-5eb3-480f-bcac-59e937317066>
Status: Connected
Number of entries in split-brain: 4

Brick 10.70.47.41:/bricks/brick0/testvol_replicated_brick1
/file2
/file4
/file5
/dir
Status: Connected
Number of entries in split-brain: 4

Brick 10.70.47.178:/bricks/brick0/testvol_replicated_brick2
/file2
<gfid:cbd90af1-ff06-4cbf-9e03-369b905c1fa8>
<gfid:814d5f4a-1ba9-42b6-91a5-dc5c2e8a27b0>
<gfid:17dc170d-5eb3-480f-bcac-59e937317066>
Status: Connected
Number of entries in split-brain: 4

[root@dhcp47-141 ~]# gluster volume heal testvol_replicated split-brain source-brick 10.70.47.141:/bricks/brick0/testvol_replicated_brick0 gfid:cbd90af1-ff06-4cbf-9e03-369b905c1fa8
Healed gfid:cbd90af1-ff06-4cbf-9e03-369b905c1fa8.

Verified the fix in:
glusterfs-server-6.0-46.el8rhgs.x86_64
glusterfs-server-6.0-46.el7rhgs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5603
*** Bug 1901154 has been marked as a duplicate of this bug. ***