Bug 1759875

Summary: afr: support split-brain CLI for replica 3
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Ravishankar N <ravishankar>
Component: replicate Assignee: Ravishankar N <ravishankar>
Status: CLOSED ERRATA QA Contact: Arthy Loganathan <aloganat>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.5 CC: bkunal, bugs, dwalveka, nravinas, pasik, pprakash, puebele, rhs-bugs, rkothiya, sheggodu, storage-qa-internal
Target Milestone: --- Keywords: Triaged, ZStream
Target Release: RHGS 3.5.z Batch Update 3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-6.0-38 Doc Type: Enhancement
Doc Text:
This enhancement provides CLI-based split-brain resolution for replica 3 volumes, which benefits storage administrators. With this update, you can resolve split-brain via the CLI for replica 3 volumes; previously, this was available only for replica 2 volumes.
Story Points: ---
Clone Of: 1756938 Environment:
Last Closed: 2020-12-17 04:50:17 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1756938, 1760791, 1760792    
Bug Blocks:    

Description Ravishankar N 2019-10-09 10:02:57 UTC
+++ This bug was initially created as a clone of Bug #1756938 +++

Description of problem:

http://post-office.corp.redhat.com/archives/gluster-tech-list/2019-September/msg00137.html

I want to propose this for rhgs-3.5.1 as it can be really valuable for GSS to be able to use the existing CLI commands to fix corner split-brain cases even in replica 3.

--------------------------------------------------------------------------------
Ever since we added quorum checks for lookups in afr via commit
bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4, the split-brain resolution
commands would not work for replica 3 because there would be no
readables for the lookup fop.

The argument was that split-brains do not occur in replica 3, but we do
see (data/metadata) split-brain cases once in a while, which indicates that
there are a few bugs/corner cases yet to be discovered and fixed.

Fortunately, commit 8016d51a3bbd410b0b927ed66be50a09574b7982 added
GF_CLIENT_PID_GLFS_HEALD as the pid for all fops made by glfsheal. If we
leverage this and allow lookups when pid is GF_CLIENT_PID_GLFS_HEALD,
split-brain resolution commands will work for replica 3 volumes too.

Attempting a patch which does this.

--------------------------------------------------------------------------------
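
The idea in the commit message above can be sketched roughly as follows. This is a simplified model, not the actual afr patch: the helper name lookup_allowed(), its parameters, and the numeric value given to GF_CLIENT_PID_GLFS_HEALD are assumptions made for illustration (the real constant is defined in the GlusterFS headers). It only shows the shape of the check: ordinary clients need quorum and at least one readable child, while fops issued by glfsheal are let through so that the split-brain resolution commands can reach the file.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder value for illustration only; the real constant comes from
 * the GlusterFS headers and may differ. */
#define GF_CLIENT_PID_GLFS_HEALD (-100)

/* Hypothetical helper modelling the lookup gate described in the bug.
 * client_pid     : pid attached to the fop
 * readable_count : number of children considered readable for this inode
 * quorum_met     : whether the quorum check passed */
static bool
lookup_allowed(int client_pid, int readable_count, bool quorum_met)
{
    assert(readable_count >= 0);

    /* Fops from glfsheal bypass the check so that a file in split-brain
     * (which has no readables) can still be looked up and resolved. */
    if (client_pid == GF_CLIENT_PID_GLFS_HEALD)
        return true;

    /* Everyone else still needs quorum and at least one readable child. */
    return quorum_met && readable_count > 0;
}
```

With this model, a lookup on a split-brained file (zero readables) fails for a regular client pid but succeeds when the pid is GF_CLIENT_PID_GLFS_HEALD, which matches the behaviour the patch is aiming for.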

Comment 8 Arthy Loganathan 2020-11-05 07:12:47 UTC
[root@dhcp47-141 ~]# gluster vol info
 
Volume Name: testvol_replicated
Type: Replicate
Volume ID: 5a182a14-bf47-42c1-809a-973bf1e133cc
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.141:/bricks/brick0/testvol_replicated_brick0
Brick2: 10.70.47.41:/bricks/brick0/testvol_replicated_brick1
Brick3: 10.70.47.178:/bricks/brick0/testvol_replicated_brick2
Options Reconfigured:
cluster.quorum-type: none
cluster.self-heal-daemon: off
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off


========================================

[root@dhcp47-141 ~]# gluster volume heal testvol_replicated split-brain source-brick 10.70.47.141:/bricks/brick0/testvol_replicated_brick0 /file1
Healed /file1.


[root@dhcp47-141 ~]# gluster volume heal testvol_replicated info split-brain
Brick 10.70.47.141:/bricks/brick0/testvol_replicated_brick0
/file2
<gfid:cbd90af1-ff06-4cbf-9e03-369b905c1fa8>
<gfid:814d5f4a-1ba9-42b6-91a5-dc5c2e8a27b0>
<gfid:17dc170d-5eb3-480f-bcac-59e937317066>
Status: Connected
Number of entries in split-brain: 4

Brick 10.70.47.41:/bricks/brick0/testvol_replicated_brick1
/file2
/file4
/file5
/dir
Status: Connected
Number of entries in split-brain: 4

Brick 10.70.47.178:/bricks/brick0/testvol_replicated_brick2
/file2
<gfid:cbd90af1-ff06-4cbf-9e03-369b905c1fa8>
<gfid:814d5f4a-1ba9-42b6-91a5-dc5c2e8a27b0>
<gfid:17dc170d-5eb3-480f-bcac-59e937317066>
Status: Connected
Number of entries in split-brain: 4

[root@dhcp47-141 ~]# gluster volume heal testvol_replicated split-brain source-brick 10.70.47.141:/bricks/brick0/testvol_replicated_brick0 gfid:cbd90af1-ff06-4cbf-9e03-369b905c1fa8
Healed gfid:cbd90af1-ff06-4cbf-9e03-369b905c1fa8.

Verified the fix in:
glusterfs-server-6.0-46.el8rhgs.x86_64
glusterfs-server-6.0-46.el7rhgs.x86_64

Comment 12 errata-xmlrpc 2020-12-17 04:50:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5603

Comment 13 Karthik U S 2020-12-18 05:14:25 UTC
*** Bug 1901154 has been marked as a duplicate of this bug. ***