Bug 1683893

Summary: [geo-rep]: Checksum mismatch when 2x2 vols are converted to arbiter
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Rochelle <rallan>
Component: geo-replication Assignee: Shwetha K Acharya <sacharya>
Status: CLOSED EOL QA Contact: Vinayak Papnoi <vpapnoi>
Severity: urgent Docs Contact:
Priority: medium    
Version: rhgs-3.4 CC: abhishku, bkunal, khiremat, ksubrahm, mhackett, mobisht, msaini, pasik, sacharya, sasundar, sheggodu, vdas, vpapnoi
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
Consequence: In a geo-rep setup, while converting an n*2 master volume to n*3, if the worker corresponding to the brick newly added to the replica set (including an arbiter brick) becomes 'Active', geo-rep may never sync the self-healed data, causing data loss at the slave.
Cause: The worker corresponding to the newly added brick should not go 'Faulty' until it has synced the self-healed data. If it does go 'Faulty' and another worker becomes 'Active', a race occurs that causes this issue.
Fix/Workaround: This is a known issue with no clean workaround. Do not convert an n*2 master volume to n*3 while geo-replication is configured.
Story Points: ---
Clone Of:
: 1686568 1724043 Environment:
Last Closed: 2023-12-15 10:02:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1686568    
Bug Blocks: 1724043    

Description Rochelle 2019-02-28 03:51:23 UTC
Description of problem:
=======================
While converting 2x2 to 2x(2+1) (arbiter), there was a checksum mismatch:

[root@dhcp43-143 ~]# ./arequal-checksum -p /mnt/master/

Entry counts
Regular files   : 10000
Directories     : 2011
Symbolic links  : 11900
Other           : 0
Total           : 23911

Metadata checksums
Regular files   : 5ce564791c
Directories     : 288ecb21ce24
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 8e69e8576625d36f9ee1866c92bfb6a3
Directories     : 4a596e7e1e792061
Symbolic links  : 756e690d61497f6a
Other           : 0
Total           : 2fbf69488baa3ac7


[root@dhcp43-143 ~]# ./arequal-checksum -p /mnt/slave/

Entry counts
Regular files   : 10000
Directories     : 2011
Symbolic links  : 11900
Other           : 0
Total           : 23911

Metadata checksums
Regular files   : 5ce564791c
Directories     : 288ecb21ce24
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 53c64bd1144f6d9855f0af3edb55e614
Directories     : 4a596e7e1e792061
Symbolic links  : 756e690d61497f6a
Other           : 0
Total           : 3901e39cb02ad487



Everything matches except under "Checksums", where the Regular files and Total values differ between master and slave.
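
The field-by-field comparison of the two outputs can be automated. The following is a minimal sketch, assuming each arequal-checksum output was first saved to a file; the function name arequal_diff and the file names are hypothetical and are not part of arequal or Gluster.

```shell
# Hypothetical helper: print every field whose value differs between two
# saved arequal-checksum outputs. Lines without ':' are treated as section
# headers ("Entry counts", "Metadata checksums", "Checksums").
arequal_diff() {
    awk -F' *: *' '
        # Header or blank line: remember the current section name.
        NF < 2 { if ($0 != "") section = $0; next }
        # First file: store each value under "section/label".
        FNR == NR { seen[section "/" $1] = $2; next }
        # Second file: report any value that differs from the first file.
        {
            key = section "/" $1
            if (seen[key] != $2)
                printf "%s: master=%s slave=%s\n", key, seen[key], $2
        }
    ' "$1" "$2"
}
```

Usage, with the mount points from this report:

# ./arequal-checksum -p /mnt/master/ > master.txt
# ./arequal-checksum -p /mnt/slave/ > slave.txt
# arequal_diff master.txt slave.txt

For the two outputs shown in this report, only the Regular files and Total entries under Checksums would be listed.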



Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.12.2-45.el7rhgs.x86_64

How reproducible:
=================
2/2

Steps to Reproduce:
====================
1. Create and start a geo-rep session with master and slave being 2x2
2. Mount the vols and start pumping data
3. Disable and stop self healing (prior to add-brick)

# gluster volume set VOLNAME cluster.data-self-heal off
# gluster volume set VOLNAME cluster.metadata-self-heal off
# gluster volume set VOLNAME cluster.entry-self-heal off
# gluster volume set VOLNAME self-heal-daemon off

4. Add brick to the master and slave to convert them to 2x(2+1) arbiter vols
5. Start rebalance on master and slave

6. Re-enable self-healing:

# gluster volume set VOLNAME cluster.data-self-heal on
# gluster volume set VOLNAME cluster.metadata-self-heal on
# gluster volume set VOLNAME cluster.entry-self-heal on
# gluster volume set VOLNAME self-heal-daemon on

7. Wait for rebalance to complete
8. Check the checksum between master and slave
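
Steps 3 and 6 issue the same four commands with only the final value changed, so they can be generated from one place. This is a minimal sketch: the function name self_heal_cmds is hypothetical, and it only prints the commands (a dry run) rather than executing them. The option names are the ones listed in the steps above.

```shell
# Hypothetical helper: print the four "gluster volume set" commands from
# steps 3 and 6 for a given volume and state ("on" or "off").
# Nothing is executed; pipe the output to "sh" on a cluster node to apply it.
self_heal_cmds() {
    case $2 in
        on|off) ;;
        *) echo "usage: self_heal_cmds VOLNAME on|off" >&2; return 1 ;;
    esac
    for opt in cluster.data-self-heal cluster.metadata-self-heal \
               cluster.entry-self-heal self-heal-daemon; do
        echo "gluster volume set $1 $opt $2"
    done
}
```

Usage:

# self_heal_cmds VOLNAME off        (before add-brick, step 3)
# self_heal_cmds VOLNAME on         (after add-brick, step 6)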


Actual results:
===============
Checksum does not fully match


Expected results:
================
Checksum should match

Comment 48 hari gowtham 2019-07-23 04:57:12 UTC
Patch on the master: https://bugzilla.redhat.com/show_bug.cgi?id=1724043

Comment 89 Shwetha K Acharya 2022-02-03 07:18:14 UTC
Changing the assignee, as Tamar is working on this BZ.