Bug 1683893 - [geo-rep]: Checksum mismatch when 2x2 vols are converted to arbiter
Summary: [geo-rep]: Checksum mismatch when 2x2 vols are converted to arbiter
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Shwetha K Acharya
QA Contact: Vinayak Papnoi
URL:
Whiteboard:
Depends On: 1686568
Blocks: 1724043
Reported: 2019-02-28 03:51 UTC by Rochelle
Modified: 2023-12-15 10:02 UTC

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Consequence: In a geo-rep setup, while converting an n*2 master volume to n*3, if the worker corresponding to a brick newly added to the replica set (including an arbiter brick) becomes 'Active', there is a chance that geo-rep never syncs the self-healed data, causing data loss at the slave.
Cause: The worker corresponding to the newly added brick should not go to 'Faulty' until it has synced the self-healed data. If it does go to 'Faulty' and another worker becomes 'Active', there is a race that causes this issue.
Fix/Workaround: This is a known issue and there is no clean workaround for it. An n*2 to n*3 volume conversion at the master should therefore not be done if geo-replication is configured.
Clone Of:
: 1686568 1724043
Environment:
Last Closed: 2023-12-15 10:02:28 UTC
Embargoed:



Description Rochelle 2019-02-28 03:51:23 UTC
Description of problem:
=======================
While converting 2x2 to 2x(2+1) (arbiter), there was a checksum mismatch:

[root@dhcp43-143 ~]# ./arequal-checksum -p /mnt/master/

Entry counts
Regular files   : 10000
Directories     : 2011
Symbolic links  : 11900
Other           : 0
Total           : 23911

Metadata checksums
Regular files   : 5ce564791c
Directories     : 288ecb21ce24
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 8e69e8576625d36f9ee1866c92bfb6a3
Directories     : 4a596e7e1e792061
Symbolic links  : 756e690d61497f6a
Other           : 0
Total           : 2fbf69488baa3ac7


[root@dhcp43-143 ~]# ./arequal-checksum -p /mnt/slave/

Entry counts
Regular files   : 10000
Directories     : 2011
Symbolic links  : 11900
Other           : 0
Total           : 23911

Metadata checksums
Regular files   : 5ce564791c
Directories     : 288ecb21ce24
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 53c64bd1144f6d9855f0af3edb55e614
Directories     : 4a596e7e1e792061
Symbolic links  : 756e690d61497f6a
Other           : 0
Total           : 3901e39cb02ad487



Everything matches except in the "Checksums" section, where the Regular files and Total values differ between master and slave.
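
To pinpoint which fields differ, the two arequal-checksum outputs can be saved to files and compared field by field. The following is only a minimal sketch, assuming the output format shown above; the function name arequal_diff and the file names master.txt and slave.txt are hypothetical and not part of arequal-checksum or of this report.

```shell
# Hypothetical helper (not part of arequal-checksum): print the fields that
# differ between two saved arequal-checksum outputs.
arequal_diff() {
    awk -F' *: *' '
        # A non-blank line without a colon is a section heading,
        # e.g. "Metadata checksums" or "Checksums".
        NF == 1 && $0 !~ /^[ \t]*$/ { section = $0; next }
        # Skip blank lines.
        NF < 2 { next }
        {
            # Qualify the field name with its section so that
            # "Regular files" under different sections stay distinct.
            key = section " / " $1
            if (FNR == NR) { first[key] = $2; next }   # first file: remember value
            if (first[key] != $2)                      # second file: compare
                print key ": " first[key] " != " $2
        }
    ' "$1" "$2"
}
```

Called as `arequal_diff master.txt slave.txt` on the two outputs above, this should list only the Regular files and Total entries of the "Checksums" section.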



Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.12.2-45.el7rhgs.x86_64

How reproducible:
=================
2/2

Steps to Reproduce:
====================
1. Create and start a geo-rep session with master and slave being 2x2
2. Mount the vols and start pumping data
3. Disable and stop self healing (prior to add-brick)

# gluster volume set VOLNAME cluster.data-self-heal off
# gluster volume set VOLNAME cluster.metadata-self-heal off
# gluster volume set VOLNAME cluster.entry-self-heal off
# gluster volume set VOLNAME self-heal-daemon off

4. Add brick to the master and slave to convert them to 2x(2+1) arbiter vols
5. Start rebalance on master and slave

6. Re-enable self healing:

# gluster volume set VOLNAME cluster.data-self-heal on
# gluster volume set VOLNAME cluster.metadata-self-heal on
# gluster volume set VOLNAME cluster.entry-self-heal on
# gluster volume set VOLNAME self-heal-daemon on

7. Wait for rebalance to complete
8. Check the checksum between master and slave
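
The self-heal toggling in steps 3 and 6 can be wrapped in a small helper so that the same four options are always switched together. This is a sketch only: the function name set_self_heal and the GLUSTER override variable are hypothetical conveniences, and the option names are copied from the commands above.

```shell
# Hypothetical helper wrapping the four "gluster volume set" commands used in
# steps 3 and 6. GLUSTER is an assumed override for dry runs (GLUSTER=echo).
set_self_heal() {
    volname=$1
    state=$2    # "on" or "off"
    case "$state" in
        on|off) ;;
        *) echo "usage: set_self_heal VOLNAME on|off" >&2; return 2 ;;
    esac
    for opt in cluster.data-self-heal cluster.metadata-self-heal \
               cluster.entry-self-heal self-heal-daemon; do
        # Stop at the first failing command so a partial change is noticed.
        ${GLUSTER:-gluster} volume set "$volname" "$opt" "$state" || return 1
    done
}
```

For example, `set_self_heal VOLNAME off` before the add-brick and `set_self_heal VOLNAME on` afterwards; setting GLUSTER=echo first prints the commands instead of running them.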


Actual results:
===============
Checksums do not fully match: the Regular files and Total checksums differ between master and slave.


Expected results:
================
Checksum should match

Comment 48 hari gowtham 2019-07-23 04:57:12 UTC
Patch on the master: https://bugzilla.redhat.com/show_bug.cgi?id=1724043

Comment 89 Shwetha K Acharya 2022-02-03 07:18:14 UTC
Changing the assignee, as Tamar is working on this BZ.

