1683893 – [geo-rep]: Checksum mismatch when 2x2 vols are converted to arbiter

Summary: [geo-rep]: Checksum mismatch when 2x2 vols are converted to arbiter

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	geo-replication
Sub Component:
Version:	rhgs-3.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	Shwetha K Acharya
QA Contact:	Vinayak Papnoi
Docs Contact:
URL:
Whiteboard:
Depends On:	1686568
Blocks:	1724043

Reported:	2019-02-28 03:51 UTC by Rochelle
Modified:	2024-10-01 16:13 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	Known Issue
Doc Text:	Consequence: In a geo-replication setup, while converting an n2 (replica 2) master volume to n3 (replica 3), if the worker corresponding to a brick newly added to the replica set (including an arbiter brick) becomes 'Active', geo-replication may never sync the self-healed data, causing data loss on the slave. Cause: The worker corresponding to a newly added brick should not go 'Faulty' until it has synced the self-healed data. If it does go 'Faulty' and another worker becomes 'Active', a race condition causes this issue. Fix/Workaround: This is a known issue and there is no clean workaround. Do not convert the master volume from n2 to n3 while geo-replication is configured.
Clone Of:
Clones:	1686568 1724043
Environment:
Last Closed:	2023-12-15 10:02:28 UTC
Embargoed:
Dependent Products:

Description Rochelle 2019-02-28 03:51:23 UTC

Description of problem:
=======================
While converting 2x2 volumes to 2x(2+1) (arbiter), there was a checksum mismatch between master and slave:

[root@dhcp43-143 ~]# ./arequal-checksum -p /mnt/master/

Entry counts
Regular files   : 10000
Directories     : 2011
Symbolic links  : 11900
Other           : 0
Total           : 23911

Metadata checksums
Regular files   : 5ce564791c
Directories     : 288ecb21ce24
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 8e69e8576625d36f9ee1866c92bfb6a3
Directories     : 4a596e7e1e792061
Symbolic links  : 756e690d61497f6a
Other           : 0
Total           : 2fbf69488baa3ac7


[root@dhcp43-143 ~]# ./arequal-checksum -p /mnt/slave/

Entry counts
Regular files   : 10000
Directories     : 2011
Symbolic links  : 11900
Other           : 0
Total           : 23911

Metadata checksums
Regular files   : 5ce564791c
Directories     : 288ecb21ce24
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 53c64bd1144f6d9855f0af3edb55e614
Directories     : 4a596e7e1e792061
Symbolic links  : 756e690d61497f6a
Other           : 0
Total           : 3901e39cb02ad487



Everything matches except under "Checksums", where the Regular files and Total checksums differ.
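The comparison above can be automated. The following is a minimal bash sketch, assuming each arequal-checksum report has been saved to a file (for example `./arequal-checksum -p /mnt/master/ > master.txt`) and follows the layout shown above: a section title line followed by `Label : value` lines. The helper names `arequal_flatten` and `arequal_diff` are hypothetical; they are not part of arequal or glusterfs.

```bash
#!/usr/bin/env bash
# Sketch: compare two saved arequal-checksum reports field by field.
# Assumes the report layout shown above. arequal_flatten and arequal_diff
# are hypothetical helper names, not part of arequal or glusterfs.

# Turn each "Label : value" line into "Section/Label=value", where Section is
# the most recent title line ("Entry counts", "Metadata checksums", ...).
arequal_flatten() {
    awk -F' *: *' '
        NF == 1 { section = $0; next }                   # title line: no colon
        NF == 2 { printf "%s/%s=%s\n", section, $1, $2 }  # field line
    ' "$1"
}

# Print every field whose value differs (or exists on one side only).
# Returns 0 when the reports match and 1 otherwise.
arequal_diff() {
    awk -F'=' '
        FILENAME == ARGV[1] { master[$1] = $2; next }    # first file: master
        { seen[$1] = 1 }
        !($1 in master) { printf "%s: only in slave\n", $1; bad = 1; next }
        (master[$1] "") != ($2 "") {                     # force string compare
            printf "%s: master=%s slave=%s\n", $1, master[$1], $2; bad = 1
        }
        END {
            for (k in master)
                if (!(k in seen)) { printf "%s: only in master\n", k; bad = 1 }
            exit (bad ? 1 : 0)
        }
    ' <(arequal_flatten "$1") <(arequal_flatten "$2")
}
```

Run it as `arequal_diff master.txt slave.txt`; it returns 1 when any field differs. On the two reports in this bug it should print only the `Checksums/Regular files` and `Checksums/Total` lines. Prefixing each label with its section keeps the two `Total` rows (entry count and checksum) distinct, and the comparison is forced to be textual so hex values such as `3e9` are never compared as numbers.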



Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.12.2-45.el7rhgs.x86_64

How reproducible:
=================
2/2

Steps to Reproduce:
====================
1. Create and start a geo-rep session with both master and slave as 2x2 volumes
2. Mount the volumes and start writing data
3. Disable self-heal and stop the self-heal daemon (before add-brick):

# gluster volume set VOLNAME cluster.data-self-heal off
# gluster volume set VOLNAME cluster.metadata-self-heal off
# gluster volume set VOLNAME cluster.entry-self-heal off
# gluster volume set VOLNAME self-heal-daemon off

4. Add bricks to the master and slave volumes to convert them to 2x(2+1) arbiter volumes
5. Start rebalance on master and slave

6. Re-enable self-heal:

# gluster volume set VOLNAME cluster.data-self-heal on
# gluster volume set VOLNAME cluster.metadata-self-heal on
# gluster volume set VOLNAME cluster.entry-self-heal on
# gluster volume set VOLNAME self-heal-daemon on

7. Wait for rebalance to complete
8. Compare the arequal checksums of the master and slave mounts
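Steps 3 to 6 can be sketched as a bash helper that prints the gluster CLI commands for one volume, so they can be reviewed before being piped to a shell. This is a minimal sketch, not a tested procedure: the helper names, volume name, host names, and brick paths are hypothetical placeholders, and it assumes a 2x2 volume, which needs one arbiter brick per replica pair (two in total) for `add-brick ... replica 3 arbiter 1`.

```bash
#!/usr/bin/env bash
# Sketch: print (not run) the gluster commands for steps 3-6 on one volume.
# Helper names, hosts, and brick paths are hypothetical placeholders.

# The four self-heal options toggled in steps 3 and 6.
SELF_HEAL_OPTS=(cluster.data-self-heal cluster.metadata-self-heal
                cluster.entry-self-heal self-heal-daemon)

# Print one "gluster volume set" command per option; state is "on" or "off".
self_heal_cmds() {
    local vol=$1 state=$2 opt
    for opt in "${SELF_HEAL_OPTS[@]}"; do
        printf 'gluster volume set %s %s %s\n' "$vol" "$opt" "$state"
    done
}

# Steps 3-6 for a 2x2 volume: self-heal off, add one arbiter brick per
# replica pair to reach 2x(2+1), start rebalance, then self-heal back on.
arbiter_conversion_cmds() {
    local vol=$1 arbiter1=$2 arbiter2=$3
    self_heal_cmds "$vol" off
    printf 'gluster volume add-brick %s replica 3 arbiter 1 %s %s\n' \
        "$vol" "$arbiter1" "$arbiter2"
    printf 'gluster volume rebalance %s start\n' "$vol"
    self_heal_cmds "$vol" on
}
```

For example, `arbiter_conversion_cmds mastervol host1:/bricks/arb1 host2:/bricks/arb2 | sh` (placeholder names) for the master, then the same for the slave, followed by `gluster volume rebalance VOLNAME status` to wait for step 7. Printing rather than executing keeps the ordering of the self-heal toggles around add-brick and rebalance easy to audit.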


Actual results:
===============
Checksums do not fully match: the Regular files and Total checksums differ.


Expected results:
=================
Checksums should match.

Comment 48 hari gowtham 2019-07-23 04:57:12 UTC

Patch on the master branch: https://bugzilla.redhat.com/show_bug.cgi?id=1724043

Comment 89 Shwetha K Acharya 2022-02-03 07:18:14 UTC

Changing the assignee, as Tamar is working on this BZ.