Bug 1686568 - [geo-rep]: Checksum mismatch when 2x2 vols are converted to arbiter
Summary: [geo-rep]: Checksum mismatch when 2x2 vols are converted to arbiter
Keywords:
Status: CLOSED DUPLICATE of bug 1724043
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: hari gowtham
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1683893 1687672 1687687 1687746 1724043
Reported: 2019-03-07 17:40 UTC by Karthik U S
Modified: 2020-02-04 09:16 UTC
CC List: 13 users

Fixed In Version:
Clone Of: 1683893
Clones: 1687672
Environment:
Last Closed: 2020-02-04 09:16:45 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
System ID: Gluster.org Gerrit 22325
Private: 0
Priority: None
Status: Merged
Summary: cluster/afr: Send truncate on arbiter brick from SHD
Last Updated: 2019-03-11 15:38:40 UTC

Comment 1 Karthik U S 2019-03-07 17:42:22 UTC
Description of problem:
=======================
After converting the 2x2 master and slave volumes to 2x(2+1) (arbiter), the arequal checksums of the master and slave mounts did not match:

[root@dhcp43-143 ~]# ./arequal-checksum -p /mnt/master/

Entry counts
Regular files   : 10000
Directories     : 2011
Symbolic links  : 11900
Other           : 0
Total           : 23911

Metadata checksums
Regular files   : 5ce564791c
Directories     : 288ecb21ce24
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 8e69e8576625d36f9ee1866c92bfb6a3
Directories     : 4a596e7e1e792061
Symbolic links  : 756e690d61497f6a
Other           : 0
Total           : 2fbf69488baa3ac7


[root@dhcp43-143 ~]# ./arequal-checksum -p /mnt/slave/

Entry counts
Regular files   : 10000
Directories     : 2011
Symbolic links  : 11900
Other           : 0
Total           : 23911

Metadata checksums
Regular files   : 5ce564791c
Directories     : 288ecb21ce24
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 53c64bd1144f6d9855f0af3edb55e614
Directories     : 4a596e7e1e792061
Symbolic links  : 756e690d61497f6a
Other           : 0
Total           : 3901e39cb02ad487



Everything matches except the "Checksums" values for Regular files and Total, which differ.
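To spot such mismatches mechanically, the two reports can be saved to files and compared field by field. A minimal POSIX sh/awk sketch, assuming the "Field : value" layout shown above with each section header on its own line; the function name arequal_diff and the report file names are hypothetical:

```shell
#!/bin/sh
# Print every "Section/Field" whose value differs between two saved
# arequal-checksum reports (hypothetical helper, not part of arequal).
arequal_diff() {
    awk '
        FNR == 1 { section = "" }                # new file: forget the old section
        NF > 0 && index($0, ":") == 0 {          # non-empty line without a colon
            section = $0                         # is treated as a section header
            sub(/[ \t]+$/, "", section)
            next
        }
        index($0, ":") > 0 {
            key = $0; sub(/[ \t]*:.*/, "", key)      # text before the colon
            val = $0; sub(/^[^:]*:[ \t]*/, "", val)  # text after the colon
            sub(/[ \t]+$/, "", val)
            id = section "/" key
            if (FNR == NR) first[id] = val       # first report: remember value
            else if (first[id] != val)           # second report: compare
                print id ": " first[id] " != " val
        }
    ' "$1" "$2"
}
```

For example, `arequal_diff master.txt slave.txt` run on the two reports above should print only the Checksums/Regular files and Checksums/Total lines; it prints nothing when the reports agree. The section prefix matters because fields such as "Regular files" and "Total" appear in more than one section.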



Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.12.2-45.el7rhgs.x86_64

How reproducible:
=================
2/2

Steps to Reproduce:
====================
1. Create and start a geo-rep session between a 2x2 master volume and a 2x2 slave volume
2. Mount the volumes and start writing data
3. Disable self-heal (client-side heals and the self-heal daemon) before add-brick:

# gluster volume set VOLNAME cluster.data-self-heal off
# gluster volume set VOLNAME cluster.metadata-self-heal off
# gluster volume set VOLNAME cluster.entry-self-heal off
# gluster volume set VOLNAME self-heal-daemon off

4. Add bricks to the master and slave volumes to convert them to 2x(2+1) arbiter volumes
5. Start rebalance on the master and slave volumes

6. Re-enable self-heal:

# gluster volume set VOLNAME cluster.data-self-heal on
# gluster volume set VOLNAME cluster.metadata-self-heal on
# gluster volume set VOLNAME cluster.entry-self-heal on
# gluster volume set VOLNAME self-heal-daemon on

7. Wait for rebalance to complete
8. Compare the arequal checksums of the master and slave mounts
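Steps 3 to 6 can be scripted per volume. A minimal POSIX sh sketch, assuming the option names used above and the standard `gluster volume add-brick VOLNAME replica 3 arbiter 1 BRICK...` form for converting a replica 2 volume to arbiter; the function names, volume name and brick paths are hypothetical, and GLUSTER=echo gives a dry run that only prints the commands:

```shell
#!/bin/sh
# GLUSTER defaults to the real CLI; override it (e.g. GLUSTER=echo) to dry-run.
: "${GLUSTER:=gluster}"

# Steps 3 and 6: turn client-side heals and the self-heal daemon off or on.
set_self_heal() {
    vol=$1 state=$2
    for opt in cluster.data-self-heal cluster.metadata-self-heal \
               cluster.entry-self-heal self-heal-daemon; do
        $GLUSTER volume set "$vol" "$opt" "$state" || return 1
    done
}

# Step 4: add one arbiter brick per replica pair (two for a 2x2 volume).
convert_to_arbiter() {
    vol=$1; shift
    $GLUSTER volume add-brick "$vol" replica 3 arbiter 1 "$@"
}

# Steps 3 to 6 in order; stops at the first failing command.
reproduce_steps() {
    rvol=$1; shift
    set_self_heal "$rvol" off &&
        convert_to_arbiter "$rvol" "$@" &&
        $GLUSTER volume rebalance "$rvol" start &&
        set_self_heal "$rvol" on
}
```

It would be run once for the master volume and once, on the slave cluster, for the slave volume, e.g. `reproduce_steps mastervol host1:/bricks/arb1 host2:/bricks/arb2`. Step 7 can then be followed with `gluster volume rebalance VOLNAME status`.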


Actual results:
===============
Checksums do not fully match.


Expected results:
================
Checksums should match.

Comment 2 Karthik U S 2019-03-07 17:50:06 UTC
RCA:
If the arbiter brick is pending a data heal, self-heal only restores the timestamps of the file and resets the pending xattrs on the source bricks; it does not send any write to the arbiter brick.
In the add-brick scenario, self-heal creates the entries on the new arbiter brick and then restores the timestamps and other metadata of the files from the source brick. As a result, the data changes are never recorded in the arbiter brick's changelog, so when that brick acts as the ACTIVE geo-rep worker they are not synced, leaving data missing on the slave volume.
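The pending state mentioned in the RCA can be inspected on a source brick's copy of a file with `getfattr -d -m . -e hex`. A minimal sketch, assuming the usual AFR changelog layout of three 4-byte counters (data, metadata, entry) in each trusted.afr.* xattr; the function name afr_pending and the brick path are hypothetical:

```shell
#!/bin/sh
# Read getfattr hex output on stdin and print each non-zero AFR changelog
# xattr (trusted.afr.*) split into its data, metadata and entry counters.
afr_pending() {
    awk -F= '/^trusted\.afr\./ {
        v = substr($2, 3)                      # drop the 0x prefix
        if (v ~ /[1-9a-fA-F]/)                 # skip all-zero values
            print $1, "data=" substr(v, 1, 8), "metadata=" substr(v, 9, 8), "entry=" substr(v, 17, 8)
    }'
}
```

For example, `getfattr -d -m . -e hex /bricks/brick1/dir/file | afr_pending` (run as root, since trusted.* xattrs are hidden from other users) would show a non-zero data counter against the arbiter's client xattr while the data heal is still pending.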

Possible Fixes:
1. Do not mark the arbiter brick as ACTIVE, since even after heal completes it will not have the changelogs for the data transactions that happened while it was down or faulty.

2. Have self-heal send a 1-byte write to the arbiter brick, as we do for normal writes from clients.

Comment 3 Worker Ant 2019-03-07 17:53:48 UTC
REVIEW: https://review.gluster.org/22325 (cluster/afr: Send 1byte write on to arbiter brick from SHD) posted (#1) for review on master by Karthik U S

Comment 4 Worker Ant 2019-03-11 15:38:41 UTC
REVIEW: https://review.gluster.org/22325 (cluster/afr: Send truncate on arbiter brick from SHD) merged (#10) on master by Karthik U S

Comment 5 Sunil Kumar Acharya 2019-03-21 11:33:19 UTC
The issue is not fixed yet; moving the bug to ASSIGNED state.

Comment 6 Sunny Kumar 2020-02-04 09:16:45 UTC
Closing this bug, as it is being addressed by BZ#1724043.

*** This bug has been marked as a duplicate of bug 1724043 ***

