+++ This bug was initially created as a clone of Bug #1687672 +++

Description of problem:
=======================
While converting a 2x2 volume to 2x(2+1) (arbiter), there was a checksum mismatch between master and slave:

[root@dhcp43-143 ~]# ./arequal-checksum -p /mnt/master/

Entry counts
Regular files   : 10000
Directories     : 2011
Symbolic links  : 11900
Other           : 0
Total           : 23911

Metadata checksums
Regular files   : 5ce564791c
Directories     : 288ecb21ce24
Symbolic links   : 3e9
Other           : 3e9

Checksums
Regular files   : 8e69e8576625d36f9ee1866c92bfb6a3
Directories     : 4a596e7e1e792061
Symbolic links  : 756e690d61497f6a
Other           : 0
Total           : 2fbf69488baa3ac7

[root@dhcp43-143 ~]# ./arequal-checksum -p /mnt/slave/

Entry counts
Regular files   : 10000
Directories     : 2011
Symbolic links  : 11900
Other           : 0
Total           : 23911

Metadata checksums
Regular files   : 5ce564791c
Directories     : 288ecb21ce24
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 53c64bd1144f6d9855f0af3edb55e614
Directories     : 4a596e7e1e792061
Symbolic links  : 756e690d61497f6a
Other           : 0
Total           : 3901e39cb02ad487

Everything matches except under "Checksums", where "Regular files" and "Total" differ.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.12.2-45.el7rhgs.x86_64

How reproducible:
=================
2/2

Steps to Reproduce:
===================
1. Create and start a geo-rep session with master and slave volumes both 2x2.
2. Mount the volumes and start pumping data.
3. Disable and stop self-healing (prior to add-brick):
   # gluster volume set VOLNAME cluster.data-self-heal off
   # gluster volume set VOLNAME cluster.metadata-self-heal off
   # gluster volume set VOLNAME cluster.entry-self-heal off
   # gluster volume set VOLNAME self-heal-daemon off
4. Add bricks to the master and slave to convert them to 2x(2+1) arbiter volumes.
5. Start rebalance on master and slave.
6. Re-enable self-healing:
   # gluster volume set VOLNAME cluster.data-self-heal on
   # gluster volume set VOLNAME cluster.metadata-self-heal on
   # gluster volume set VOLNAME cluster.entry-self-heal on
   # gluster volume set VOLNAME self-heal-daemon on
7. Wait for rebalance to complete.
8. Compare the checksums between master and slave.

Actual results:
===============
The checksums do not fully match.

Expected results:
=================
The checksums should match.

--- Additional comment from Karthik U S on 2019-03-12 06:20:01 UTC ---

RCA:
If the arbiter brick has a pending data heal, self-heal only restores the file's timestamps and resets the pending xattrs on the source bricks; it does not send any write to the arbiter brick. In this add-brick scenario, self-heal creates the entries on the new arbiter brick and then restores the timestamps and other metadata from a source brick. The data changes are therefore never marked in the changelog, leading to missing data on the slave volume after geo-rep sync.

Possible fixes:
1. Do not mark the arbiter brick as ACTIVE, since even after heal completes it will not have the changelogs for the data transactions that happened while it was down/faulty.
2. Send a 1-byte write on the arbiter brick from self-heal, as is done for normal client writes.
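For illustration only, one way to observe this root cause on a test setup is to inspect the AFR changelog xattrs directly on the backend bricks (a sketch, not from the original report; the brick path /bricks/master_b0 and the file path dir1/file1 are assumed names, and the exact trusted.afr.*-client-N keys depend on the volume's brick order):

   # getfattr -d -m trusted.afr -e hex /bricks/master_b0/dir1/file1

With this bug, the data counters in the trusted.afr pending xattrs on the source bricks get reset by the heal even though no write FOP ever reached the arbiter brick, so the changelog on the arbiter carries no record of the data transaction for geo-rep to pick up.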
REVIEW: https://review.gluster.org/22338 (cluster/afr: Send truncate on arbiter brick from SHD) merged (#1) on release-5 by Karthik U S
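Per the patch title above, the self-heal daemon now sends a truncate on the arbiter brick as part of data heal rather than skipping data FOPs on the arbiter entirely, so the data transaction is recorded in the arbiter's changelog and geo-rep can sync it. A quick re-verification after upgrading, reusing VOLNAME and the mounts from the reproduction steps (a sketch, not part of the original report):

   # gluster volume heal VOLNAME info
   # ./arequal-checksum -p /mnt/master/
   # ./arequal-checksum -p /mnt/slave/

Once heal info reports no pending entries, all sections, including "Checksums -> Regular files" and "Total", should match between master and slave.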
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-5.5, please open a new bug report.

glusterfs-5.5 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000119.html
[2] https://www.gluster.org/pipermail/gluster-users/