Description of problem:

While trying to run patch [1], which performs the steps listed under "Steps to Reproduce" below, it was observed that the arequal checksums were different, as shown here:

################################################################################
Checksum of the brick on which the data was removed
################################################################################
arequal-checksum -p /mnt/vol0/testvol_replicated_brick2 -i .glusterfs -i .landfill -i .trashcan

Entry counts
Regular files   : 14
Directories     : 3
Symbolic links  : 0
Other           : 0
Total           : 17

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : c4a0e0fd92dba41dc446cc3b33287983
Directories     : 300002e01
Symbolic links  : 0
Other           : 0
Total           : e62cc5a1f3f39f

################################################################################
Checksum of the brick where data wasn't removed
################################################################################
arequal-checksum -p /mnt/vol0/testvol_replicated_brick1 -i .glusterfs -i .landfill -i .trashcan

Entry counts
Regular files   : 16500
Directories     : 11
Symbolic links  : 0
Other           : 0
Total           : 16511

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 6b72772e37d757ad53453c4aafed344c
Directories     : 301002f01
Symbolic links  : 0
Other           : 0
Total           : 38374b67993a4ce0
################################################################################

This means that heal wasn't completing on the node where the data was removed. However, when heal status was checked before computing the checksums, it showed no entries to be healed on any of the bricks:

################################################################################
2020-02-25 12:15:34,416 INFO (run) root.2.161 (cp): gluster volume heal testvol_replicated info --xml
2020-02-25 12:15:34,416 DEBUG (_get_ssh_connection) Retrieved connection from cache: root.2.161
2020-02-25 12:15:34,618 INFO (_log_results) RETCODE (root.2.161): 0
2020-02-25 12:15:34,619 DEBUG (_log_results) STDOUT (root.2.161)...
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <cliOutput> <healInfo> <bricks> <brick hostUuid="112835ce-16ed-43e1-a758-c104c78ff782"> <name>172.19.2.161:/mnt/vol0/testvol_replicated_brick0</name> <status>Connected</status> <numberOfEntries>0</numberOfEntries> </brick> <brick hostUuid="3fdae765-7a1f-4ae5-99c1-ea7b24768554"> <name>172.19.2.153:/mnt/vol0/testvol_replicated_brick1</name> <status>Connected</status> <numberOfEntries>0</numberOfEntries> </brick> <brick hostUuid="a3877a65-2963-423c-8e9f-95ceb07f907d"> <name>172.19.2.164:/mnt/vol0/testvol_replicated_brick2</name> <status>Connected</status> <numberOfEntries>0</numberOfEntries> </brick> </bricks> </healInfo> <opRet>0</opRet> <opErrno>0</opErrno> <opErrstr/> </cliOutput> ################################################################################ Version-Release number of selected component (if applicable): glusterfs 20200220.a0e0890 How reproducible: 2/2 Steps to Reproduce: - Create a volume of type replica or distributed-replica - Create directory on mount point and write files/dirs - Create another set of files (1K files) - While creation of files/dirs are in progress Kill one brick - Remove the contents of the killed brick(simulating disk replacement) - When the IO's are still in progress, restart glusterd on the nodes where we simulated disk replacement to bring back bricks online - Start volume heal - Wait for IO's to complete - Verify whether the files are self-healed - Calculate arequals of the mount point and all the bricks Actual results: Arequal are different for replica volumes and aren't consistent in distributed replicated volumes. Expected results: Arequals should be same in case of replicate and should be consistent in case of distributed-replicated volumes. Additional info: This issue wasn't observed in gluster 6.0 builds. Reference links: [1] https://review.gluster.org/#/c/glusto-tests/+/20378/ [2] https://ci.centos.org/job/gluster_glusto-patch-check/2053/artifact/glustomain.log
This bug has been moved to https://github.com/gluster/glusterfs/issues/881 and will be tracked there from now on. Visit the GitHub issue URL for further details.