Description of problem:
========================
On a 2 x 2 distributed-replicate volume (4 nodes, 1 brick per node), simulated a disk replacement on one of the bricks: killed the brick process and removed the contents of the brick, including the ".glusterfs" directory. Executed "heal full" to trigger self-heal. After the heal completed successfully, compared the arequal-checksum taken before the disk replacement with the one taken after self-heal. The arequal-checksums do not match. The number of entries before and after self-heal is the same; the mismatch is in the checksum of the file data. Also, after bringing down the source brick and computing the arequal-checksum again, a few entries are missing.
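The comparison described above separates two things: whether the same entries exist, and whether their data matches. The sketch below illustrates that split. It is a hypothetical, simplified stand-in for arequal-checksum (the function name `tree_checksum` and its hashing scheme are assumptions, not arequal's algorithm or output format). It skips `.glusterfs` by default on the assumption that only user-visible data on a brick should be compared, and it reads each regular file fully into memory, so it is only meant for small test trees.

```python
import hashlib
import os
from typing import Iterable, Tuple


def tree_checksum(root: str, ignore: Iterable[str] = (".glusterfs",)) -> Tuple[int, str]:
    """Return (entry_count, hex_digest) for the directory tree under root.

    Hypothetical, simplified stand-in for arequal-checksum: it hashes each
    entry's type, relative path and data (file bytes or symlink target).
    It does NOT reproduce arequal's algorithm or output.
    """
    skip = set(ignore)
    digest = hashlib.sha256()
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories (e.g. ".glusterfs") so os.walk does not
        # descend into them; sorting keeps the traversal order deterministic.
        dirnames[:] = sorted(d for d in dirnames if d not in skip)
        for name in sorted(dirnames + [f for f in filenames if f not in skip]):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.islink(path):
                # Symlinks are compared by their target string, not followed.
                kind, data = b"l", os.fsencode(os.readlink(path))
            elif os.path.isdir(path):
                kind, data = b"d", b""
            elif os.path.isfile(path):
                with open(path, "rb") as fh:
                    kind, data = b"f", fh.read()
            else:
                kind, data = b"o", b""  # FIFOs, sockets, devices: type only
            count += 1
            # Hash fields with separators plus a per-entry data digest so that
            # different path/data splits cannot produce the same byte stream.
            digest.update(kind + b"\0" + os.fsencode(rel) + b"\0")
            digest.update(hashlib.sha256(data).digest())
    return count, digest.hexdigest()
```

Comparing both fields of the returned tuple for two trees distinguishes missing entries (counts differ) from a data mismatch (counts equal, digests differ), which is the pattern reported above.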
Tried reproducing this bug several times, without success, on the release-3.6 branch with HEAD at commit 3867bdb496b9a34ab3db06c151e822aa9379b3e9. Here's what I did:

1) Created a 2x2 distributed-replicate volume on a 4-node cluster with one brick on each node, started it, and mounted it on a different node.
2) Ran the scripts attached to BZ 1117167 to create symlinks and hardlinks:
   [root@nestor mnt]# ~/hard_link_self_heal.sh /mnt create_files_and_dirs blah_1
   [root@nestor mnt]# ~/hard_link_self_heal.sh /mnt create_hard_links blah_1
   [root@nestor mnt]# ~/sym_link_self_heal.sh /mnt create_files_and_dirs blah_2
   [root@nestor mnt]# ~/sym_link_self_heal.sh /mnt create_sym_links blah_2
   [root@nestor mnt]# ~/sym_link_self_heal.sh /mnt add_files_from_sym_links blah_2
3) Computed the arequal-checksum on the mountpoint at this point.
4) Executed `pkill gluster` on the node containing the third brick. Removed the brick directory, recreated it, and set the volume-id xattr on it.
5) Started the glusterd service on node 3.
6) Executed `heal full` from one of the nodes.
7) After some time, computed the arequal-checksum on brick-3 and brick-4 and compared the outputs.

Result: Both checksums matched. The arequal-checksum on the mount was also the same after the heal.

Shwetha, could you let me know whether these steps look OK, or whether there is something I need to do differently here?
Patches for the fix:
http://review.gluster.org/#/c/10076/
http://review.gluster.org/#/c/10448/
*** Bug 1255611 has been marked as a duplicate of this bug. ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days