Bug 1152957

Summary: arequal-checksum mismatch between before and after successful heal on a replaced disk
Product: [Community] GlusterFS Reporter: spandura
Component: replicate    Assignee: Anuradha <atalur>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.6.0    CC: atalur, bugs, kdhananj, mzywusko, nsathyan, pkarampu, ravishankar, smohan, spandura, vbellur
Target Milestone: ---    Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: v3.7.4 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1117167 Environment:
Last Closed: 2015-09-23 10:00:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1117167    
Bug Blocks:    

Comment 1 Vijay Bellur 2014-10-21 12:51:10 UTC
Description of problem:
========================
On a 2 x 2 distribute-replicate volume (4 nodes, 1 brick per node), simulated a disk replacement on one of the bricks (killed the brick process and removed the contents of the brick, including the ".glusterfs" directory).

Executed "heal full" to trigger the self-heal. After the heal completed successfully, compared the arequal-checksums from before the disk replacement and after the self-heal.

The arequal-checksums do not match. The number of entries before and after the self-heal is the same; the mismatch is in the checksums of the files.

Also, on bringing down the source brick and checking the arequal checksum, a few entries are missing.
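The before/after comparison described above can be sketched as follows. This is an illustrative stand-in only, not the real arequal tool: arequal-checksum aggregates entry counts plus metadata and data checksums of a tree, while this hypothetical sketch approximates the data part with find + md5sum. It demonstrates the symptom reported here: two trees with the same entry count can still differ in file checksums.

```shell
set -eu

# Two hypothetical brick snapshots: "before" disk replacement, "after" heal.
before=$(mktemp -d)
after=$(mktemp -d)

printf 'payload\n' > "$before/f1"
printf 'payload\n' > "$after/f1"     # same content: checksums agree
printf 'other\n'   > "$before/f2"
printf 'OTHER\n'   > "$after/f2"     # same entry count, different data

# Aggregate checksum of all file contents in a tree, order-independent.
tree_sum() {
    (cd "$1" && find . -type f | sort | xargs md5sum | md5sum | awk '{print $1}')
}

# Count of entries (files and directories) under a tree.
count_entries() {
    find "$1" -mindepth 1 | wc -l
}

before_count=$(count_entries "$before")
after_count=$(count_entries "$after")
before_sum=$(tree_sum "$before")
after_sum=$(tree_sum "$after")

echo "entries: $before_count vs $after_count"
[ "$before_sum" = "$after_sum" ] && echo "checksum: MATCH" || echo "checksum: MISMATCH"

rm -rf "$before" "$after"
```

Here the entry counts print equal while the aggregate checksums differ, which is the shape of the failure in this bug.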

Comment 2 Krutika Dhananjay 2014-11-03 05:32:53 UTC
Tried reproducing this bug several times, to no avail, on the release-3.6 branch with HEAD at commit 3867bdb496b9a34ab3db06c151e822aa9379b3e9.

Here's what I did:

1) Created a 2x2 distribute-replicate volume on a 4-node cluster with one brick on each node, started it, and mounted it on a different node.

2) Ran the scripts to create symlinks and hardlinks attached to BZ 1117167:

[root@nestor mnt]# ~/hard_link_self_heal.sh /mnt create_files_and_dirs blah_1
[root@nestor mnt]# ~/hard_link_self_heal.sh /mnt create_hard_links blah_1

[root@nestor mnt]# ~/sym_link_self_heal.sh /mnt create_files_and_dirs blah_2
[root@nestor mnt]# ~/sym_link_self_heal.sh /mnt create_sym_links blah_2
[root@nestor mnt]# ~/sym_link_self_heal.sh /mnt add_files_from_sym_links blah_2

3) Computed the arequal-checksum at this point on the mountpoint.

4) Executed `pkill gluster` on the node containing the third brick. Removed the brick directory, recreated it, and set the volume-id xattr on it.

5) Started the glusterd service on node 3.

6) Executed `heal full` from one of the nodes.

7) After some time, computed the arequal checksum on brick-3 and brick-4, and compared the output.
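The disk-replacement simulation in the steps above corresponds roughly to the following command sequence. Hostnames, the brick path, and the volume name are hypothetical, and the volume-id value is not something to invent: it has to be read off a surviving brick first (e.g. with getfattr -n trusted.glusterfs.volume-id). This transcript requires a live Gluster cluster and is a sketch, not a verified procedure:

    [root@node3 ~]# pkill gluster
    [root@node3 ~]# rm -rf /bricks/brick3
    [root@node3 ~]# mkdir -p /bricks/brick3
    [root@node3 ~]# setfattr -n trusted.glusterfs.volume-id -v 0x<volume-id> /bricks/brick3
    [root@node3 ~]# service glusterd start
    [root@node1 ~]# gluster volume heal <volname> full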

Result: both checksums matched. The arequal-checksum on the mount is also the same after the heal.

Shwetha,

Could you let me know if the steps look OK or whether there is something I need to do differently here?

Comment 4 Anuradha 2015-09-23 10:00:56 UTC
*** Bug 1255611 has been marked as a duplicate of this bug. ***

Comment 5 Red Hat Bugzilla 2023-09-14 02:49:06 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days