1152957 – arequal-checksum mismatch between before and after successful heal on a replaced disk

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	replicate
Sub Component:
Version:	3.6.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Anuradha
QA Contact:
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1255611
Depends On:	1117167
Blocks:

Reported:	2014-10-15 09:30 UTC by spandura
Modified:	2023-09-14 02:49 UTC
CC List:	10 users
Fixed In Version:	v3.7.4
Clone Of:	1117167
Environment:
Last Closed:	2015-09-23 10:00:18 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Comment 1 Vijay Bellur 2014-10-21 12:51:10 UTC

Description of problem:
========================
On a 2 x 2 distributed-replicate volume (4 nodes, 1 brick per node), simulated a disk replacement on one of the bricks by killing the brick process and removing the contents of the brick, including the ".glusterfs" directory.

Executed "heal full" to trigger self-heal. After the heal completed successfully, compared the arequal-checksum taken before the disk replacement with the one taken after self-heal.

The arequal-checksums do not match. The number of entries is the same before and after self-heal; the mismatch is in the checksums of the files.

Also, after bringing down the source brick and checking the arequal-checksum again, a few entries are missing.
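
The before/after comparison described above can be sketched as a small shell helper. This is only an illustration, not the script used in the original test: the function name compare_checksums and the /tmp file names are hypothetical, and it assumes the arequal-checksum output for an unchanged tree is identical from run to run.

```shell
#!/bin/sh
# Hypothetical helper: compare two saved arequal-checksum outputs.
# Usage: compare_checksums BEFORE_FILE AFTER_FILE
# Prints MATCH and returns 0 when the files are byte-identical;
# otherwise prints MISMATCH followed by a diff, and returns 1.
compare_checksums() {
    before=$1
    after=$2
    if cmp -s "$before" "$after"; then
        echo "MATCH"
        return 0
    fi
    echo "MISMATCH"
    # Show the differing lines; diff exits non-zero here by design.
    diff "$before" "$after" || true
    return 1
}

# Intended use (assumes arequal-checksum is installed and the volume
# is mounted on /mnt; both file names are placeholders):
#   arequal-checksum -p /mnt > /tmp/arequal.before
#   ... simulate the disk replacement, run "heal full", wait ...
#   arequal-checksum -p /mnt > /tmp/arequal.after
#   compare_checksums /tmp/arequal.before /tmp/arequal.after
```

Saving both outputs to files keeps the evidence around, so a mismatch in the file checksums (as opposed to the entry counts) can be read directly from the diff.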

Comment 2 Krutika Dhananjay 2014-11-03 05:32:53 UTC

Tried reproducing this bug several times on the release-3.6 branch (HEAD at commit 3867bdb496b9a34ab3db06c151e822aa9379b3e9), but to no avail.

Here's what I did:

1) Created a 2x2 distributed-replicate volume on a 4-node cluster with one brick on each node, started it, and mounted it on a different node.

2) Ran the scripts attached to BZ 1117167 to create symlinks and hardlinks:

[root@nestor mnt]# ~/hard_link_self_heal.sh /mnt create_files_and_dirs blah_1
[root@nestor mnt]# ~/hard_link_self_heal.sh /mnt create_hard_links blah_1

[root@nestor mnt]# ~/sym_link_self_heal.sh /mnt create_files_and_dirs blah_2
[root@nestor mnt]# ~/sym_link_self_heal.sh /mnt create_sym_links blah_2
[root@nestor mnt]# ~/sym_link_self_heal.sh /mnt add_files_from_sym_links blah_2

3) Computed the arequal-checksum at this point on the mountpoint.

4) Executed `pkill gluster` on the node containing the third brick. Removed the brick directory, recreated it, and set the volume-id xattr on it.

5) Started the glusterd service on node 3.

6) Executed `heal full` from one of the nodes.

7) After some time, computed the arequal-checksum on brick-3 and brick-4, and compared the output.

Result: Both checksums matched. The arequal-checksum on the mount is also the same after the heal.
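
For reference, the brick-reset and heal steps above can be sketched roughly as follows. This is a dry-run illustration that only prints commands; it is not the exact sequence that was typed. The function names, the volume name testvol, the brick path /bricks/b3, and the UUID are hypothetical placeholders, and `service glusterd start` is an assumption about how glusterd is started on the node. The one concrete detail shown is the conversion of the volume UUID (from the "Volume ID:" line of `gluster volume info`) into the hex value for the trusted.glusterfs.volume-id xattr.

```shell
#!/bin/sh
# Convert a volume UUID into the 0x-prefixed hex string that
# setfattr accepts as a value, by dropping the dashes.
uuid_to_xattr_hex() {
    printf '0x%s\n' "$(printf '%s' "$1" | tr -d '-')"
}

# Print (do NOT execute) the commands for re-creating an emptied brick
# and triggering a full heal. All three arguments are placeholders.
# Usage: print_brick_reset_commands VOLNAME BRICK_PATH VOLUME_UUID
print_brick_reset_commands() {
    volname=$1
    brick=$2
    uuid=$3
    echo "pkill gluster"
    echo "rm -rf $brick"
    echo "mkdir -p $brick"
    echo "setfattr -n trusted.glusterfs.volume-id -v $(uuid_to_xattr_hex "$uuid") $brick"
    # Assumption: glusterd is managed through the service command.
    echo "service glusterd start"
    echo "gluster volume heal $volname full"
}

# Example with made-up values; review the output before running any of it.
print_brick_reset_commands testvol /bricks/b3 12345678-1234-1234-1234-123456789abc
```

Printing the commands instead of running them keeps the sketch safe to execute anywhere, since the real sequence deletes the brick directory.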

Shwetha,

Could you let me know if the steps look OK or whether there is something I need to do differently here?

Comment 3 Anuradha 2015-09-23 10:00:18 UTC

Patches for the fix:
http://review.gluster.org/#/c/10076/
http://review.gluster.org/#/c/10448/

Comment 4 Anuradha 2015-09-23 10:00:56 UTC

*** Bug 1255611 has been marked as a duplicate of this bug. ***

Comment 5 Red Hat Bugzilla 2023-09-14 02:49:06 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
