Bug 830665

Summary: self-heal of files fails when a disk replacement is simulated
Product: [Community] GlusterFS
Reporter: Shwetha Panduranga <shwetha.h.panduranga>
Component: replicate
Assignee: Jeff Darcy <jdarcy>
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: high
Version: 3.3-beta
CC: gluster-bugs, jdarcy, mailbox, rfortier
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Story Points: ---
Clone Of:
: 853681
Last Closed: 2013-07-24 17:57:51 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Blocks: 853681, 858496

Description Shwetha Panduranga 2012-06-11 07:28:39 UTC
Description of problem:
-----------------------
Self-heal of files onto a replaced brick (a simulated disk replacement) fails when "find . | xargs stat" is run from an NFS mount.


Version-Release number of selected component (if applicable):
------------------------------------------------------------
3.3.0qa45


How reproducible:
-----------------
Often


Steps to Reproduce:
------------------
1. Create a replicate volume (1x2: brick1, brick2)

gluster volume create dstore replica 2 transport tcp 10.16.159.184:/export_b1/dir1 10.16.159.188:/export_b1/dir1

2. Create an NFS mount
mount -t nfs -o vers=3,noac 10.16.159.184:/dstore /mnt/nfsc1

3. Create files/dirs from the NFS mount
mkdir -p testdir1
dd if=/dev/urandom of=testdir1/file1 bs=1M count=1

4. Unmount the NFS mount
umount /mnt/nfsc1

5. Stop the volume
gluster volume stop dstore force

6. Stop glusterd on all nodes

7. Remove the brick "brick1" and re-create it (simulating a hard disk replacement)
rm -rf /export_b1/dir1
mkdir -p /export_b1/dir1

8. Start glusterd on all nodes

9. Restart the volume
gluster volume start dstore

10. Create the NFS mount again
mount -t nfs -o vers=3,noac 10.16.159.184:/dstore /mnt/nfsc1

11. On the NFS mount, execute "find . | xargs stat"
cd /mnt/nfsc1 ; find . | xargs stat
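
The steps above can be collected into one script. This is only a sketch under stated assumptions: the hosts, paths and volume name come from this report, but the `service glusterd stop/start` commands, the `ssh` call to the second node, the initial `gluster volume start`, and the `DRY_RUN` switch are not in the report and are illustrative. It assumes the script runs on 10.16.159.184 (the brick1 host) and that /mnt/nfsc1 already exists. By default it only prints the commands.

```sh
#!/bin/sh
# Values taken from the report.
VOL=dstore
HOST1=10.16.159.184
HOST2=10.16.159.188
BRICK=/export_b1/dir1
MNT=/mnt/nfsc1

# Assumption (not in the report): print commands unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro_830665() {
    # Steps 1-3: create the volume, mount it over NFS, create test data.
    run gluster volume create "$VOL" replica 2 transport tcp "$HOST1:$BRICK" "$HOST2:$BRICK"
    # Assumption: the report does not list this first start, but the
    # volume has to be started before it can be mounted.
    run gluster volume start "$VOL"
    run mount -t nfs -o vers=3,noac "$HOST1:/$VOL" "$MNT"
    run mkdir -p "$MNT/testdir1"
    run dd if=/dev/urandom of="$MNT/testdir1/file1" bs=1M count=1

    # Steps 4-6: unmount, stop the volume, stop glusterd on both nodes.
    run umount "$MNT"
    run gluster volume stop "$VOL" force
    # Assumption: glusterd is managed with "service" and HOST2 is reachable via ssh.
    run service glusterd stop
    run ssh "$HOST2" service glusterd stop

    # Step 7: wipe and re-create the local brick directory (brick1).
    run rm -rf "$BRICK"
    run mkdir -p "$BRICK"

    # Steps 8-9: start glusterd on both nodes, then restart the volume.
    run service glusterd start
    run ssh "$HOST2" service glusterd start
    run gluster volume start "$VOL"

    # Steps 10-11: remount over NFS and stat every entry to trigger lookups.
    run mount -t nfs -o vers=3,noac "$HOST1:/$VOL" "$MNT"
    run sh -c "cd $MNT && find . | xargs stat"
}
```

Calling `repro_830665` prints the command sequence for review; `DRY_RUN=0 repro_830665` would execute it as root, which destroys the contents of the brick directory.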

Actual results:
----------------
The files are not self-healed to brick1. 

[2012-06-11 03:21:18.887932] I [afr-common.c:1340:afr_launch_self_heal] 0-dstore-replicate-0: background  meta-data data entry missing-entry gfid self-heal triggered. path: <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>, reason: lookup detected pending operations
[2012-06-11 03:21:18.888475] W [client3_1-fops.c:1495:client3_1_inodelk_cbk] 0-dstore-client-0: remote operation failed: No such file or directory
[2012-06-11 03:21:18.888794] D [afr-lk-common.c:408:transaction_lk_op] 0-dstore-replicate-0: lk op is for a self heal
[2012-06-11 03:21:18.889243] E [afr-self-heal-metadata.c:539:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-dstore-replicate-0: Non Blocking metadata inodelks failed for <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>.
[2012-06-11 03:21:18.889278] E [afr-self-heal-metadata.c:541:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-dstore-replicate-0: Metadata self-heal failed for <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>.
[2012-06-11 03:21:18.889298] D [afr-self-heal-metadata.c:63:afr_sh_metadata_done] 0-dstore-replicate-0: proceeding to entry check on <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>
[2012-06-11 03:21:18.889780] W [client3_1-fops.c:1595:client3_1_entrylk_cbk] 0-dstore-client-0: remote operation failed: No such file or directory
[2012-06-11 03:21:18.890065] D [afr-lk-common.c:408:transaction_lk_op] 0-dstore-replicate-0: lk op is for a self heal
[2012-06-11 03:21:18.890432] E [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk] 0-dstore-replicate-0: Non Blocking entrylks failed for <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>.
[2012-06-11 03:21:18.915217] E [afr-self-heal-common.c:2156:afr_self_heal_completion_cbk] 0-dstore-replicate-0: background  meta-data data entry self-heal failed on <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>
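
To see which files are affected, the gfids can be pulled out of the error-level self-heal messages shown above. A minimal sketch, assuming the log keeps the format of the lines in this report; the function name `failed_heal_gfids` is made up for illustration.

```sh
# Print the unique gfids that appear in error-level (E) afr-self-heal log lines.
# Reads the files given as arguments, or standard input if none are given.
failed_heal_gfids() {
    grep ' E \[afr-self-heal' "$@" |
        sed -n 's/.*<gfid:\([0-9a-f-]*\)>.*/\1/p' |
        sort -u
}
```

It could be run against the NFS server log, for example `failed_heal_gfids /var/log/glusterfs/nfs.log` (that path is an assumption about the default log location, not something stated in this report).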

Expected results:
----------------
1. The file should be self-healed to brick "brick1".
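
One way to check this expectation is to compare the file as seen through the mount with the copy under the replaced brick directory. This is a sketch only: the helper name `healed_on_brick` is hypothetical, and it assumes the check runs on a host that has both the mount and the brick directory (here, the brick1 host).

```sh
# Succeeds only if the brick copy exists and is byte-identical to the
# file seen through the mount.
# $1: path through the mount, $2: path under the brick directory
healed_on_brick() {
    [ -f "$2" ] && cmp -s "$1" "$2"
}
```

With the paths from the steps above this would be `healed_on_brick /mnt/nfsc1/testdir1/file1 /export_b1/dir1/testdir1/file1 && echo healed`.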

Additional info:
---------------
The same test case passes on a FUSE mount.

Comment 1 Jeff Darcy 2012-10-31 14:17:01 UTC
http://review.gluster.org/4067 posted for review.

Comment 2 Vijay Bellur 2012-12-04 05:56:02 UTC
CHANGE: http://review.gluster.org/4067 (nfs: do opendir for "naked" readdirp to force self-heal checks) merged in master by Anand Avati (avati)

Comment 3 Vijay Bellur 2012-12-04 22:43:27 UTC
CHANGE: http://review.gluster.org/4266 (tests/bug-830665: use the default H0) merged in master by Anand Avati (avati)