Bug 830665 - self-heal of files fails when simulated a disk replacement
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.3-beta
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Assigned To: Jeff Darcy
Keywords: Triaged
Depends On:
Blocks: 853681 858496
Reported: 2012-06-11 03:28 EDT by Shwetha Panduranga
Modified: 2013-07-24 13:57 EDT
See Also:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 853681 (view as bug list)
Environment:
Last Closed: 2013-07-24 13:57:51 EDT
Type: Bug
Description Shwetha Panduranga 2012-06-11 03:28:39 EDT
Description of problem:
-----------------------
Self-heal of files onto the replaced brick fails (after simulating a disk replacement) when "find . | xargs stat" is run from an NFS mount.


Version-Release number of selected component (if applicable):
------------------------------------------------------------
3.3.0qa45


How reproducible:
-----------------
Often


Steps to Reproduce:
------------------
1. Create a replicate volume (1x2: brick1, brick2)

gluster volume create dstore replica 2 transport tcp 10.16.159.184:/export_b1/dir1 10.16.159.188:/export_b1/dir1

2. Create an NFS mount
mount -t nfs -o vers=3,noac 10.16.159.184:/dstore /mnt/nfsc1

3. Create files/dirs from the NFS mount
mkdir -p testdir1
dd if=/dev/urandom of=testdir1/file1 bs=1M count=1

4. Unmount the NFS mount
umount /mnt/nfsc1

5. Stop the volume
gluster volume stop dstore force

6. Stop glusterd on all nodes

7. Remove brick "brick1" and re-create it (simulating a hard disk replacement)
rm -rf /export_b1/dir1
mkdir -p /export_b1/dir1

8. Start glusterd on all nodes

9. Restart the volume
gluster volume start dstore
  
10. Create the NFS mount again
mount -t nfs -o vers=3,noac 10.16.159.184:/dstore /mnt/nfsc1

11. On the NFS mount, execute "find . | xargs stat"
cd /mnt/nfsc1 ; find . | xargs stat
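
The steps above can be sketched as a single script. This is only a rough outline, not a tested reproducer: it assumes it is run as root on 10.16.159.184 (the node hosting brick1), that step 7 is meant to remove the brick directory /export_b1/dir1, and that the volume is started after creation (the report implies this but does not list it). The report gives no command for stopping or starting glusterd, so those steps are left as manual prompts. The helper names run, manual and reproduce are made up for this sketch. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps. Nothing is executed unless DRY_RUN=0.
# Assumption: run as root on 10.16.159.184, the node that hosts brick1.
DRY_RUN="${DRY_RUN:-1}"
VOL=dstore
NODE1=10.16.159.184
NODE2=10.16.159.188
BRICK=/export_b1/dir1
MNT=/mnt/nfsc1

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps the report describes without giving a command (glusterd stop/start).
manual() {
    echo "# manual step: $*"
    if [ "$DRY_RUN" != "1" ]; then
        printf 'Press Enter when done: '
        read -r reply
    fi
}

reproduce() {
    run gluster volume create "$VOL" replica 2 transport tcp "$NODE1:$BRICK" "$NODE2:$BRICK"
    run gluster volume start "$VOL"    # implied by the report, not listed as a step
    run mount -t nfs -o vers=3,noac "$NODE1:/$VOL" "$MNT"
    run mkdir -p "$MNT/testdir1"
    run dd if=/dev/urandom of="$MNT/testdir1/file1" bs=1M count=1
    run umount "$MNT"
    run gluster volume stop "$VOL" force
    manual "stop glusterd on all nodes"
    run rm -rf "$BRICK"                # simulate the disk replacement on brick1
    run mkdir -p "$BRICK"
    manual "start glusterd on all nodes"
    run gluster volume start "$VOL"
    run mount -t nfs -o vers=3,noac "$NODE1:/$VOL" "$MNT"
    run sh -c "cd $MNT && find . | xargs stat"
}

reproduce
```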

Actual results:
----------------
The files are not self-healed to brick1. 

[2012-06-11 03:21:18.887932] I [afr-common.c:1340:afr_launch_self_heal] 0-dstore-replicate-0: background  meta-data data entry missing-entry gfid self-heal triggered. path: <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>, reason: lookup detected pending operations
[2012-06-11 03:21:18.888475] W [client3_1-fops.c:1495:client3_1_inodelk_cbk] 0-dstore-client-0: remote operation failed: No such file or directory
[2012-06-11 03:21:18.888794] D [afr-lk-common.c:408:transaction_lk_op] 0-dstore-replicate-0: lk op is for a self heal
[2012-06-11 03:21:18.889243] E [afr-self-heal-metadata.c:539:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-dstore-replicate-0: Non Blocking metadata inodelks failed for <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>.
[2012-06-11 03:21:18.889278] E [afr-self-heal-metadata.c:541:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-dstore-replicate-0: Metadata self-heal failed for <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>.
[2012-06-11 03:21:18.889298] D [afr-self-heal-metadata.c:63:afr_sh_metadata_done] 0-dstore-replicate-0: proceeding to entry check on <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>
[2012-06-11 03:21:18.889780] W [client3_1-fops.c:1595:client3_1_entrylk_cbk] 0-dstore-client-0: remote operation failed: No such file or directory
[2012-06-11 03:21:18.890065] D [afr-lk-common.c:408:transaction_lk_op] 0-dstore-replicate-0: lk op is for a self heal
[2012-06-11 03:21:18.890432] E [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk] 0-dstore-replicate-0: Non Blocking entrylks failed for <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>.
[2012-06-11 03:21:18.915217] E [afr-self-heal-common.c:2156:afr_self_heal_completion_cbk] 0-dstore-replicate-0: background  meta-data data entry self-heal failed on <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>
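
To see which gfids are affected when the log is long, the completion-callback lines can be filtered. A minimal sketch, assuming the log format shown above; the function name failed_gfids is made up here, and the log path in the usage comment is an assumption about where the Gluster NFS server log lives on the test machine.

```shell
#!/bin/sh
# Print the unique gfids for which a background self-heal reported failure.
# $1: path to the log file to scan.
failed_gfids() {
    grep 'afr_self_heal_completion_cbk' "$1" \
        | grep 'self-heal failed' \
        | sed -n 's/.*<gfid:\([0-9a-f-]*\)>.*/\1/p' \
        | sort -u
}

# Usage (log path is an assumption, adjust to the local setup):
#   failed_gfids /var/log/glusterfs/nfs.log
```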

Expected results:
----------------
The files should be self-healed to brick1.
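
One possible way to check this after step 11 is to compare the copy on the replaced brick with a reference copy (for example the file as seen through the mount, or a copy fetched from brick2). This is a sketch only; the function name healed is hypothetical, and the paths in the usage comment assume the brick and the mount are both visible from the machine where the check runs.

```shell
#!/bin/sh
# Succeeds only if the file exists on the replaced brick and its content
# is byte-identical to the reference copy.
# $1: copy on the replaced brick, $2: reference copy.
healed() {
    [ -f "$1" ] && cmp -s "$1" "$2"
}

# Usage (paths are assumptions based on the steps in this report):
#   healed /export_b1/dir1/testdir1/file1 /mnt/nfsc1/testdir1/file1 \
#       && echo "file1 healed" || echo "file1 NOT healed"
```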

Additional info:
---------------
The same test case passes on a FUSE mount.
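
For the FUSE comparison, steps 10 and 11 would be repeated with a native client mount in place of the NFS one. The helper below only prints the mount command for either client type so the two can be compared side by side; the name mount_cmd and the mount point /mnt/fusec1 are made up for this sketch and do not appear in the report.

```shell
#!/bin/sh
SERVER=10.16.159.184
VOL=dstore

# Print the mount command for the given client type.
# $1: nfs or fuse, $2: mount point.
mount_cmd() {
    case "$1" in
        nfs)  echo "mount -t nfs -o vers=3,noac $SERVER:/$VOL $2" ;;
        fuse) echo "mount -t glusterfs $SERVER:/$VOL $2" ;;
        *)    echo "unknown mount type: $1" >&2; return 1 ;;
    esac
}

mount_cmd nfs /mnt/nfsc1
mount_cmd fuse /mnt/fusec1    # hypothetical mount point
```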
Comment 1 Jeff Darcy 2012-10-31 10:17:01 EDT
http://review.gluster.org/4067 posted for review.
Comment 2 Vijay Bellur 2012-12-04 00:56:02 EST
CHANGE: http://review.gluster.org/4067 (nfs: do opendir for "naked" readdirp to force self-heal checks) merged in master by Anand Avati (avati@redhat.com)
Comment 3 Vijay Bellur 2012-12-04 17:43:27 EST
CHANGE: http://review.gluster.org/4266 (tests/bug-830665: use the default H0) merged in master by Anand Avati (avati@redhat.com)