Bug 853681 - self-heal of files fails when a disk replacement is simulated
Summary: self-heal of files fails when a disk replacement is simulated
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact: spandura
URL:
Whiteboard:
Duplicates: 858496
Depends On: 830665
Blocks: 858496
 
Reported: 2012-09-02 06:22 UTC by Vidya Sakar
Modified: 2013-09-23 22:33 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.4.0qa8
Doc Type: Bug Fix
Doc Text:
Clone Of: 830665
Environment:
Last Closed: 2013-09-23 22:33:15 UTC
Embargoed:



Description Vidya Sakar 2012-09-02 06:22:31 UTC
+++ This bug was initially created as a clone of Bug #830665 +++

Description of problem:
-----------------------
Self-heal of files onto the replaced brick fails (when a disk replacement is simulated) after running "find . | xargs stat" from an NFS mount.


Version-Release number of selected component (if applicable):
------------------------------------------------------------
3.3.0qa45


How reproducible:
-----------------
Often


Steps to Reproduce:
------------------
1.Create a replicate volume (1x2: brick1, brick2)

gluster volume create dstore replica 2 transport tcp 10.16.159.184:/export_b1/dir1 10.16.159.188:/export_b1/dir1
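
An optional sanity check to confirm the brick layout (standard gluster CLI; not part of the original steps):
gluster volume info dstore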

2.Create an NFS mount.
mount -t nfs -o vers=3,noac 10.16.159.184:/dstore /mnt/nfsc1

3.Create files/dirs from nfs mount
mkdir -p testdir1
dd if=/dev/urandom of=testdir1/file1 bs=1M count=1

4.Unmount nfs mount
umount /mnt/nfsc1

5.Stop the volume.
gluster volume stop dstore force

6.Stop glusterd on all nodes
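On each node, assuming RHEL6-style init scripts, something along the lines of:
service glusterd stop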

7.Remove the brick "brick1" and re-create the brick "brick1" (simulate hard disk replacement)
rm -rf /export_b1/dir1
mkdir -p /export_b1/dir1

8.Start glusterd on all nodes
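On each node, again assuming init-script management:
service glusterd start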

9.Restart the volume
gluster volume start dstore
  
10.Create nfs mount 
mount -t nfs -o vers=3,noac 10.16.159.184:/dstore /mnt/nfsc1

11.On nfs mount execute "find . | xargs stat"
cd /mnt/nfsc1 ; find . | xargs stat

Actual results:
----------------
The files are not self-healed to brick1. 

[2012-06-11 03:21:18.887932] I [afr-common.c:1340:afr_launch_self_heal] 0-dstore-replicate-0: background  meta-data data entry missing-entry gfid self-heal triggered. path: <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>, reason: lookup detected pending operations
[2012-06-11 03:21:18.888475] W [client3_1-fops.c:1495:client3_1_inodelk_cbk] 0-dstore-client-0: remote operation failed: No such file or directory
[2012-06-11 03:21:18.888794] D [afr-lk-common.c:408:transaction_lk_op] 0-dstore-replicate-0: lk op is for a self heal
[2012-06-11 03:21:18.889243] E [afr-self-heal-metadata.c:539:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-dstore-replicate-0: Non Blocking metadata inodelks failed for <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>.
[2012-06-11 03:21:18.889278] E [afr-self-heal-metadata.c:541:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-dstore-replicate-0: Metadata self-heal failed for <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>.
[2012-06-11 03:21:18.889298] D [afr-self-heal-metadata.c:63:afr_sh_metadata_done] 0-dstore-replicate-0: proceeding to entry check on <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>
[2012-06-11 03:21:18.889780] W [client3_1-fops.c:1595:client3_1_entrylk_cbk] 0-dstore-client-0: remote operation failed: No such file or directory
[2012-06-11 03:21:18.890065] D [afr-lk-common.c:408:transaction_lk_op] 0-dstore-replicate-0: lk op is for a self heal
[2012-06-11 03:21:18.890432] E [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk] 0-dstore-replicate-0: Non Blocking entrylks failed for <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>.
[2012-06-11 03:21:18.915217] E [afr-self-heal-common.c:2156:afr_self_heal_completion_cbk] 0-dstore-replicate-0: background  meta-data data entry self-heal failed on <gfid:e0ffabe2-024f-46e7-b890-d86addb41f20>
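
Illustrative checks (not output captured in this report) to confirm the entries are missing on the replaced brick and, if the build supports it, to query heal status:
ls -lR /export_b1/dir1        # run on the node holding the replaced brick
gluster volume heal dstore info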

Expected results:
----------------
1. The files should be self-healed to brick "brick1".

Additional info:
---------------
The same test case passes on a FUSE mount.
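
For comparison, the FUSE mount for that run would use the standard glusterfs mount syntax (the mount point name here is illustrative):
mount -t glusterfs 10.16.159.184:/dstore /mnt/fusec1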

Comment 2 Pranith Kumar K 2012-12-11 09:23:08 UTC
*** Bug 858496 has been marked as a duplicate of this bug. ***

Comment 5 spandura 2013-07-09 10:09:48 UTC
Verified the fix on build:
===========================

root@king [Jul-09-2013-15:38:24] >gluster --version
glusterfs 3.4.0.12rhs.beta3 built on Jul  6 2013 14:35:18


root@king [Jul-09-2013-15:38:35] >rpm -qa | grep glusterfs
glusterfs-fuse-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-devel-3.4.0.12rhs.beta3-1.el6rhs.x86_64

Bug is fixed.

Comment 6 Scott Haines 2013-09-23 22:33:15 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

