Bug 837882

Summary: Self-heal gives many "No such file or directory" errors
Product: [Community] GlusterFS
Reporter: jaw171
Component: replicate
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Version: 3.3.0
CC: gluster-bugs, jaw171
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2012-07-26 16:50:10 UTC
Type: Bug
Attachments:
self-heal.log (flags: none)

Description jaw171 2012-07-05 17:06:49 UTC
Created attachment 596449: self-heal.log

Description of problem:
On a GlusterFS 3.3 replica volume spanning two RHEL 6.3 servers, with four bricks per server, self-heal logs errors after one server has been down for some time and is then brought back online.

Version-Release number of selected component (if applicable):
glusterfs-server-3.3.0-1.el6.x86_64 (from the RPMs on gluster.org)

How reproducible:


Steps to Reproduce:
1. Create a replica volume with a count of 2
2. Add files to the volume
3. Take one server offline
4. Change data in the volume
5. Bring the dead server back up
6. Watch the self-heal log
  
Actual results:
The self-heal log shows many "No such file or directory" errors when stale index entries are removed.

Expected results:
Self-heal pushes the changed files from the healthy server to the other one without error.

Additional info:
# gluster volume info
 
Volume Name: vol_home
Type: Distributed-Replicate
Volume ID: 4147f773-f2d2-4e91-bff3-b5ec7da69a47
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: storage1.frank.sam.pitt.edu:/brick/0
Brick2: storage4.frank.sam.pitt.edu:/brick/0
Brick3: storage1.frank.sam.pitt.edu:/brick/1
Brick4: storage4.frank.sam.pitt.edu:/brick/1
Brick5: storage1.frank.sam.pitt.edu:/brick/2
Brick6: storage4.frank.sam.pitt.edu:/brick/2
Brick7: storage1.frank.sam.pitt.edu:/brick/3
Brick8: storage4.frank.sam.pitt.edu:/brick/3
Options Reconfigured:
nfs.rpc-auth-allow: 10.201.*.*,127.*
auth.allow: 10.201.*.*,127.*
performance.io-cache: off
cluster.min-free-disk: 5
performance.cache-size: 128000000
features.quota: on
nfs.disable: on
features.limit-usage: *snipped, lots of quotas*
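
To read the log excerpt further down against this layout, it helps to know which bricks sit behind names such as vol_home-replicate-3 and vol_home-client-6. The following is a minimal Python sketch, not taken from GlusterFS itself. It assumes the usual convention that bricks are grouped into replica sets in listing order and that client and replicate translators are numbered from 0 in that same order. The helper names `replica_sets` and `describe` are hypothetical.

```python
# Brick list copied from the `gluster volume info` output above.
BRICKS = [
    "storage1.frank.sam.pitt.edu:/brick/0",
    "storage4.frank.sam.pitt.edu:/brick/0",
    "storage1.frank.sam.pitt.edu:/brick/1",
    "storage4.frank.sam.pitt.edu:/brick/1",
    "storage1.frank.sam.pitt.edu:/brick/2",
    "storage4.frank.sam.pitt.edu:/brick/2",
    "storage1.frank.sam.pitt.edu:/brick/3",
    "storage4.frank.sam.pitt.edu:/brick/3",
]
REPLICA_COUNT = 2  # from "Number of Bricks: 4 x 2 = 8"


def replica_sets(bricks, replica_count):
    """Group bricks into replica sets in listing order (assumed convention)."""
    return [
        bricks[i:i + replica_count]
        for i in range(0, len(bricks), replica_count)
    ]


def describe(volume, bricks, replica_count):
    """Map each replicate translator name to its (client name, brick) pairs.

    Naming is assumed to be 0-based and to follow the brick listing order.
    """
    layout = {}
    for set_idx, group in enumerate(replica_sets(bricks, replica_count)):
        members = []
        for offset, brick in enumerate(group):
            client_idx = set_idx * replica_count + offset
            members.append((f"{volume}-client-{client_idx}", brick))
        layout[f"{volume}-replicate-{set_idx}"] = members
    return layout


if __name__ == "__main__":
    for name, members in describe("vol_home", BRICKS, REPLICA_COUNT).items():
        print(name)
        for client, brick in members:
            print(f"  {client} -> {brick}")
```

Under those assumptions, vol_home-client-6 would correspond to Brick7 (storage1.frank.sam.pitt.edu:/brick/3), which is one half of vol_home-replicate-3.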


Part of the self-heal log is attached.  When I run 'find <gluster-mount> -noleaf -print0 | xargs --null stat >/dev/null' I see more of the same errors in the self-heal log:


[2012-07-05 09:49:12.326243] E [afr-self-heald.c:287:_remove_stale_index] 0-vol_home-replicate-3: 087ca940-a10b-46cc-94dc-c1de2ce08d91: Failed to remove index on vol_home-client-6 - No such file or directory
[2012-07-05 09:49:12.335160] I [afr-self-heald.c:282:_remove_stale_index] 0-vol_home-replicate-3: Removing stale index for a8c87651-a428-4119-8e22-c09d13d2aaaf on vol_home-client-6
[2012-07-05 09:49:12.345011] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-vol_home-client-6: remote operation failed: No such file or directory
[2012-07-05 09:49:12.345142] E [afr-self-heald.c:287:_remove_stale_index] 0-vol_home-replicate-3: a8c87651-a428-4119-8e22-c09d13d2aaaf: Failed to remove index on vol_home-client-6 - No such file or directory
[2012-07-05 09:49:12.362025] I [afr-self-heald.c:282:_remove_stale_index] 0-vol_home-replicate-3: Removing stale index for f2c01834-7e76-465d-a1b7-04e9328f6557 on vol_home-client-6
[2012-07-05 09:49:12.362395] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-vol_home-client-6: remote operation failed: No such file or directory
[2012-07-05 09:49:12.362534] E [afr-self-heald.c:287:_remove_stale_index] 0-vol_home-replicate-3: f2c01834-7e76-465d-a1b7-04e9328f6557: Failed to remove index on vol_home-client-6 - No such file or directory
[2012-07-05 09:49:12.373466] I [afr-self-heald.c:282:_remove_stale_index] 0-vol_home-replicate-3: Removing stale index for 090cf79b-562e-4a9d-9f4a-4393c09f9a4b on vol_home-client-6
[2012-07-05 09:49:12.387391] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-vol_home-client-6: remote operation failed: No such file or directory
[2012-07-05 09:49:12.387558] E [afr-self-heald.c:287:_remove_stale_index] 0-vol_home-replicate-3: 090cf79b-562e-4a9d-9f4a-4393c09f9a4b: Failed to remove index on vol_home-client-6 - No such file or directory
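
The excerpt repeats one pattern per GFID: an I line announcing the stale index removal, a W line for the failed unlink, and an E line reporting the failure. A rough Python sketch for tallying these from a saved copy of the log might look like the following. The regular expressions are inferred only from the lines shown above and are an assumption about the format, not a GlusterFS specification. The names `LOG_LINE`, `FAILED_INDEX` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Line layout inferred from the excerpt above:
# [timestamp] LEVEL [file:line:function] translator: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s+(?P<xlator>\S+):\s+(?P<msg>.*)$"
)

# Message layout of the E lines in the excerpt.
FAILED_INDEX = re.compile(
    r"^(?P<gfid>[0-9a-f-]{36}): Failed to remove index on "
    r"(?P<client>\S+) - (?P<err>.+)$"
)

# Three consecutive lines copied from the excerpt above.
SAMPLE = """\
[2012-07-05 09:49:12.335160] I [afr-self-heald.c:282:_remove_stale_index] 0-vol_home-replicate-3: Removing stale index for a8c87651-a428-4119-8e22-c09d13d2aaaf on vol_home-client-6
[2012-07-05 09:49:12.345011] W [client3_1-fops.c:592:client3_1_unlink_cbk] 0-vol_home-client-6: remote operation failed: No such file or directory
[2012-07-05 09:49:12.345142] E [afr-self-heald.c:287:_remove_stale_index] 0-vol_home-replicate-3: a8c87651-a428-4119-8e22-c09d13d2aaaf: Failed to remove index on vol_home-client-6 - No such file or directory
"""


def summarize(text):
    """Return (count of lines per log level, list of failed index removals)."""
    levels = Counter()
    failures = []
    for line in text.splitlines():
        parsed = LOG_LINE.match(line.strip())
        if not parsed:
            continue  # skip anything that does not look like a log line
        levels[parsed["level"]] += 1
        failed = FAILED_INDEX.match(parsed["msg"])
        if failed:
            failures.append((failed["gfid"], failed["client"], failed["err"]))
    return levels, failures


if __name__ == "__main__":
    levels, failures = summarize(SAMPLE)
    print(dict(levels))
    for gfid, client, err in failures:
        print(f"{gfid} on {client}: {err}")
```

Running `summarize` over the whole attached log instead of `SAMPLE` would show whether every failure involves vol_home-client-6 or whether other clients are affected as well.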

Comment 1 Junaid 2012-07-18 08:37:08 UTC
Changing the component to AFR.

Comment 2 Pranith Kumar K 2012-07-26 16:50:10 UTC

*** This bug has been marked as a duplicate of bug 835423 ***