Bug 762658 (GLUSTER-926)

Summary: glusterfs client cannot see tree deleted/recreated by other node
Product: [Community] GlusterFS
Reporter: Lei Zhang <voyager>
Component: unclassified
Assignee: Vijay Bellur <vbellur>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Version: 3.0.4
CC: amarts, gluster-bugs, tejas, vijay
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Attachments (flags: none for all):
- Run this script on the gluster server; I usually reproduce the problem in < 10 cycles.
- The volfile used on our "admin" node
- The volfile used on our "leaf" node

Description Lei Zhang 2010-05-13 17:03:30 UTC
Created attachment 202
ltrace and strace of xtraceroute crashing (traces.tgz)

Comment 1 Lei Zhang 2010-05-13 17:04:09 UTC
Created attachment 203
hwconf

Comment 2 Lei Zhang 2010-05-13 19:39:28 UTC
Setup: a 2-node cluster. One node runs the GlusterFS server (with a client running in the same process); the other runs only a client. Both nodes are VMware virtual machines running CentOS with fuse-2.7.4-8.

What I did: on the server node, a script repeatedly removes a tree of files and then recreates the same tree by untarring a tarball; on the client node, a script repeatedly tries to access a file in that tree. Watch the server's gluster log: as soon as a "seeking deep resolution" message appears, the client's view of the file system is out of sync with the server's.
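The reporter's script is only an attachment and is not quoted here. The two loops described above can be sketched in Python roughly as follows. This is a minimal sketch, assuming the tree is recreated from a tarball whose top-level entry is the tree directory itself; the helper names (`recreate_tree`, `probe`, `server_loop`, `client_loop`) and any paths passed to them are hypothetical.

```python
import os
import shutil
import tarfile
import time


def recreate_tree(parent: str, tree_name: str, tarball: str) -> None:
    """Hypothetical server-side step: delete the tree, then restore it by untarring."""
    shutil.rmtree(os.path.join(parent, tree_name), ignore_errors=True)
    with tarfile.open(tarball) as tar:
        # Assumes the tarball's top-level entry is `tree_name`.
        tar.extractall(parent)


def probe(path: str) -> bool:
    """Hypothetical client-side step: True if stat() on the file succeeds."""
    try:
        os.stat(path)
        return True
    except OSError:
        # Counts any failure (e.g. ENOENT or a stale handle) as "cannot see it".
        return False


def server_loop(parent: str, tree_name: str, tarball: str,
                cycles: int = 10, pause: float = 1.0) -> None:
    """Repeatedly remove and recreate the tree."""
    for _ in range(cycles):
        recreate_tree(parent, tree_name, tarball)
        time.sleep(pause)


def client_loop(path: str, cycles: int = 10, pause: float = 1.0) -> int:
    """Repeatedly access one file in the tree; return how many accesses failed."""
    failures = 0
    for _ in range(cycles):
        if not probe(path):
            failures += 1
        time.sleep(pause)
    return failures
```

For example, run `server_loop("/mnt/glusterfs/web-apps", "testphp", "/root/testphp.tar")` on the server node and `client_loop("/mnt/glusterfs/web-apps/testphp/info/control")` on the client node (both paths hypothetical). Because the two loops are not synchronized, some failures are expected while the tree is briefly absent; the bug shows up as failures that persist after the tree has been recreated.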

Problem: the server sees the recreated tree fine, but the client sometimes cannot. Restarting the client does not fix the problem; restarting the server does.

Relevant server log messages:
[2010-05-13 12:09:08] E [posix.c:560:posix_stat] posix: lstat on /web-apps/testphp/info/control failed: No such file or directory
[2010-05-13 12:09:08] D [server-protocol.c:2248:server_stat_cbk] server: 336: STAT /web-apps/testphp/info/control (451964) ==> -1 (No such file or directory)
[2010-05-13 12:09:11] D [server-resolve.c:238:resolve_path_deep] brick: RESOLVE LOOKUP() seeking deep resolution of /web-apps/testphp/info
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f7c
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f94
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261fac
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f7c
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f94
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261fac
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f7c
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f94
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261fac
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f7c
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f94
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261fac
[2010-05-13 12:09:35] E [posix.c:560:posix_stat] posix: lstat on /web-apps/testphp/info/control failed: No such file or directory
[2010-05-13 12:09:35] D [server-protocol.c:2248:server_stat_cbk] server: 425: STAT /web-apps/testphp/info/control (451963) ==> -1 (No such file or directory)

Relevant client log messages:
[2010-05-13 12:09:08] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 469: STAT() /web-apps/testphp/info/control => -1 (No such file or directory)
[2010-05-13 12:09:09] D [client-protocol.c:4929:client_lookup_cbk] remote: LOOKUP 450696/testphp (/web-apps/testphp): inode number changed from {5470833542399262862,451938} to {5470833542399262885,451938}
[2010-05-13 12:09:11] D [client-protocol.c:4929:client_lookup_cbk] remote: LOOKUP 450696/testphp (/web-apps/testphp): inode number changed from {5470833542399262885,451938} to {5470833542399262907,450732}
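A minimal Python sketch for scanning logs for the two symptoms above: the server's "seeking deep resolution" message and the client's "inode number changed" message. The regular expressions are written only against the 3.0.4 log lines quoted in this report and are assumptions; other versions may format these messages differently. The function names are hypothetical, and the two numbers in each `{a,b}` pair are kept as opaque integers because the report does not say what they denote.

```python
import re
from typing import Iterable, List, Tuple

Pair = Tuple[int, int]

# Client side: "LOOKUP <parent>/<name> (<path>): inode number changed from {a,b} to {c,d}"
CHANGE_RE = re.compile(
    r"LOOKUP \S+ \((?P<path>[^)]+)\): inode number changed from "
    r"\{(?P<a1>\d+),(?P<b1>\d+)\} to \{(?P<a2>\d+),(?P<b2>\d+)\}"
)

# Server side: "RESOLVE LOOKUP() seeking deep resolution of <path>"
DEEP_RE = re.compile(r"seeking deep resolution of (?P<path>\S+)")


def inode_changes(lines: Iterable[str]) -> List[Tuple[str, Pair, Pair]]:
    """Return (path, old_pair, new_pair) for every 'inode number changed' line."""
    out = []
    for line in lines:
        m = CHANGE_RE.search(line)
        if m:
            old = (int(m["a1"]), int(m["b1"]))
            new = (int(m["a2"]), int(m["b2"]))
            out.append((m["path"], old, new))
    return out


def deep_resolutions(lines: Iterable[str]) -> List[str]:
    """Return the path from every 'seeking deep resolution' line."""
    return [m["path"] for line in lines if (m := DEEP_RE.search(line))]
```

Run over the two client lines above, `inode_changes()` shows that in the first change only the first number differs (the second stays 451938), while in the second change both numbers differ.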

Comment 3 Amar Tumballi 2010-10-05 06:01:18 UTC
Most of the self-heal (replicate-related) bugs are now fixed in the 3.1.0 branch. As we are just a week away from the GA release, we would like you to test this bug with the 3.1.0 RC releases and let us know if it is fixed.

Comment 4 Amar Tumballi 2011-04-25 08:55:32 UTC
With the introduction of 'gfid', this issue should be fixed.

Marking it as fixed; please (re)open the bug if the issue persists. We did not see the issue in our internal QA.
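In releases that include gfid (3.1 and later), each file and directory on a brick carries its gfid, a 16-byte UUID, in the `trusted.gfid` extended attribute. A hedged Python sketch for checking whether a recreated file received a new gfid, assuming direct access to the brick's backend directory on Linux and sufficient privileges (reading `trusted.*` attributes normally requires root). The helper names and any brick path are hypothetical.

```python
import os
import uuid

# Extended attribute holding the gfid on each brick (GlusterFS 3.1+).
GFID_XATTR = "trusted.gfid"


def gfid_from_bytes(raw: bytes) -> uuid.UUID:
    """Convert the 16 raw bytes stored in trusted.gfid into a UUID."""
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return uuid.UUID(bytes=raw)


def read_gfid(brick_path: str) -> uuid.UUID:
    """Hypothetical helper: read the gfid of a path directly on the brick (Linux only)."""
    return gfid_from_bytes(os.getxattr(brick_path, GFID_XATTR))


def was_recreated(gfid_before: uuid.UUID, brick_path: str) -> bool:
    """True if the path now carries a different gfid than the one recorded earlier."""
    return read_gfid(brick_path) != gfid_before
```

For example, record `read_gfid()` for the brick copy of `web-apps/testphp/info/control` before one server-side remove/untar cycle and call `was_recreated()` afterwards; since untarring creates new files, the recreated file is expected to report a different gfid.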