Bug 762658 (GLUSTER-926) - glusterfs client cannot see tree deleted/recreated by other node
Summary: glusterfs client cannot see tree deleted/recreated by other node
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-926
Product: GlusterFS
Classification: Community
Component: unclassified
Version: 3.0.4
Hardware: i386
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Vijay Bellur
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-05-13 19:39 UTC by Lei Zhang
Modified: 2011-04-25 11:55 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
Run this script on the gluster server; I usually reproduce the problem in < 10 cycles. (263 bytes, application/octet-stream)
2010-05-13 16:39 UTC, Lei Zhang
the volfile used on our "admin" node (468 bytes, application/octet-stream)
2010-05-13 17:03 UTC, Lei Zhang
the volfile used on our "leaf" node (1.01 KB, application/octet-stream)
2010-05-13 17:04 UTC, Lei Zhang

Description Lei Zhang 2010-05-13 17:03:30 UTC
Created attachment 202 [details]
the volfile used on our "admin" node

Comment 1 Lei Zhang 2010-05-13 17:04:09 UTC
Created attachment 203 [details]
the volfile used on our "leaf" node

Comment 2 Lei Zhang 2010-05-13 19:39:28 UTC
Setup: a 2-node cluster: one node runs the glusterfs server (with a client in the same process), the other runs only a client. Both nodes are VMware virtual machines running CentOS with fuse-2.7.4-8.

What I did: on the server node, a script keeps removing a tree of files and then adding the same tree back by untarring a tarball; on the client side, a script keeps trying to access a file in the tree. Watch the server gluster log: as soon as a "seeking deep resolution" message appears, you know the client's view of the file system is out of sync with the server's.

Problem: the server can see the recreated tree fine, but the client sometimes cannot. Restarting the client does not fix the problem; restarting the server does.
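
A minimal sketch of the kind of churn loop described above (not the attached script; the export path and tarball name are placeholders I made up for illustration):

#!/usr/bin/env python3
# Sketch of the server-side churn: repeatedly delete the tree and
# recreate it from a tarball. All paths below are illustrative assumptions.
import shutil
import subprocess
import time

EXPORT_PARENT = "/data/export/web-apps"        # assumed brick-side parent dir
EXPORT_TREE = EXPORT_PARENT + "/testphp"       # tree that gets deleted/recreated
TARBALL = "/root/testphp.tar.gz"               # assumed tarball of the same tree

while True:
    shutil.rmtree(EXPORT_TREE, ignore_errors=True)                       # delete the tree
    subprocess.check_call(["tar", "xzf", TARBALL, "-C", EXPORT_PARENT])  # recreate it
    time.sleep(1)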

Relevant server log messages:
[2010-05-13 12:09:08] E [posix.c:560:posix_stat] posix: lstat on /web-apps/testphp/info/control failed: No such file or directory
[2010-05-13 12:09:08] D [server-protocol.c:2248:server_stat_cbk] server: 336: STAT /web-apps/testphp/info/control (451964) ==> -1 (No such file or directory)
[2010-05-13 12:09:11] D [server-resolve.c:238:resolve_path_deep] brick: RESOLVE LOOKUP() seeking deep resolution of /web-apps/testphp/info
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f7c
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f94
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261fac
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f7c
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f94
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261fac
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f7c
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f94
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261fac
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f7c
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f94
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261fac
[2010-05-13 12:09:35] E [posix.c:560:posix_stat] posix: lstat on /web-apps/testphp/info/control failed: No such file or directory
[2010-05-13 12:09:35] D [server-protocol.c:2248:server_stat_cbk] server: 425: STAT /web-apps/testphp/info/control (451963) ==> -1 (No such file or directory)

Relevant client log messages:
[2010-05-13 12:09:08] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 469: STAT() /web-apps/testphp/info/control => -1 (No such file or directory)
[2010-05-13 12:09:09] D [client-protocol.c:4929:client_lookup_cbk] remote: LOOKUP 450696/testphp (/web-apps/testphp): inode number changed from {5470833542399262862,451938} to {5470833542399262885,451938}
[2010-05-13 12:09:11] D [client-protocol.c:4929:client_lookup_cbk] remote: LOOKUP 450696/testphp (/web-apps/testphp): inode number changed from {5470833542399262885,451938} to {5470833542399262907,450732}
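
On the client side, the stale view can be spotted with a loop along these lines (again only a sketch; the mount-point path is a placeholder). It stats the file through the glusterfs mount and reports ENOENT or a changed inode number for the same path:

#!/usr/bin/env python3
# Sketch of a client-side watcher. The mount path below is an assumption.
import errno
import os
import time

PATH = "/mnt/glusterfs/web-apps/testphp/info/control"   # assumed client mount path
last_ino = None

while True:
    try:
        st = os.stat(PATH)
        if last_ino is not None and st.st_ino != last_ino:
            print("inode changed: %d -> %d" % (last_ino, st.st_ino))
        last_ino = st.st_ino
    except OSError as e:
        if e.errno == errno.ENOENT:
            print("stat returned ENOENT -- client view may be stale")
        else:
            raise
    time.sleep(1)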

Comment 3 Amar Tumballi 2010-10-05 06:01:18 UTC
Most of the self-heal (replicate-related) bugs are now fixed in the 3.1.0 branch. As we are just a week away from the GA release, we would like you to test this particular bug with the 3.1.0 RC releases and let us know if it is fixed.

Comment 4 Amar Tumballi 2011-04-25 08:55:32 UTC
With the introduction of 'gfid', this particular issue should be fixed.

Marking it as fixed; please (re)open the bug if the issue persists. We did not find the issue in our internal QA.
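
For context (my understanding, not stated in the comment above): with gfid, each inode gets a persistent UUID identity instead of being tracked by path plus backend inode number, so a path that is deleted and recreated resolves cleanly to the new inode. On 3.1+ the gfid is kept in the trusted.gfid extended attribute on the brick backend, so it can be checked there directly. A sketch, assuming root access and a made-up brick path:

#!/usr/bin/env python3
# Sketch: read a file's gfid from the brick backend (needs root, Linux,
# Python 3.3+). The brick path below is an illustrative assumption.
import os
import uuid

BRICK_FILE = "/data/export/web-apps/testphp/info/control"   # assumed brick path
raw = os.getxattr(BRICK_FILE, "trusted.gfid")               # 16-byte UUID value
print("gfid:", uuid.UUID(bytes=raw))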

