Created attachment 202 [details] ltrace and strace of xtraceroute crashing (traces.tgz)
Created attachment 203 [details] hwconf
Setup: 2-node cluster: one glusterfs server (with a client running in the same process), the other as a client. Both nodes are VMware virtual machines running CentOS with fuse-2.7.4-8.

What I did: on the server node, a script keeps removing a tree of files and then adding the same tree back by untarring a tarball; on the client side, a script keeps trying to access a file in the tree. Watch the server gluster log: as soon as there is a "seeking deep resolution" message, you know the client's view of the file system is out of sync with the server.

Problem: the server can see the recreated tree fine, but the client sometimes cannot. Restarting the client does not fix the problem; restarting the server does.

Relevant server log messages:
[2010-05-13 12:09:08] E [posix.c:560:posix_stat] posix: lstat on /web-apps/testphp/info/control failed: No such file or directory
[2010-05-13 12:09:08] D [server-protocol.c:2248:server_stat_cbk] server: 336: STAT /web-apps/testphp/info/control (451964) ==> -1 (No such file or directory)
[2010-05-13 12:09:11] D [server-resolve.c:238:resolve_path_deep] brick: RESOLVE LOOKUP() seeking deep resolution of /web-apps/testphp/info
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f7c
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f94
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261fac
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f7c
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f94
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261fac
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f7c
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f94
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261fac
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f7c
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261f94
[2010-05-13 12:09:11] D [dict.c:303:dict_get] dict: @this=(nil) @key=0x261fac
[2010-05-13 12:09:35] E [posix.c:560:posix_stat] posix: lstat on /web-apps/testphp/info/control failed: No such file or directory
[2010-05-13 12:09:35] D [server-protocol.c:2248:server_stat_cbk] server: 425: STAT /web-apps/testphp/info/control (451963) ==> -1 (No such file or directory)

Relevant client log messages:
[2010-05-13 12:09:08] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 469: STAT() /web-apps/testphp/info/control => -1 (No such file or directory)
[2010-05-13 12:09:09] D [client-protocol.c:4929:client_lookup_cbk] remote: LOOKUP 450696/testphp (/web-apps/testphp): inode number changed from {5470833542399262862,451938} to {5470833542399262885,451938}
[2010-05-13 12:09:11] D [client-protocol.c:4929:client_lookup_cbk] remote: LOOKUP 450696/testphp (/web-apps/testphp): inode number changed from {5470833542399262885,451938} to {5470833542399262907,450732}
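For anyone trying to reproduce this, the two loops described in the report can be sketched roughly as below. This is not the reporter's actual script, which was not attached; the function names (`churn_tree`, `probe_file`, `has_deep_resolution`), the parameters, and the pause handling are all assumptions made for illustration. The only details taken from the report are the remove-then-untar cycle on the server, the repeated file access on the client, and the "seeking deep resolution" marker in the server log.

```python
import os
import shutil
import tarfile
import time


def churn_tree(tarball, mount_dir, tree_name, rounds, pause=0.0):
    """Server side (hypothetical helper): repeatedly delete the tree
    under mount_dir and recreate it by extracting the tarball."""
    target = os.path.join(mount_dir, tree_name)
    for _ in range(rounds):
        # Remove the existing tree; ignore the error if it is already gone.
        shutil.rmtree(target, ignore_errors=True)
        # Recreate the same tree from the tarball.
        with tarfile.open(tarball) as tar:
            tar.extractall(mount_dir)
        time.sleep(pause)


def probe_file(path, attempts, pause=0.0):
    """Client side (hypothetical helper): stat the file repeatedly and
    return how many attempts failed with 'No such file or directory'."""
    missing = 0
    for _ in range(attempts):
        try:
            os.stat(path)
        except FileNotFoundError:
            missing += 1
        time.sleep(pause)
    return missing


def has_deep_resolution(log_lines):
    """Return True if any log line contains the marker message that the
    report associates with the client going out of sync."""
    return any("seeking deep resolution" in line for line in log_lines)
```

In a real run, `churn_tree` would be pointed at the glusterfs mount on the server node and `probe_file` at a path such as /web-apps/testphp/info/control on the client mount, each in its own process. Some misses on the client are expected while the tree is briefly absent; the symptom in this report is that the misses continue after the tree has been recreated.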
Most of the self-heal (replicate-related) bugs are now fixed in the 3.1.0 branch. Since we are just a week from the GA release, we would like you to test this particular bug against the 3.1.0 RC releases and let us know if it is fixed.
With the introduction of 'gfid', this particular issue should be fixed. Marking it as fixed; please (re)open the bug if the issue persists. We did not find the issue in our internal QA.