+++ This bug was initially created as a clone of Bug #821138 +++

Created attachment 583988 [details]
Fuse mount log

Description of problem:
A lookup on the mount point of a file or directory that has no gfid does not trigger self-heal.

Version-Release number of selected component (if applicable):
3.3.0 qa41

How reproducible:
Reproduced in 3 out of 3 attempts.

Steps to Reproduce:
1. Create a 1x2 rep volume and do a cifs mount.
2. In the backend of the first brick, create a file - file2
3. On the mount point, do 'ls -lh file1'

Note: The behavior is the same on the fuse mount. The fuse mount log file is attached.

Actual results:
[root@gqac003 dis-rep_cifs]# ls -lh file2
ls: cannot access file2: No such file or directory

Expected results:
The lookup should trigger self-heal, and the self-heal should complete.

Additional info:
[2012-05-12 16:57:49.372300] E [afr-common.c:1859:afr_lookup_done] 3-dis-rep-replicate-2: /file2: No gfid present
[2012-05-12 16:57:49.372391] W [fuse-resolve.c:89:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000000/file2: failed to resolve (No data available)
[2012-05-12 16:57:49.408001] W [socket.c:195:__socket_rwv] 3-dis-rep-client-4: readv failed (Connection reset by peer)
[2012-05-12 16:57:49.408034] W [socket.c:1512:__socket_proto_state_machine] 3-dis-rep-client-4: reading from socket failed. Error (Connection reset by peer), peer (10.16.157.0:24026)
[2012-05-12 16:57:49.408114] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x13c) [0x7feee8d59c4e] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7feee8d5916d] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7feee8d58bb3]))) 3-dis-rep-client-4: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-05-12 16:57:49.372676 (xid=0x144000x)
[2012-05-12 16:57:49.408136] W [client3_1-fops.c:2629:client3_1_lookup_cbk] 3-dis-rep-client-4: remote operation failed: Transport endpoint is not connected. Path: /file2 (00000000-0000-0000-0000-000000000000)
[2012-05-12 16:57:49.410933] I [socket.c:2315:socket_submit_request] 3-dis-rep-client-4: not connected (priv->connected = 0)
[2012-05-12 16:57:49.410958] W [rpc-clnt.c:1498:rpc_clnt_submit] 3-dis-rep-client-4: failed to submit rpc-request (XID: 0x144001x Program: GlusterFS 3.1, ProgVers: 330, Proc: 27) to rpc-transport (dis-rep-client-4)
[2012-05-12 16:57:49.410974] W [client3_1-fops.c:2629:client3_1_lookup_cbk] 3-dis-rep-client-4: remote operation failed: Transport endpoint is not connected. Path: /file2 (00000000-0000-0000-0000-000000000000)
[2012-05-12 16:57:49.411033] I [client.c:2090:client_rpc_notify] 3-dis-rep-client-4: disconnected
[2012-05-12 16:57:49.411135] E [socket.c:1715:socket_connect_finish] 3-dis-rep-client-4: connection to 10.16.157.0:24026 failed (Connection refused)
[2012-05-12 17:23:23.625315] E [afr-common.c:1859:afr_lookup_done] 3-dis-rep-replicate-1: /file3: No gfid present
[2012-05-12 17:23:23.625363] W [fuse-resolve.c:89:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000000/file3: failed to resolve (No data available)
[2012-05-12 17:23:23.688003] W [socket.c:1512:__socket_proto_state_machine] 3-dis-rep-client-2: reading from socket failed. Error (Transport endpoint is not connected), peer (10.16.157.0:24017)
[2012-05-12 17:23:23.688134] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x13c) [0x7feee8d59c4e] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7feee8d5916d] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7feee8d58bb3]))) 3-dis-rep-client-2: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-05-12 17:23:23.625770 (xid=0x138714x)
[2012-05-12 17:23:23.688158] W [client3_1-fops.c:2629:client3_1_lookup_cbk] 3-dis-rep-client-2: remote operation failed: Transport endpoint is not connected. Path: /file3 (00000000-0000-0000-0000-000000000000)
[2012-05-12 17:23:23.690774] I [socket.c:2315:socket_submit_request] 3-dis-rep-client-2: not connected (priv->connected = 0)
[2012-05-12 17:23:23.690797] W [rpc-clnt.c:1498:rpc_clnt_submit] 3-dis-rep-client-2: failed to submit rpc-request (XID: 0x138715x Program: GlusterFS 3.1, ProgVers: 330, Proc: 27) to rpc-transport (dis-rep-client-2)
[2012-05-12 17:23:23.690821] W [client3_1-fops.c:2629:client3_1_lookup_cbk] 3-dis-rep-client-2: remote operation failed: Transport endpoint is not connected. Path: /file3 (00000000-0000-0000-0000-000000000000)
[2012-05-12 17:23:23.690920] I [client.c:2090:client_rpc_notify] 3-dis-rep-client-2: disconnected
[2012-05-12 17:23:23.690983] E [socket.c:1715:socket_connect_finish] 3-dis-rep-client-2: connection to 10.16.157.0:24017 failed (Connection refused)

--- Additional comment from ujjwala on 2012-05-12 08:20:30 EDT ---

Sorry, I mentioned different file names in steps 2 and 3. The corrected steps are below.

Steps to Reproduce:
1. Create a 1x2 rep volume and do a cifs mount.
2. In the backend of the first brick, create a file - file2
3. On the mount point, do 'ls -lh file2'

--- Additional comment from ujjwala on 2012-05-12 08:46:16 EDT ---

When the file lookup is done on the mount point, the brick on which the file was created crashes, but no core is generated. The brick log is attached.

--- Additional comment from ujjwala on 2012-05-12 08:46:55 EDT ---

Created attachment 583994 [details]
Brick log

--- Additional comment from pkarampu on 2012-05-14 02:48:09 EDT ---

The issue happens even without afr in the picture. If the patch 27fb213be6101bca859502ac87dddc4cd0a6f272 is reverted, it works fine. Assigning the bug to du.
The original bug is still in ASSIGNED status. I am not sure whether the patch has been reverted as pkarampu suggested on 05/14, and I don't know how this bug ended up in ON_QA. Moving this to ASSIGNED status.
Works on RHS master.

[root@pranithk-laptop ~]# cd /mnt/r2
[root@pranithk-laptop r2]# touch /gfs/r2_0/file1
[root@pranithk-laptop r2]# ls -l file1
-rw-r--r-- 1 root root 0 Nov 9 12:07 file1
[root@pranithk-laptop r2]# getfattr -d -m . -e hex /gfs/r2_?/file1
getfattr: Removing leading '/' from absolute path names
# file: gfs/r2_0/file1
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.gfid=0x1bb6709f0af44fdd9690e013d2569e65

# file: gfs/r2_1/file1
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.gfid=0x1bb6709f0af44fdd9690e013d2569e65

I am moving it to ON_QA
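The check in the comment above (both bricks carry the same trusted.gfid after the lookup) can be scripted. The following is only a minimal sketch, assuming the text output of `getfattr -d -m . -e hex` is available as a string; the function names `parse_getfattr` and `gfids_consistent` are hypothetical helpers and not part of GlusterFS, and the sample values are copied from the comment above.

```python
def parse_getfattr(text):
    """Parse `getfattr -d -e hex` text output into {path: {attr: value}}."""
    result = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# file: "):
            # A new file section starts; later attribute lines belong to it.
            current = line[len("# file: "):]
            result[current] = {}
        elif "=" in line and current is not None:
            name, _, value = line.partition("=")
            result[current][name] = value
    return result


def gfids_consistent(attrs):
    """True if every file has a trusted.gfid and all of them are equal."""
    gfids = {a.get("trusted.gfid") for a in attrs.values()}
    return len(gfids) == 1 and None not in gfids


# Sample taken from the getfattr output in the comment above.
SAMPLE = """\
# file: gfs/r2_0/file1
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.gfid=0x1bb6709f0af44fdd9690e013d2569e65

# file: gfs/r2_1/file1
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.gfid=0x1bb6709f0af44fdd9690e013d2569e65
"""

if __name__ == "__main__":
    print(gfids_consistent(parse_getfattr(SAMPLE)))
```

A brick copy that has no trusted.gfid line at all, which is the state described in the original report, makes `gfids_consistent` return False.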
Verified the bug with:

[12/11/12 - 10:59:27 root@king ~]# glusterfs --version
glusterfs 3.3.0.5rhs built on Nov 15 2012 01:30:13

[12/11/12 - 10:56:58 root@king ~]# rpm -qa | grep gluster
glusterfs-3.3.0.5rhs-38.el6rhs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html