Bug 821138

Summary: Look up of files/Dir with no gfid, is unable to trigger self-heal
Product: [Community] GlusterFS
Reporter: Ujjwala <ujjwala>
Component: fuse
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: low
Version: pre-release
CC: amarts, gluster-bugs, pkarampu, rgowdapp, sdharane
Keywords: Triaged
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 17:46:43 UTC
Type: Bug
Bug Blocks: 848347
Attachments:
Description       Flags
Fuse mount log    none
Brick log         none

Description Ujjwala 2012-05-12 12:17:08 UTC
Created attachment 583988
Fuse mount log

Description of problem:

A lookup, on the mount point, of a file or directory that has no gfid does not trigger self-heal.

Version-Release number of selected component (if applicable):
3.3.0 qa41

How reproducible:
Tried 3 times; reproducible all 3 times.

Steps to Reproduce:
1. Create a 1x2 rep volume and do a cifs mount.
2. In the backend of the first brick, create a file - file2
3. On the mount point, do 'ls -lh file1'
Note: The behavior is the same on a fuse mount.
The fuse mount log file is attached.
  
Actual results:
[root@gqac003 dis-rep_cifs]# ls -lh file2
ls: cannot access file2: No such file or directory


Expected results:
Lookup should trigger self-heal and complete the self-heal

Additional info:

[2012-05-12 16:57:49.372300] E [afr-common.c:1859:afr_lookup_done] 3-dis-rep-replicate-2: /file2: No gfid present
[2012-05-12 16:57:49.372391] W [fuse-resolve.c:89:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000000/file2: failed to resolve (No data available)
[2012-05-12 16:57:49.408001] W [socket.c:195:__socket_rwv] 3-dis-rep-client-4: readv failed (Connection reset by peer)
[2012-05-12 16:57:49.408034] W [socket.c:1512:__socket_proto_state_machine] 3-dis-rep-client-4: reading from socket failed. Error (Connection reset by peer), peer (10.16.157.0:24026)
[2012-05-12 16:57:49.408114] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x13c) [0x7feee8d59c4e] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7feee8d5916d] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7feee8d58bb3]))) 3-dis-rep-client-4: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-05-12 16:57:49.372676 (xid=0x144000x)
[2012-05-12 16:57:49.408136] W [client3_1-fops.c:2629:client3_1_lookup_cbk] 3-dis-rep-client-4: remote operation failed: Transport endpoint is not connected. Path: /file2 (00000000-0000-0000-0000-000000000000)
[2012-05-12 16:57:49.410933] I [socket.c:2315:socket_submit_request] 3-dis-rep-client-4: not connected (priv->connected = 0)
[2012-05-12 16:57:49.410958] W [rpc-clnt.c:1498:rpc_clnt_submit] 3-dis-rep-client-4: failed to submit rpc-request (XID: 0x144001x Program: GlusterFS 3.1, ProgVers: 330, Proc: 27) to rpc-transport (dis-rep-client-4)
[2012-05-12 16:57:49.410974] W [client3_1-fops.c:2629:client3_1_lookup_cbk] 3-dis-rep-client-4: remote operation failed: Transport endpoint is not connected. Path: /file2 (00000000-0000-0000-0000-000000000000)
[2012-05-12 16:57:49.411033] I [client.c:2090:client_rpc_notify] 3-dis-rep-client-4: disconnected
[2012-05-12 16:57:49.411135] E [socket.c:1715:socket_connect_finish] 3-dis-rep-client-4: connection to 10.16.157.0:24026 failed (Connection refused)
[2012-05-12 17:23:23.625315] E [afr-common.c:1859:afr_lookup_done] 3-dis-rep-replicate-1: /file3: No gfid present
[2012-05-12 17:23:23.625363] W [fuse-resolve.c:89:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000000/file3: failed to resolve (No data available)
[2012-05-12 17:23:23.688003] W [socket.c:1512:__socket_proto_state_machine] 3-dis-rep-client-2: reading from socket failed. Error (Transport endpoint is not connected), peer (10.16.157.0:24017)
[2012-05-12 17:23:23.688134] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x13c) [0x7feee8d59c4e] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7feee8d5916d] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7feee8d58bb3]))) 3-dis-rep-client-2: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-05-12 17:23:23.625770 (xid=0x138714x)
[2012-05-12 17:23:23.688158] W [client3_1-fops.c:2629:client3_1_lookup_cbk] 3-dis-rep-client-2: remote operation failed: Transport endpoint is not connected. Path: /file3 (00000000-0000-0000-0000-000000000000)
[2012-05-12 17:23:23.690774] I [socket.c:2315:socket_submit_request] 3-dis-rep-client-2: not connected (priv->connected = 0)
[2012-05-12 17:23:23.690797] W [rpc-clnt.c:1498:rpc_clnt_submit] 3-dis-rep-client-2: failed to submit rpc-request (XID: 0x138715x Program: GlusterFS 3.1, ProgVers: 330, Proc: 27) to rpc-transport (dis-rep-client-2)
[2012-05-12 17:23:23.690821] W [client3_1-fops.c:2629:client3_1_lookup_cbk] 3-dis-rep-client-2: remote operation failed: Transport endpoint is not connected. Path: /file3 (00000000-0000-0000-0000-000000000000)
[2012-05-12 17:23:23.690920] I [client.c:2090:client_rpc_notify] 3-dis-rep-client-2: disconnected
[2012-05-12 17:23:23.690983] E [socket.c:1715:socket_connect_finish] 3-dis-rep-client-2: connection to 10.16.157.0:24017 failed (Connection refused)
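
The log excerpts above appear to follow the layout `[timestamp] LEVEL [file:line:function] source: message`. A minimal sketch for picking out the lookups that failed for lack of a gfid, assuming that layout holds for the lines of interest. The regex and the helper names `parse_line` and `missing_gfid_paths` are illustrative and are not part of GlusterFS; lines that do not fit the layout (for example the `saved_frames_unwind` backtrace lines) are simply skipped.

```python
import re

# Layout inferred from the excerpts in this report (an assumption, not a
# documented format): [timestamp] LEVEL [file:line:function] source: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<source>\S+):\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def missing_gfid_paths(lines):
    """Collect the paths reported with a 'No gfid present' message."""
    paths = []
    for line in lines:
        record = parse_line(line)
        if record and record["msg"].endswith("No gfid present"):
            # The message has the form "<path>: No gfid present".
            paths.append(record["msg"].rsplit(": ", 1)[0])
    return paths


sample = [
    "[2012-05-12 16:57:49.372300] E [afr-common.c:1859:afr_lookup_done] "
    "3-dis-rep-replicate-2: /file2: No gfid present",
    "[2012-05-12 17:23:23.625315] E [afr-common.c:1859:afr_lookup_done] "
    "3-dis-rep-replicate-1: /file3: No gfid present",
]
print(missing_gfid_paths(sample))  # prints ['/file2', '/file3']
```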

Comment 1 Ujjwala 2012-05-12 12:20:30 UTC
Sorry, I mentioned different file names in steps 2 and 3. The corrected steps are:

Steps to Reproduce:
1. Create a 1x2 rep volume and do a cifs mount.
2. In the backend of the first brick, create a file - file2
3. On the mount point, do 'ls -lh file2'
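
Step 2 creates the file directly on the brick, so it presumably carries no `trusted.gfid` extended attribute (the attribute name shown in the `getfattr` output elsewhere in this report). A minimal sketch for checking that on a brick path, assuming a Linux host, root privileges (needed to read `trusted.*` attributes), and a 16-byte gfid value. The helper `read_gfid` and its injectable `getxattr` parameter are hypothetical and not part of GlusterFS.

```python
import errno
import os
import uuid

GFID_XATTR = "trusted.gfid"  # attribute name as it appears in this report


def read_gfid(path, getxattr=None):
    """Return the gfid stored on a brick file as a UUID, or None if it has none.

    `getxattr` defaults to os.getxattr (Linux only); it can be replaced
    with a stand-in for testing without a real brick.
    """
    if getxattr is None:
        getxattr = os.getxattr
    try:
        raw = getxattr(path, GFID_XATTR)
    except OSError as exc:
        # ENODATA ("No data available") means the attribute is absent.
        if exc.errno == errno.ENODATA:
            return None
        raise
    return uuid.UUID(bytes=raw)
```

Run as root against the backend path of the file created in step 2, a `None` result would confirm that the file has no gfid before the lookup is attempted.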

Comment 2 Ujjwala 2012-05-12 12:46:16 UTC
When the file lookup is done on the mount point, the brick on which the file was created crashes, but no core is generated.
The brick log is attached.

Comment 3 Ujjwala 2012-05-12 12:46:55 UTC
Created attachment 583994
Brick log

Comment 4 Pranith Kumar K 2012-05-14 06:48:09 UTC
The issue happens even without afr in the picture.
If patch 27fb213be6101bca859502ac87dddc4cd0a6f272 is reverted, it works fine.
Assigning the bug to du.

Comment 5 Pranith Kumar K 2012-11-09 05:54:41 UTC
[root@pranithk-laptop ~]# cd /mnt/r2
[root@pranithk-laptop r2]# touch /gfs/r2_0/file1
[root@pranithk-laptop r2]# ls -l file1
-rw-r--r-- 1 root root 0 Nov  9 11:24 file1
[root@pranithk-laptop r2]# ls -l /gfs/r2_?
/gfs/r2_0:
total 4
-rw-r--r-- 2 root root 0 Nov  9 11:24 file1

/gfs/r2_1:
total 4
-rw-r--r-- 2 root root 0 Nov  9 11:24 file1

[root@pranithk-laptop r2]# getfattr -d -m . -e hex /gfs/r2_?/file1
getfattr: Removing leading '/' from absolute path names
# file: gfs/r2_0/file1
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.gfid=0x20c07a217aeb4f32a813310b7518b63d

# file: gfs/r2_1/file1
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.gfid=0x20c07a217aeb4f32a813310b7518b63d

Test case works fine.
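
The check performed by eye above (both bricks report the same `trusted.gfid`) can be sketched as a small parser over `getfattr -d -m . -e hex` output. This assumes the output layout shown in this comment; the function names `parse_getfattr` and `gfids_consistent` are illustrative only.

```python
def parse_getfattr(text):
    """Map each '# file:' path to its {attribute: hex value} dictionary."""
    files = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# file: "):
            current = files.setdefault(line[len("# file: "):], {})
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)
            current[name] = value
    return files


def gfids_consistent(files):
    """True if every listed file has a trusted.gfid and all values are equal."""
    gfids = [attrs.get("trusted.gfid") for attrs in files.values()]
    return bool(gfids) and None not in gfids and len(set(gfids)) == 1


output = """\
getfattr: Removing leading '/' from absolute path names
# file: gfs/r2_0/file1
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.gfid=0x20c07a217aeb4f32a813310b7518b63d

# file: gfs/r2_1/file1
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.gfid=0x20c07a217aeb4f32a813310b7518b63d
"""
print(gfids_consistent(parse_getfattr(output)))  # prints True
```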