Bug 848347 - Look up of files/Dir with no gfid, is unable to trigger self-heal
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: fuse
Version: 2.0
Hardware: x86_64 Linux
Priority: medium
Severity: high
Assigned To: Pranith Kumar K
QA Contact: spandura
Depends On: 821138
Blocks:
Reported: 2012-08-15 06:05 EDT by Vidya Sakar
Modified: 2013-09-23 18:36 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 821138
Environment:
Last Closed: 2013-09-23 18:36:20 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Vidya Sakar 2012-08-15 06:05:00 EDT
+++ This bug was initially created as a clone of Bug #821138 +++

Created attachment 583988 [details]
Fuse mount log

Description of problem:

When a file or directory that has no gfid is looked up on the mount point, the lookup does not trigger self-heal.

Version-Release number of selected component (if applicable):
3.3.0 qa41

How reproducible:
Tried 3 times; reproduced all 3 times.

Steps to Reproduce:
1. Create a 1x2 rep volume and do a cifs mount.
2. In the backend of the first brick, create a file - file2
3. On the mount point, do 'ls -lh file1'
Note: The behavior is the same on a FUSE mount.
The FUSE mount log file is attached.
  
Actual results:
[root@gqac003 dis-rep_cifs]# ls -lh file2
ls: cannot access file2: No such file or directory


Expected results:
The lookup should trigger self-heal, and the self-heal should complete.

Additional info:

[2012-05-12 16:57:49.372300] E [afr-common.c:1859:afr_lookup_done] 3-dis-rep-replicate-2: /file2: No gfid present
[2012-05-12 16:57:49.372391] W [fuse-resolve.c:89:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000000/file2: failed to resolve (No data available)
[2012-05-12 16:57:49.408001] W [socket.c:195:__socket_rwv] 3-dis-rep-client-4: readv failed (Connection reset by peer)
[2012-05-12 16:57:49.408034] W [socket.c:1512:__socket_proto_state_machine] 3-dis-rep-client-4: reading from socket failed. Error (Connection reset by peer), peer (10.16.157.0:24026)
[2012-05-12 16:57:49.408114] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x13c) [0x7feee8d59c4e] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7feee8d5916d] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7feee8d58bb3]))) 3-dis-rep-client-4: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-05-12 16:57:49.372676 (xid=0x144000x)
[2012-05-12 16:57:49.408136] W [client3_1-fops.c:2629:client3_1_lookup_cbk] 3-dis-rep-client-4: remote operation failed: Transport endpoint is not connected. Path: /file2 (00000000-0000-0000-0000-000000000000)
[2012-05-12 16:57:49.410933] I [socket.c:2315:socket_submit_request] 3-dis-rep-client-4: not connected (priv->connected = 0)
[2012-05-12 16:57:49.410958] W [rpc-clnt.c:1498:rpc_clnt_submit] 3-dis-rep-client-4: failed to submit rpc-request (XID: 0x144001x Program: GlusterFS 3.1, ProgVers: 330, Proc: 27) to rpc-transport (dis-rep-client-4)
[2012-05-12 16:57:49.410974] W [client3_1-fops.c:2629:client3_1_lookup_cbk] 3-dis-rep-client-4: remote operation failed: Transport endpoint is not connected. Path: /file2 (00000000-0000-0000-0000-000000000000)
[2012-05-12 16:57:49.411033] I [client.c:2090:client_rpc_notify] 3-dis-rep-client-4: disconnected
[2012-05-12 16:57:49.411135] E [socket.c:1715:socket_connect_finish] 3-dis-rep-client-4: connection to 10.16.157.0:24026 failed (Connection refused)
[2012-05-12 17:23:23.625315] E [afr-common.c:1859:afr_lookup_done] 3-dis-rep-replicate-1: /file3: No gfid present
[2012-05-12 17:23:23.625363] W [fuse-resolve.c:89:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000000/file3: failed to resolve (No data available)
[2012-05-12 17:23:23.688003] W [socket.c:1512:__socket_proto_state_machine] 3-dis-rep-client-2: reading from socket failed. Error (Transport endpoint is not connected), peer (10.16.157.0:24017)
[2012-05-12 17:23:23.688134] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x13c) [0x7feee8d59c4e] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7feee8d5916d] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7feee8d58bb3]))) 3-dis-rep-client-2: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-05-12 17:23:23.625770 (xid=0x138714x)
[2012-05-12 17:23:23.688158] W [client3_1-fops.c:2629:client3_1_lookup_cbk] 3-dis-rep-client-2: remote operation failed: Transport endpoint is not connected. Path: /file3 (00000000-0000-0000-0000-000000000000)
[2012-05-12 17:23:23.690774] I [socket.c:2315:socket_submit_request] 3-dis-rep-client-2: not connected (priv->connected = 0)
[2012-05-12 17:23:23.690797] W [rpc-clnt.c:1498:rpc_clnt_submit] 3-dis-rep-client-2: failed to submit rpc-request (XID: 0x138715x Program: GlusterFS 3.1, ProgVers: 330, Proc: 27) to rpc-transport (dis-rep-client-2)
[2012-05-12 17:23:23.690821] W [client3_1-fops.c:2629:client3_1_lookup_cbk] 3-dis-rep-client-2: remote operation failed: Transport endpoint is not connected. Path: /file3 (00000000-0000-0000-0000-000000000000)
[2012-05-12 17:23:23.690920] I [client.c:2090:client_rpc_notify] 3-dis-rep-client-2: disconnected
[2012-05-12 17:23:23.690983] E [socket.c:1715:socket_connect_finish] 3-dis-rep-client-2: connection to 10.16.157.0:24017 failed (Connection refused)
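
For triage, the "No gfid present" entries can be pulled out of a mount log like the one above with a short script. The following is a minimal Python sketch, not part of the original report: the regular expression only assumes the single-line layout visible in this excerpt (`[timestamp] LEVEL [source:line:function] translator: message`), and the names `LOG_LINE`, `MISSING_GFID`, `SAMPLE_LOG` and `paths_missing_gfid` are hypothetical.

```python
import re

# Pattern for the common single-line form seen in the log excerpt:
#   [timestamp] LEVEL [source.c:line:function] translator: message
# Lines that embed a backtrace before the translator name do not match
# this pattern; the sketch simply skips them.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<src>[^:\]]+):(?P<lineno>\d+):(?P<func>[^\]]+)\] "
    r"(?P<xlator>\S+): (?P<msg>.*)$"
)

MISSING_GFID = ": No gfid present"


def paths_missing_gfid(log_text):
    """Return the paths that afr_lookup_done reported without a gfid, in log order."""
    paths = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue
        if match["func"] == "afr_lookup_done" and match["msg"].endswith(MISSING_GFID):
            paths.append(match["msg"][: -len(MISSING_GFID)])
    return paths


# Three lines copied from the log excerpt in this report.
SAMPLE_LOG = """\
[2012-05-12 16:57:49.372300] E [afr-common.c:1859:afr_lookup_done] 3-dis-rep-replicate-2: /file2: No gfid present
[2012-05-12 16:57:49.372391] W [fuse-resolve.c:89:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000000/file2: failed to resolve (No data available)
[2012-05-12 17:23:23.625315] E [afr-common.c:1859:afr_lookup_done] 3-dis-rep-replicate-1: /file3: No gfid present
"""

print(paths_missing_gfid(SAMPLE_LOG))  # ['/file2', '/file3']
```

Matching on the function name as well as the message text keeps the filter from picking up unrelated lines that happen to mention a gfid.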

--- Additional comment from ujjwala@redhat.com on 2012-05-12 08:20:30 EDT ---

Sorry, I mentioned different file names in steps 2 and 3. The corrected steps are below:

Steps to Reproduce:
1. Create a 1x2 rep volume and do a cifs mount.
2. In the backend of the first brick, create a file - file2
3. On the mount point, do 'ls -lh file2'

--- Additional comment from ujjwala@redhat.com on 2012-05-12 08:46:16 EDT ---

When the file lookup is done on the mount point, the brick on which the file was created crashes, but no core file is generated.
The brick log is attached.

--- Additional comment from ujjwala@redhat.com on 2012-05-12 08:46:55 EDT ---

Created attachment 583994 [details]
Brick log

--- Additional comment from pkarampu@redhat.com on 2012-05-14 02:48:09 EDT ---

The issue happens even without afr in the picture.
If patch 27fb213be6101bca859502ac87dddc4cd0a6f272 is reverted, it works fine.
Assigning the bug to du.
Comment 2 Sudhir D 2012-11-08 23:42:16 EST
The original bug is still in ASSIGNED status. I am not sure whether the patch was reverted as pkarampu suggested on 05/14, and I don't know how this ended up in ON_QA. Moving this to ASSIGNED status.
Comment 3 Pranith Kumar K 2012-11-09 01:37:03 EST
Works on RHS master.
[root@pranithk-laptop ~]# cd /mnt/r2
[root@pranithk-laptop r2]# touch /gfs/r2_0/file1
[root@pranithk-laptop r2]# ls -l file1
-rw-r--r-- 1 root root 0 Nov  9 12:07 file1
[root@pranithk-laptop r2]# getfattr -d -m . -e hex /gfs/r2_?/file1
getfattr: Removing leading '/' from absolute path names
# file: gfs/r2_0/file1
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.gfid=0x1bb6709f0af44fdd9690e013d2569e65

# file: gfs/r2_1/file1
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.gfid=0x1bb6709f0af44fdd9690e013d2569e65

I am moving it to ON_QA
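
The check made by eye in the getfattr output above (both bricks carry the same `trusted.gfid`) can also be scripted. This is a minimal Python sketch under stated assumptions: it parses the text that `getfattr -d -m . -e hex` prints, as shown in this comment, and the names `parse_getfattr`, `gfids_consistent` and `SAMPLE` are hypothetical helpers, not part of GlusterFS.

```python
def parse_getfattr(text):
    """Map each '# file: <path>' header to a {xattr_name: hex_value} dict."""
    result = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# file: "):
            current = line[len("# file: "):]
            result[current] = {}
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)
            result[current][name] = value
    return result


def gfids_consistent(xattrs_by_file):
    """True only if every listed copy has a trusted.gfid and all values agree."""
    gfids = {attrs.get("trusted.gfid") for attrs in xattrs_by_file.values()}
    return len(gfids) == 1 and None not in gfids


# Output copied from the getfattr run in this comment.
SAMPLE = """\
# file: gfs/r2_0/file1
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.gfid=0x1bb6709f0af44fdd9690e013d2569e65

# file: gfs/r2_1/file1
trusted.afr.r2-client-0=0x000000000000000000000000
trusted.afr.r2-client-1=0x000000000000000000000000
trusted.gfid=0x1bb6709f0af44fdd9690e013d2569e65
"""

print(gfids_consistent(parse_getfattr(SAMPLE)))  # True
```

A brick copy with no `trusted.gfid` line, which is the state this bug starts from, makes `gfids_consistent` return False.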
Comment 4 spandura 2012-12-11 00:31:27 EST
Verified the bug with:

[12/11/12 - 10:59:27 root@king ~]# glusterfs --version
glusterfs 3.3.0.5rhs built on Nov 15 2012 01:30:13


[12/11/12 - 10:56:58 root@king ~]# rpm -qa | grep gluster
glusterfs-3.3.0.5rhs-38.el6rhs.x86_64
Comment 6 Scott Haines 2013-09-23 18:36:20 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
