Bug 901723 - gnfs: E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: Unable to resolve FH: seen during ltp on 6.3 client.
Summary: gnfs: E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: Unable to resolve FH: s...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: 2.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: santosh pradhan
QA Contact: Ben Turner
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-01-18 21:44 UTC by Ben Turner
Modified: 2013-09-23 22:43 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-23 22:39:24 UTC
Embargoed:


Attachments
Sosreport from storage1 (515.31 KB, application/x-xz), 2013-01-18 21:59 UTC, Ben Turner
Sosreport from storage2 (489.06 KB, application/x-xz), 2013-01-18 22:00 UTC, Ben Turner
Sosreport from storage3(client) (444.21 KB, application/x-xz), 2013-01-18 22:00 UTC, Ben Turner

Description Ben Turner 2013-01-18 21:44:56 UTC
Description of problem:

During ltp tests I am seeing the following errors in the nfs.log from the node that I am mounting:

[2013-01-18 15:28:50.729072] E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.8.0.15:752) REPLICATED : 5c495aeb-ffde-4b24-bbe3-e48e3c81e144
[2013-01-18 15:28:50.742044] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: f1ee3076, ACCESS: NFS: 2(No such file or directory), POSIX: 14(Bad address)
[2013-01-18 15:28:50.744025] E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.8.0.15:752) REPLICATED : 5c495aeb-ffde-4b24-bbe3-e48e3c81e144
[2013-01-18 15:28:50.744057] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: f2ee3076, ACCESS: NFS: 2(No such file or directory), POSIX: 14(Bad address)
[2013-01-18 15:28:50.745529] E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.8.0.15:752) REPLICATED : 5c495aeb-ffde-4b24-bbe3-e48e3c81e144
[2013-01-18 15:28:50.745557] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: f3ee3076, ACCESS: NFS: 2(No such file or directory), POSIX: 14(Bad address)
[2013-01-18 15:28:50.746100] E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.8.0.15:752) REPLICATED : 5c495aeb-ffde-4b24-bbe3-e48e3c81e144
[2013-01-18 15:28:50.746128] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: f4ee3076, ACCESS: NFS: 2(No such file or directory), POSIX: 14(Bad address)

This is a 6.3 client with the latest EUS kernel mounting a replicated volume over NFS:

Volume Name: REPLICATED
Type: Replicate
Volume ID: 1443a320-90fa-423b-a3e3-54715380ea64
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: storage-qe01.lab.eng.rdu2.redhat.com:/brick1
Brick2: storage-qe02.lab.eng.rdu2.redhat.com:/brick1
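
For reference, a minimal sketch of how a setup like this is typically created and mounted. The brick hostnames and paths are taken from the volume info above; the client-side mount point /gluster-mount and the exact mount options are assumptions based on the commands shown later in this BZ.

# On a storage node: create and start the 1x2 replicated volume
gluster volume create REPLICATED replica 2 \
    storage-qe01.lab.eng.rdu2.redhat.com:/brick1 \
    storage-qe02.lab.eng.rdu2.redhat.com:/brick1
gluster volume start REPLICATED

# On the RHEL 6.3 client: mount the volume over NFSv3 (gNFS)
mkdir -p /gluster-mount
mount -t nfs -o mountproto=tcp,vers=3 \
    storage-qe01.lab.eng.rdu2.redhat.com:/REPLICATED /gluster-mount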


Version-Release number of selected component (if applicable):

glusterfs-3.3.0.5rhs-40.el6rhs.x86_64

How reproducible:

Every run.

Steps to Reproduce:
1.  Install ltp and mount replicated volume over NFS
2.  Run ltp
3.  Check nfs.log on the node that is being mounted (a minimal reproduction sketch follows)
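
A minimal sketch of the reproduction, assuming ltp provides fsstress under $LTP_DIR, the volume is mounted at /gluster-mount, and the gNFS log lives at the default /var/log/glusterfs/nfs.log (all three are assumptions; adjust to your layout):

# 1. Mount the replicated volume over NFSv3 on the client
mount -t nfs -o mountproto=tcp,vers=3 \
    storage-qe01.lab.eng.rdu2.redhat.com:/REPLICATED /gluster-mount

# 2. Run the ltp filesystem load (fsstress shown here, since it is the
#    suspected trigger later in this BZ)
time $LTP_DIR/fsstress/fsstress -d /gluster-mount -l 22 -n 22 -p 22

# 3. On the gluster node that was mounted, look for the FH errors
grep "Unable to resolve FH" /var/log/glusterfs/nfs.log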
  
Actual results:

Error messages in logs when ltp is run.

Expected results:

No errors when test is run.

Additional info:

Will update the BZ with a deeper dive shortly; sosreports attached.

Comment 1 Ben Turner 2013-01-18 21:59:36 UTC
Created attachment 682764 [details]
Sosreport from storage1

Comment 2 Ben Turner 2013-01-18 22:00:11 UTC
Created attachment 682765 [details]
Sosreport from storage2

Comment 3 Ben Turner 2013-01-18 22:00:55 UTC
Created attachment 682766 [details]
Sosreport from storage3(client)

Comment 5 Ben Turner 2013-01-29 22:00:59 UTC
I tested this on every client version from 5.6 to 6.4 and saw this behavior on all of them.  Today I reran ltp to see if I could get to the bottom of which test was causing the errors.  I haven't found which test was giving the unable to resolve FH error, but I am seeing some really strange behavior with:

time $LTP_DIR/fsstress/fsstress -d /gluster-mount -l 22 -n 22 -p 22

When I run it, warnings spam the logs:

[2013-01-29 16:40:50.485398] W [client3_1-fops.c:187:client3_1_symlink_cbk] 0-DISTRIBUTED-client-1: remote operation failed: File name too long. Path: /p8/d3/l9 (00000000-0000-0000-0000-000000000000)
[2013-01-29 16:40:50.485436] W [nfs3.c:2939:nfs3svc_symlink_cbk] 0-nfs: 9646c2ba: /p8/d3/l9 => -1 (File name too long)
[2013-01-29 16:40:50.486733] W [client3_1-fops.c:187:client3_1_symlink_cbk] 0-DISTRIBUTED-client-1: remote operation failed: File name too long. Path: /p8/d3/l9 (00000000-0000-0000-0000-000000000000)
[2013-01-29 16:40:50.486765] W [nfs3.c:2939:nfs3svc_symlink_cbk] 0-nfs: 9746c2ba: /p8/d3/l9 => -1 (File name too long)

I picked one example and looked at it:

[2013-01-29 16:40:50.197716] W [nfs3.c:3391:nfs3svc_remove_cbk] 0-nfs: 1f42c2ba: /run1089/p7/d3/f5 => -1 (No such file or directory)

On /gluster-mount I cd to the dir:

[root@storage-qe04 d3]# pwd
/gluster-mount/run1089/p7/d3

And I try to remove the file:

[root@storage-qe04 d3]# rm f5 
rm: remove regular file `f5'? y
rm: cannot remove `f5': No such file or directory

Now when I run ll I still see the file:

[root@storage-qe04 d3]# ll
total 0
-rw-rw-rw-. 1 root root 579411 Jan 29 16:24 f5

I tried unmounting and remounting the FS and still saw the same thing:

[root@storage-qe04 gluster-mount]# cd /gluster-mount/run1089/p7/d3
[root@storage-qe04 d3]# ls
f5
[root@storage-qe04 d3]# rm f5 
rm: remove regular file `f5'? y
rm: cannot remove `f5': No such file or directory

So I went on the backend bricks and looked:

[root@storage-qe01 d3]# pwd
/brick1/run1089/p7/d3
[root@storage-qe01 d3]# ll
total 0

[root@storage-qe02 d3]# pwd
/brick1/run1089/p7/d3
[root@storage-qe02 d3]# ll
total 0

The file was not on either brick but was still showing on the client.  I went ahead and mounted from a different client:

[root@storage-qe12 ~]# mkdir /test-mount
[root@storage-qe12 ~]# mount -t nfs -o mountproto=tcp,vers=3 storage-qe01.lab.eng.rdu2.redhat.com:/DISTRIBUTED /test-mount
[root@storage-qe12 ~]# cd /test-mount/run1089/p7/d3
[root@storage-qe12 d3]# ll
total 0
-rw-rw-rw-. 1 root root 579411 Jan 29 16:24 f5

The file exists even on a client that is mounting for the first time. 

I am pretty sure that the ltp testcase causing the FH error is the same one I am running, but after executing the whole test suite I don't see the FH error again.  I will try running just fsstress tomorrow and see if I hit the FH error.

Comment 6 santosh pradhan 2013-08-08 07:11:42 UTC
Hi Ben,

1. "Unable to resolve FH" error is addressed as part of the BZ 960835. The FIX is available in the latest RHS-2.1 build (bigbend).

2. "File name too long" message in the log is expected because the underlying file system "XFS" or "ext2/3/4" does not support file name length more than 256 chars. The tool is trying to create the symlink of 1024 chars which is rejected by symlink() syscall. Which is OK.

I could not reproduce the issue with the 3.4.0.13rhs-1 build.

Could you confirm?

Thanks,
Santosh

Comment 7 Ben Turner 2013-08-12 17:01:50 UTC
Verified that the FH issue is resolved on glusterfs-3.4.0.18rhs-1.el6rhs.x86_64.

Comment 8 Vivek Agarwal 2013-08-12 17:07:31 UTC
Thanks Ben

Comment 9 Scott Haines 2013-09-23 22:39:24 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html


