Description of problem:
During ltp tests I am seeing the following errors in the nfs.log from the node that I am mounting:

[2013-01-18 15:28:50.729072] E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.8.0.15:752) REPLICATED : 5c495aeb-ffde-4b24-bbe3-e48e3c81e144
[2013-01-18 15:28:50.742044] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: f1ee3076, ACCESS: NFS: 2(No such file or directory), POSIX: 14(Bad address)
[2013-01-18 15:28:50.744025] E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.8.0.15:752) REPLICATED : 5c495aeb-ffde-4b24-bbe3-e48e3c81e144
[2013-01-18 15:28:50.744057] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: f2ee3076, ACCESS: NFS: 2(No such file or directory), POSIX: 14(Bad address)
[2013-01-18 15:28:50.745529] E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.8.0.15:752) REPLICATED : 5c495aeb-ffde-4b24-bbe3-e48e3c81e144
[2013-01-18 15:28:50.745557] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: f3ee3076, ACCESS: NFS: 2(No such file or directory), POSIX: 14(Bad address)
[2013-01-18 15:28:50.746100] E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.8.0.15:752) REPLICATED : 5c495aeb-ffde-4b24-bbe3-e48e3c81e144
[2013-01-18 15:28:50.746128] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: f4ee3076, ACCESS: NFS: 2(No such file or directory), POSIX: 14(Bad address)

This is a 6.3 client with the latest EUS kernel mounting a replicated volume over NFS:

Volume Name: REPLICATED
Type: Replicate
Volume ID: 1443a320-90fa-423b-a3e3-54715380ea64
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: storage-qe01.lab.eng.rdu2.redhat.com:/brick1
Brick2: storage-qe02.lab.eng.rdu2.redhat.com:/brick1

Version-Release number of selected component (if applicable):
glusterfs-3.3.0.5rhs-40.el6rhs.x86_64

How reproducible:
Every run.

Steps to Reproduce:
1. Install ltp and mount the replicated volume over NFS
2. Run ltp
3. Check nfs.log on the node that is getting mounted

Actual results:
Error messages in the logs when ltp is run.

Expected results:
No errors when the test is run.

Additional info:
Will update the BZ with a deeper dive shortly; sosreports attached.
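Since the same pair of messages repeats with only the XID changing, a small tally by log level and function can make the pattern easier to see in a long nfs.log. The following is only a sketch: the regular expression is inferred from the lines quoted in this report, not from any GlusterFS specification, and the names `LOG_LINE` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Assumed line layout, inferred only from the log lines quoted in this report:
# [timestamp] LEVEL [file.c:line:function] domain: message
LOG_LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>\S+): "
    r"(?P<msg>.*)"
)


def summarize(lines):
    """Count log lines per (level, function); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[(match.group("level"), match.group("func"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python summarize_log.py /path/to/nfs.log
    with open(sys.argv[1], errors="replace") as handle:
        for (level, func), n in summarize(handle).most_common():
            print(f"{n:6d}  {level}  {func}")
```

Run against the log on the server being mounted, this would print one row per level and function, which should show whether the "Unable to resolve FH" errors always come paired with the ACCESS warnings.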
Created attachment 682764 Sosreport from storage1
Created attachment 682765 Sosreport from storage2
Created attachment 682766 Sosreport from storage3 (client)
I tested this on every client version from 5.6 to 6.4 and saw this behavior on all of them. Today I reran ltp to see if I could get to the bottom of which test was causing the errors. I haven't found which test was giving the "Unable to resolve FH" error, but I am seeing some really strange behavior with:

time $LTP_DIR/fsstress/fsstress -d /gluster-mount -l 22 -n 22 -p 22

When I run it I see warnings spam the logs:

[2013-01-29 16:40:50.485398] W [client3_1-fops.c:187:client3_1_symlink_cbk] 0-DISTRIBUTED-client-1: remote operation failed: File name too long. Path: /p8/d3/l9 (00000000-0000-0000-0000-000000000000)
[2013-01-29 16:40:50.485436] W [nfs3.c:2939:nfs3svc_symlink_cbk] 0-nfs: 9646c2ba: /p8/d3/l9 => -1 (File name too long)
[2013-01-29 16:40:50.486733] W [client3_1-fops.c:187:client3_1_symlink_cbk] 0-DISTRIBUTED-client-1: remote operation failed: File name too long. Path: /p8/d3/l9 (00000000-0000-0000-0000-000000000000)
[2013-01-29 16:40:50.486765] W [nfs3.c:2939:nfs3svc_symlink_cbk] 0-nfs: 9746c2ba: /p8/d3/l9 => -1 (File name too long)

I picked one example and looked at it:

[2013-01-29 16:40:50.197716] W [nfs3.c:3391:nfs3svc_remove_cbk] 0-nfs: 1f42c2ba: /run1089/p7/d3/f5 => -1 (No such file or directory)

On the gluster mount I cd to the dir:

[root@storage-qe04 d3]# pwd
/gluster-mount/run1089/p7/d3

And I try to remove the file:

[root@storage-qe04 d3]# rm f5
rm: remove regular file `f5'? y
rm: cannot remove `f5': No such file or directory

Now I check ll and I still see the file:

[root@storage-qe04 d3]# ll
total 0
-rw-rw-rw-. 1 root root 579411 Jan 29 16:24 f5

I tried unmounting and remounting the FS and still saw the same thing:

[root@storage-qe04 gluster-mount]# cd /gluster-mount/run1089/p7/d3
[root@storage-qe04 d3]# ls
f5
[root@storage-qe04 d3]# rm f5
rm: remove regular file `f5'? y
rm: cannot remove `f5': No such file or directory

So I went on the backend bricks and looked:

[root@storage-qe01 d3]# pwd
/brick1/run1089/p7/d3
[root@storage-qe01 d3]# ll
total 0

[root@storage-qe02 d3]# pwd
/brick1/run1089/p7/d3
[root@storage-qe02 d3]# ll
total 0

The file was not on either brick but was still showing on the client. I went ahead and mounted from a different client:

[root@storage-qe12 ~]# mount -t nfs -o mountproto=tcp,vers=3 storage-qe01.lab.eng.rdu2.redhat.com:/DISTRIBUTED $(mkdir /test-mount; echo /test-mount)
[root@storage-qe12 ~]# cd /test-mount/run1089/p7/d3
[root@storage-qe12 d3]# ll
total 0
-rw-rw-rw-. 1 root root 579411 Jan 29 16:24 f5

The file exists even on a client that is mounting for the first time. I am pretty sure that the ltp test case causing the FH error is the same one I am running, but after executing the whole test suite I don't see the FH error again. I will try running just fsstress tomorrow and see if I hit the FH error.
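The manual check above (list the directory on the client, then list the same directory on each brick) could be scripted when more than one entry is suspect. This is a rough sketch only: `phantom_entries` is a hypothetical helper, and it assumes every directory passed in is readable from the machine running it, which was not the case in the session above, where the bricks were inspected on their own hosts.

```python
import os


def phantom_entries(client_dir, brick_dirs):
    """Return names listed in client_dir that none of the brick directories contain."""
    on_bricks = set()
    for brick in brick_dirs:
        # Collect every name that exists on at least one brick.
        on_bricks.update(os.listdir(brick))
    # Anything the client lists that no brick holds is a candidate phantom entry.
    return sorted(set(os.listdir(client_dir)) - on_bricks)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python phantoms.py CLIENT_DIR BRICK_DIR [BRICK_DIR ...]
    for name in phantom_entries(sys.argv[1], sys.argv[2:]):
        print(name)
```

For the case described here, the expectation would be that `f5` is reported, since the client lists it while both brick directories are empty.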
Hi Ben,

1. The "Unable to resolve FH" error is addressed as part of BZ 960835. The fix is available in the latest RHS-2.1 build (bigbend).

2. The "File name too long" message in the log is expected because the underlying file system (XFS or ext2/3/4) does not support file names longer than 256 chars. The tool is trying to create a symlink of 1024 chars, which is rejected by the symlink() syscall. That is OK.

I could not reproduce the issue in the 3.4.0.13rhs-1 build. Could you confirm?

Thanks,
Santosh
Verified that the FH issue is resolved on glusterfs-3.4.0.18rhs-1.el6rhs.x86_64.
Thanks Ben
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html