Bug 901723

Summary:

gnfs: E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: Unable to resolve FH: seen during ltp on 6.3 client.

Product:

[Red Hat Storage] Red Hat Gluster Storage

Reporter:

Ben Turner <bturner>

Component:

glusterd

Assignee:

santosh pradhan <spradhan>

Status:

CLOSED ERRATA

QA Contact:

Ben Turner <bturner>

Severity:

medium

Docs Contact:

Priority:

medium

Version:

2.0

CC:

bturner, grajaiya, kkeithle, rhs-bugs, saujain, shaines, vagarwal, vbellur

Target Milestone:

---

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2013-09-23 22:39:24 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
Sosreport from storage1	none
Sosreport from storage2	none
Sosreport from storage3(client)	none

Description Ben Turner 2013-01-18 21:44:56 UTC

Description of problem:

During LTP runs I am seeing the following errors in nfs.log on the server node that I am mounting:

[2013-01-18 15:28:50.729072] E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.8.0.15:752) REPLICATED : 5c495aeb-ffde-4b24-bbe3-e48e3c81e144
[2013-01-18 15:28:50.742044] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: f1ee3076, ACCESS: NFS: 2(No such file or directory), POSIX: 14(Bad address)
[2013-01-18 15:28:50.744025] E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.8.0.15:752) REPLICATED : 5c495aeb-ffde-4b24-bbe3-e48e3c81e144
[2013-01-18 15:28:50.744057] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: f2ee3076, ACCESS: NFS: 2(No such file or directory), POSIX: 14(Bad address)
[2013-01-18 15:28:50.745529] E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.8.0.15:752) REPLICATED : 5c495aeb-ffde-4b24-bbe3-e48e3c81e144
[2013-01-18 15:28:50.745557] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: f3ee3076, ACCESS: NFS: 2(No such file or directory), POSIX: 14(Bad address)
[2013-01-18 15:28:50.746100] E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: Unable to resolve FH: (10.8.0.15:752) REPLICATED : 5c495aeb-ffde-4b24-bbe3-e48e3c81e144
[2013-01-18 15:28:50.746128] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: f4ee3076, ACCESS: NFS: 2(No such file or directory), POSIX: 14(Bad address)
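A quick way to check whether a run hit this problem is to count the "Unable to resolve FH" errors per volume and GFID, along with the NFS/POSIX status pairs from the ACCESS replies that follow them. The Python sketch below assumes the log line format shown in the excerpt above. The helper `summarize_nfs_log` is hypothetical and is not part of GlusterFS. Other GlusterFS versions may word these messages differently.

```python
import re
from collections import Counter

# Line formats assumed from the nfs.log excerpt in this bug (glusterfs 3.3.0.5rhs);
# other versions may log these messages differently.
FH_RE = re.compile(
    r"Unable to resolve FH: \((?P<client>[^)]+)\) (?P<volume>\S+) : (?P<gfid>[0-9a-f-]{36})"
)
ACCESS_RE = re.compile(r"ACCESS: NFS: (?P<nfs>\d+)\([^)]*\), POSIX: (?P<posix>\d+)\(")


def summarize_nfs_log(lines):
    """Return (fh_errors, access_results) counters for an iterable of log lines.

    fh_errors counts 'Unable to resolve FH' errors keyed by (volume, gfid);
    access_results counts ACCESS replies keyed by (nfs_status, posix_errno).
    """
    fh_errors = Counter()
    access_results = Counter()
    for line in lines:
        m = FH_RE.search(line)
        if m:
            fh_errors[(m.group("volume"), m.group("gfid"))] += 1
            continue
        m = ACCESS_RE.search(line)
        if m:
            access_results[(int(m.group("nfs")), int(m.group("posix")))] += 1
    return fh_errors, access_results


# Two error/reply pairs copied from the excerpt in this report.
sample = [
    "[2013-01-18 15:28:50.729072] E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: "
    "Unable to resolve FH: (10.8.0.15:752) REPLICATED : 5c495aeb-ffde-4b24-bbe3-e48e3c81e144",
    "[2013-01-18 15:28:50.742044] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: "
    "XID: f1ee3076, ACCESS: NFS: 2(No such file or directory), POSIX: 14(Bad address)",
    "[2013-01-18 15:28:50.744025] E [nfs3.c:1545:nfs3_access_resume] 0-nfs-nfsv3: "
    "Unable to resolve FH: (10.8.0.15:752) REPLICATED : 5c495aeb-ffde-4b24-bbe3-e48e3c81e144",
    "[2013-01-18 15:28:50.744057] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: "
    "XID: f2ee3076, ACCESS: NFS: 2(No such file or directory), POSIX: 14(Bad address)",
]
fh, access = summarize_nfs_log(sample)
print(fh)
print(access)
```

To scan a real log, pass an open file object, for example `summarize_nfs_log(open(path))`, where `path` points at the gNFS server's nfs.log (commonly under /var/log/glusterfs/). In the pairs counted here, NFS status 2 is NFS3ERR_NOENT and POSIX errno 14 is EFAULT ("Bad address").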

This is a RHEL 6.3 client with the latest EUS kernel, mounting a replicated volume over NFS:

Volume Name: REPLICATED
Type: Replicate
Volume ID: 1443a320-90fa-423b-a3e3-54715380ea64
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: storage-qe01.lab.eng.rdu2.redhat.com:/brick1
Brick2: storage-qe02.lab.eng.rdu2.redhat.com:/brick1


Version-Release number of selected component (if applicable):

glusterfs-3.3.0.5rhs-40.el6rhs.x86_64

How reproducible:

Every run.

Steps to Reproduce:
1.  Install LTP and mount the replicated volume over NFS
2.  Run LTP
3.  Check nfs.log on the node that is being mounted
  
Actual results:

Error messages appear in nfs.log when LTP is run.

Expected results:

No errors when the tests are run.

Additional info:

Will update the BZ with a deeper dive shortly; sosreports are attached.

Comment 1 Ben Turner 2013-01-18 21:59:36 UTC

Created attachment 682764
Sosreport from storage1

Comment 2 Ben Turner 2013-01-18 22:00:11 UTC

Created attachment 682765
Sosreport from storage2

Comment 3 Ben Turner 2013-01-18 22:00:55 UTC

Created attachment 682766
Sosreport from storage3(client)

Comment 5 Ben Turner 2013-01-29 22:00:59 UTC

I tested this on every client version from RHEL 5.6 through 6.4 and saw this behavior on all of them. Today I reran LTP to narrow down which test causes the errors. I haven't yet found which test triggers the "Unable to resolve FH" error, but I am seeing some really strange behavior with:

time $LTP_DIR/fsstress/fsstress -d /gluster-mount -l 22 -n 22 -p 22

When I run it, warnings like these spam the logs:

[2013-01-29 16:40:50.485398] W [client3_1-fops.c:187:client3_1_symlink_cbk] 0-DISTRIBUTED-client-1: remote operation failed: File name too long. Path: /p8/d3/l9 (00000000-0000-0000-0000-000000000000)
[2013-01-29 16:40:50.485436] W [nfs3.c:2939:nfs3svc_symlink_cbk] 0-nfs: 9646c2ba: /p8/d3/l9 => -1 (File name too long)
[2013-01-29 16:40:50.486733] W [client3_1-fops.c:187:client3_1_symlink_cbk] 0-DISTRIBUTED-client-1: remote operation failed: File name too long. Path: /p8/d3/l9 (00000000-0000-0000-0000-000000000000)
[2013-01-29 16:40:50.486765] W [nfs3.c:2939:nfs3svc_symlink_cbk] 0-nfs: 9746c2ba: /p8/d3/l9 => -1 (File name too long)

I picked one example and looked at it:

[2013-01-29 16:40:50.197716] W [nfs3.c:3391:nfs3svc_remove_cbk] 0-nfs: 1f42c2ba: /run1089/p7/d3/f5 => -1 (No such file or directory)

On the /gluster-mount mount point, I cd to the directory:

[root@storage-qe04 d3]# pwd
/gluster-mount/run1089/p7/d3

And I try to remove the file:

[root@storage-qe04 d3]# rm f5 
rm: remove regular file `f5'? y
rm: cannot remove `f5': No such file or directory

Running ll, I still see the file:

[root@storage-qe04 d3]# ll
total 0
-rw-rw-rw-. 1 root root 579411 Jan 29 16:24 f5

I tried unmounting and remounting the FS and still saw the same thing:

[root@storage-qe04 gluster-mount]# cd /gluster-mount/run1089/p7/d3
[root@storage-qe04 d3]# ls
f5
[root@storage-qe04 d3]# rm f5 
rm: remove regular file `f5'? y
rm: cannot remove `f5': No such file or directory

So I looked on the backend bricks:

[root@storage-qe01 d3]# pwd
/brick1/run1089/p7/d3
[root@storage-qe01 d3]# ll
total 0

[root@storage-qe02 d3]# pwd
/brick1/run1089/p7/d3
[root@storage-qe02 d3]# ll
total 0

The file was not on either brick, but it still showed up on the client. I then mounted the volume from a different client:

[root@storage-qe12 ~]# mount -t nfs -o mountproto=tcp,vers=3 storage-qe01.lab.eng.rdu2.redhat.com:/DISTRIBUTED $(mkdir /test-mount; echo /test-mount)
[root@storage-qe12 ~]# cd /test-mount/run1089/p7/d3
[root@storage-qe12 d3]# ll
total 0
-rw-rw-rw-. 1 root root 579411 Jan 29 16:24 f5

The file shows up even on a client that is mounting the volume for the first time.
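The comparison made by hand above can be scripted: the client listing of a directory is checked against the listings of the same relative directory on each brick. The sketch below uses a hypothetical helper, `stale_entries`. It takes plain lists of names, so the listings can be gathered with `ls` on the client and on each brick host. It reports names that the client shows but that no brick holds.

```python
def stale_entries(client_listing, brick_listings):
    """Return names the client lists that appear on none of the bricks.

    client_listing: names seen in the directory through the NFS mount.
    brick_listings: one iterable of names per brick, for the same relative
    directory on each brick. Taking the union covers both layouts: on a
    replicated volume every brick should hold the entry, and on a
    distributed volume at least one should.
    """
    on_bricks = set()
    for listing in brick_listings:
        on_bricks.update(listing)
    return sorted(set(client_listing) - on_bricks)


# Listings from the investigation above: the client shows f5 in
# run1089/p7/d3, while /brick1/run1089/p7/d3 is empty on both
# storage-qe01 and storage-qe02.
client = ["f5"]
bricks = [[], []]
print(stale_entries(client, bricks))
```

Taking the union rather than requiring every brick to hold the entry keeps the check valid whether the directory belongs to a replicated or a distributed volume.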

I am fairly sure that the LTP test case causing the FH error is the same one I am running, but after running the whole test suite again I did not see the FH error. Tomorrow I will try running just fsstress to see whether I hit the FH error.

Comment 6 santosh pradhan 2013-08-08 07:11:42 UTC

Hi Ben,

1. The "Unable to resolve FH" error is addressed as part of BZ 960835. The fix is available in the latest RHS-2.1 build (bigbend).

2. The "File name too long" message in the log is expected: the underlying file system (XFS or ext2/3/4) does not support file names longer than 255 bytes. The tool tries to create a symlink of 1024 chars, which the symlink() syscall rejects with ENAMETOOLONG. This is OK.
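The name-length limit in point 2 can be observed directly. The Python sketch below assumes a Linux/POSIX host. It queries the per-component name limit with `os.pathconf(..., "PC_NAME_MAX")` in a temporary directory. It then calls `symlink()` (through `os.symlink`) to create a link whose name is one byte over that limit, and the call fails with ENAMETOOLONG ("File name too long"). The sketch exercises only the link-name limit; limits on the symlink target length are filesystem-specific and are not covered. The helper `try_symlink` is hypothetical.

```python
import errno
import os
import tempfile


def try_symlink(target, link_name, directory):
    """Call symlink(target, directory/link_name); return 0 on success or the errno."""
    try:
        os.symlink(target, os.path.join(directory, link_name))
        return 0
    except OSError as exc:
        return exc.errno


with tempfile.TemporaryDirectory() as tmp:
    # Per-component name limit of the file system backing tmp (255 on ext4/XFS).
    name_max = os.pathconf(tmp, "PC_NAME_MAX")
    # ASCII name, so NAME_MAX + 1 characters is also NAME_MAX + 1 bytes.
    rc = try_symlink("target", "l" * (name_max + 1), tmp)
    print("NAME_MAX:", name_max)
    print("symlink result:", errno.errorcode.get(rc, "OK"))
```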

I could not reproduce the issue on the 3.4.0.13rhs-1 build.

Could you confirm?

Thanks,
Santosh

Comment 7 Ben Turner 2013-08-12 17:01:50 UTC

Verified that the FH issue is resolved on glusterfs-3.4.0.18rhs-1.el6rhs.x86_64.

Comment 8 Vivek Agarwal 2013-08-12 17:07:31 UTC

Thanks Ben

Comment 9 Scott Haines 2013-09-23 22:39:24 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

Comment 10 Scott Haines 2013-09-23 22:43:43 UTC